3. Create Application Environment

Overview

  • In this section, we will create the application environment for the Data Pipeline.
  • The environment includes:
    • EC2 Instance: Where mock data is generated and database tables are initialized.
    • RDS: PostgreSQL database for storing mock data.

The database model consists of 4 tables:

  • products, users, orders, and order_details

Deployment Steps

  1. Return to the RDS interface
    • Select Databases from the left menu
    • Click on fashion-db

  2. In the fashion-db interface, copy the database Endpoint and save it somewhere safe. You will need it later to connect to the database.

  3. Scroll down to the Connected compute resources section
    • Click on Actions
    • Select Set up EC2 connection

  4. In the Set up EC2 connection interface
    • For EC2 Instance, select fashion-webapp
    • Click Continue
    • In the Review and confirm section, click Set up

  5. When the setup finishes, the console displays a success notification.

  6. Return to the terminal connected to the EC2 Instance from step 2.3 Create EC2 Instance.

  7. Test the connection to the PostgreSQL database using the database Endpoint you saved earlier:

    • Run the following command, replacing <YOUR_POSTGRESQL_DNS> with the Endpoint, then enter the database password you created when prompted.
    psql -U postgres -h <YOUR_POSTGRESQL_DNS> -p 5432 -d postgres
  8. If the psql prompt appears, you have successfully connected to the PostgreSQL database.

    • If the connection fails, review the steps above.
  9. Type \q to exit PostgreSQL and return to the terminal.
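
If the psql command hangs or times out, it can help to check whether the database port is reachable at all before suspecting the password. The following is a minimal sketch using only the Python standard library; the function name can_reach is hypothetical and is not part of the workshop code. It only tests that a TCP connection to port 5432 can be opened, not that the credentials are valid.

```python
import socket


def can_reach(host, port=5432, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS lookup failures, refused connections, and timeouts.
        return False


# Example usage (replace the placeholder with the Endpoint you saved):
#   can_reach("<YOUR_POSTGRESQL_DNS>")
```

A False result suggests a networking problem, for example that the EC2 connection setup did not complete, rather than a wrong password.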

  10. Set up the environment for generating mock data:

    • Enter the following command: vim .env
    • In vim, press i to enter insert mode.
    • Enter the following information into the .env file:
      • RDS_HOST: The Endpoint of the PostgreSQL database you saved in step 2.
      • RDS_PASSWORD: The password of the PostgreSQL database you created in step 2.
      • KINESIS_STREAM_NAME: fashion-ds
      • STREAM_ARN: The ARN of the Kinesis Stream you created in step 2.6 Create Kinesis Stream.
    • Press Esc to exit insert mode.
    • Type :wq to save the .env file and exit vim.
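
Before running any script that reads the .env file, you may want to confirm that all four values are present. The sketch below assumes the file uses plain KEY=VALUE lines, which the workshop does not state explicitly; parse_env and missing_keys are hypothetical helper names, not part of the workshop code.

```python
from pathlib import Path

# The four keys listed in the step above.
REQUIRED_KEYS = ("RDS_HOST", "RDS_PASSWORD", "KINESIS_STREAM_NAME", "STREAM_ARN")


def parse_env(text):
    """Parse KEY=VALUE lines (assumed format), skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def check_env_file(path=".env"):
    """Read the file at path and return the list of missing keys."""
    return missing_keys(parse_env(Path(path).read_text()))
```

Calling check_env_file() from the directory that contains .env returns an empty list when every key has a value.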

  11. Create tables for the database:
    • In this step, we will create the 4 tables in the database on RDS.
    • At the same time, the script generates mock data for 1000 users and for the products table.
    python initdb.py
  12. If the script completes without errors, you have successfully created the tables and the mock data for 1000 users and the products table.

  13. View the tables created in the PostgreSQL database:

    • Use the command from step 7 to connect to the PostgreSQL database.
    • Enter the following command to view the tables created in the PostgreSQL database:
    SELECT * FROM information_schema.tables WHERE table_schema = 'public';
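
To confirm that all four tables from the overview exist, the query result can be compared against the expected names. This is a minimal sketch under stated assumptions: it expects one table name per line (for example, the output of psql run with its -A and -t options on the narrower query shown in the code), and parse_table_names and missing_tables are hypothetical helpers.

```python
# Table names taken from the overview of this section.
EXPECTED_TABLES = {"products", "users", "orders", "order_details"}

# Narrower variant of the query above that returns only the table names.
LIST_TABLES_SQL = (
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema = 'public';"
)


def parse_table_names(psql_output):
    """Turn one-name-per-line output into a set of table names."""
    return {line.strip() for line in psql_output.splitlines() if line.strip()}


def missing_tables(found):
    """Return the expected tables that are not in found, sorted by name."""
    return sorted(EXPECTED_TABLES - set(found))
```

An empty list from missing_tables means every expected table was created.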

  14. View all data in a table (Optional):
    SELECT * FROM <TABLE_NAME>;
    -- Example:
    SELECT * FROM users;
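
Because the users table holds 1000 rows, SELECT * prints a long result. If you only want a quick look, a LIMIT clause keeps the output short. The sketch below builds such a query and accepts only the four table names from this section; preview_query is a hypothetical helper, not part of the workshop code.

```python
# Only the tables described in this section are accepted.
ALLOWED_TABLES = ("products", "users", "orders", "order_details")


def preview_query(table, limit=10):
    """Build a SELECT statement that returns at most limit rows of table."""
    if table not in ALLOWED_TABLES:
        # Rejecting unknown names avoids pasting arbitrary text into SQL.
        raise ValueError(f"unknown table: {table!r}")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return f"SELECT * FROM {table} LIMIT {int(limit)};"


print(preview_query("users"))  # prints: SELECT * FROM users LIMIT 10;
```

The printed statement can be pasted into the psql session from step 7.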
  15. You have successfully created the application environment for the Data Pipeline.