6.2 Create Catalog for Clean Zone
Create Glue Data Catalog for Clean Zone
- Access the AWS Management Console
- Search for the Glue service
- Select Glue from the search results
- Create a Database for the Glue Data Catalog
- In the Glue Data Catalog section, select Databases then choose Add database
- Enter the database name as fashion-clean-zone
- Click Create
- Create a table for the Glue Data Catalog
- Click on the fashion-clean-zone database you just created
- Select Add table then choose Add tables using a crawler
- In the crawler properties section
- Enter the crawler name as fashion-clean-zone-crawler
- Click Next
- Select the data source and click Add a data source
- Choose S3 then click Browse
- Select the fashion-clean-zone bucket then click Add
- Click Next
- In the IAM role section
- Select Create an IAM role
- Enter the role name as AWSGlueServiceRole-FashionCrawlerRole
- Click Next
- In the Set output and scheduling section
- Target database: Select the fashion-clean-zone database
- Frequency: choose Daily and enter 17:00 UTC (00:00 Vietnam time)
- Click Next
- In the Review section
- Click Finish
- Select Run Crawler and wait for the crawler process to complete. This process will take about 1 minute.
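The console steps above can also be sketched as AWS Glue API requests with boto3. This is a minimal sketch, not the workshop's own method. It assumes the crawler scans the bucket root (s3://fashion-clean-zone/), that the IAM role already exists (the console's Create an IAM role option creates it for you, the API does not), and that AWS credentials and a region are configured. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DATABASE_NAME = "fashion-clean-zone"
CRAWLER_NAME = "fashion-clean-zone-crawler"
ROLE_NAME = "AWSGlueServiceRole-FashionCrawlerRole"
# Assumption: the crawler scans the whole bucket, so the path is the bucket root.
S3_PATH = "s3://fashion-clean-zone/"
# Glue schedules are cron expressions evaluated in UTC: every day at 17:00 UTC.
DAILY_SCHEDULE = "cron(0 17 * * ? *)"


def database_request() -> dict:
    """Arguments for glue.create_database."""
    return {"DatabaseInput": {"Name": DATABASE_NAME}}


def crawler_request() -> dict:
    """Arguments for glue.create_crawler, mirroring the console wizard."""
    return {
        "Name": CRAWLER_NAME,
        "Role": ROLE_NAME,
        "DatabaseName": DATABASE_NAME,
        "Targets": {"S3Targets": [{"Path": S3_PATH}]},
        "Schedule": DAILY_SCHEDULE,
    }


def vietnam_time_for(utc_hour: int) -> str:
    """Convert an hour in UTC to Vietnam local time (fixed UTC+7 offset)."""
    utc = datetime(2024, 1, 1, utc_hour, 0, tzinfo=timezone.utc)
    vietnam = timezone(timedelta(hours=7))
    return utc.astimezone(vietnam).strftime("%H:%M")


def create_and_run_crawler() -> None:
    """Send the requests to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported here so the helpers above work without AWS access

    glue = boto3.client("glue")
    glue.create_database(**database_request())
    glue.create_crawler(**crawler_request())
    glue.start_crawler(Name=CRAWLER_NAME)
```

Calling vietnam_time_for(17) confirms that the 17:00 UTC schedule corresponds to midnight in Vietnam. Calling create_and_run_crawler() would create real resources in your account, so run it only if you have not already created them in the console.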
Check Results with Athena
- Access the Athena service from the AWS Management Console
- Click Launch Query Editor
- In the Query Editor interface, click Settings and then Manage
- Under Location of query result, click Browse S3 and select the fashion-logic-zone bucket
- Click Save
- Return to the Editor section
- On the left, select Data Source as AwsDataCatalog
- Select the Database as fashion-clean-zone
- You can now enter SQL queries to query the data. For example:
SELECT * FROM "fashion-clean-zone"."clickstreams" limit 10;
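The same check can be run outside the console. The following is a minimal sketch using the boto3 Athena client, assuming query results are written to the root of the fashion-logic-zone bucket and that AWS credentials and a region are configured. The function names are hypothetical.

```python
import time

DATABASE_NAME = "fashion-clean-zone"
# Assumption: results are written to the root of the fashion-logic-zone bucket.
OUTPUT_LOCATION = "s3://fashion-logic-zone/"
QUERY = 'SELECT * FROM "fashion-clean-zone"."clickstreams" LIMIT 10;'


def query_request() -> dict:
    """Arguments for athena.start_query_execution."""
    return {
        "QueryString": QUERY,
        "QueryExecutionContext": {
            "Database": DATABASE_NAME,
            "Catalog": "AwsDataCatalog",
        },
        "ResultConfiguration": {"OutputLocation": OUTPUT_LOCATION},
    }


def run_query(poll_seconds: float = 1.0) -> list:
    """Run the query and return the result rows. Requires boto3 and credentials."""
    import boto3  # imported here so query_request() works without AWS access

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(**query_request())["QueryExecutionId"]

    # Athena queries are asynchronous, so poll until a terminal state is reached.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # The first row of the result set holds the column headers.
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```

Calling run_query() returns the rows as Athena reports them. If the crawler has not finished or the clickstreams table does not exist yet, the query ends in the FAILED state and the function raises an error.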