6.2 Create Catalog for Clean Zone

Create Glue Data Catalog for Clean Zone

  1. Access the AWS Management Console
    • Search for the Glue service
    • Select Glue from the search results

  1. Create a Database for the Glue Data Catalog
    • In the Glue Data Catalog section, select Databases then choose Add database
    • Enter the database name as fashion-clean-zone
    • Click Create
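
The same database can also be created from a script. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the helper names database_request and create_database are illustrative and not part of the workshop.

```python
def database_request(name="fashion-clean-zone"):
    """Build the keyword arguments for Glue's CreateDatabase call."""
    return {"DatabaseInput": {"Name": name}}


def create_database(glue, name="fashion-clean-zone"):
    """Create the catalog database using a boto3 Glue client."""
    glue.create_database(**database_request(name))


# Usage (assumption: credentials and a default region are configured):
#   import boto3
#   create_database(boto3.client("glue"))
```

The request is built separately from the call so it can be inspected before anything is sent to AWS.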

  1. Create a table for the Glue Data Catalog
    • Click on the fashion-clean-zone database you just created
    • Select Add table then choose Add tables using a crawler

  1. In the crawler properties section
    • Enter the crawler name as fashion-clean-zone-crawler
    • Click Next

  1. Select the data source and click Add a data source
    • Choose S3 then click Browse
    • Select the fashion-clean-zone bucket then click Add
    • Click Next

  1. In the IAM role section
    • Select Create an IAM role
    • Enter the role name as AWSGlueServiceRole-FashionCrawlerRole
    • Click Next
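
For reference, a role of this kind can be sketched in code. This is an approximation under stated assumptions, not an exact copy of what the console wizard generates: it trusts the Glue service, attaches the AWS managed policy AWSGlueServiceRole, and adds a read-only inline policy for the bucket. The inline policy name CrawlerS3Read and the helper names are hypothetical.

```python
import json

# AWS managed policy commonly attached to Glue service roles.
GLUE_MANAGED_POLICY = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def trust_policy():
    """Trust policy that lets the Glue service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def bucket_read_policy(bucket="fashion-clean-zone"):
    """Read-only access to the bucket the crawler scans (assumed scope)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        }],
    }


def create_crawler_role(iam, role_name="AWSGlueServiceRole-FashionCrawlerRole"):
    """Create the role with a boto3 IAM client and attach both policies."""
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy()),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=GLUE_MANAGED_POLICY)
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="CrawlerS3Read",  # hypothetical inline policy name
        PolicyDocument=json.dumps(bucket_read_policy()),
    )
```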

  1. In the Set output and scheduling section
    • Target database: Select the fashion-clean-zone database
    • Frequency: choose Daily and enter 17:00. The schedule is in UTC, so 17:00 UTC corresponds to 00:00 Vietnam time (UTC+7)
    • Click Next
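
The time conversion behind the schedule can be checked with a few lines of Python. This sketch assumes a fixed UTC+7 offset for Vietnam (no daylight saving time) and formats the result as a Glue-style cron expression; the function name daily_cron_utc is illustrative.

```python
from datetime import date, datetime, time, timedelta, timezone

# Vietnam uses a fixed UTC+7 offset with no daylight saving time.
VIETNAM = timezone(timedelta(hours=7))


def daily_cron_utc(local_time, tz=VIETNAM):
    """Convert a daily local time into a cron expression in UTC."""
    # Attach the zone to an arbitrary date, then convert to UTC.
    local = datetime.combine(date(2024, 1, 1), local_time, tzinfo=tz)
    utc = local.astimezone(timezone.utc)
    # Field order: minutes hours day-of-month month day-of-week year
    return f"cron({utc.minute} {utc.hour} * * ? *)"


print(daily_cron_utc(time(0, 0)))  # cron(0 17 * * ? *)
```

Midnight in Vietnam therefore maps to 17:00 UTC on the previous calendar day, which is the value entered in the console.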

  1. In the Review section
    • Click Finish
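
The crawler settings chosen in the steps above can be summarized as a single API request. The following is a sketch, assuming boto3 and an IAM role that already exists (the console wizard creates it for you, the API call does not). Depending on how the role was created, the role's full ARN may be needed instead of its name. The bucket root s3://fashion-clean-zone/ as the crawl path is also an assumption.

```python
def crawler_request(role="AWSGlueServiceRole-FashionCrawlerRole"):
    """Build the keyword arguments for Glue's CreateCrawler call."""
    return {
        "Name": "fashion-clean-zone-crawler",
        "Role": role,  # role name or full ARN; must already exist
        "DatabaseName": "fashion-clean-zone",
        # Assumption: the crawler scans the whole bucket.
        "Targets": {"S3Targets": [{"Path": "s3://fashion-clean-zone/"}]},
        # Daily at 17:00 UTC (00:00 Vietnam time).
        "Schedule": "cron(0 17 * * ? *)",
    }


def create_crawler(glue, role="AWSGlueServiceRole-FashionCrawlerRole"):
    """Create the crawler using a boto3 Glue client."""
    glue.create_crawler(**crawler_request(role))


# Usage (assumption: credentials and a default region are configured):
#   import boto3
#   create_crawler(boto3.client("glue"))
```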

  1. Select Run crawler and wait for the crawl to complete. This takes about 1 minute.
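
Running the crawler and waiting for it can also be scripted. This is a minimal sketch, assuming a boto3 Glue client is passed in: it starts the crawler, polls its state until it returns to READY, and reports the status of the last crawl. The function name and the polling and timeout values are illustrative.

```python
import time


def run_crawler_and_wait(glue, name="fashion-clean-zone-crawler",
                         poll_seconds=10.0, timeout_seconds=600.0):
    """Start the crawler, then poll until it returns to the READY state."""
    glue.start_crawler(Name=name)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        crawler = glue.get_crawler(Name=name)["Crawler"]
        # A crawler moves through RUNNING and STOPPING before READY.
        if crawler["State"] == "READY":
            # LastCrawl.Status is SUCCEEDED, CANCELLED, or FAILED.
            return crawler.get("LastCrawl", {}).get("Status")
        time.sleep(poll_seconds)
    raise TimeoutError(f"crawler {name} did not finish in {timeout_seconds}s")


# Usage (assumption: credentials and a default region are configured):
#   import boto3
#   print(run_crawler_and_wait(boto3.client("glue")))
```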

Check Results with Athena

  1. Access the Athena service from the AWS Management Console.

  2. Click Launch Query Editor

  1. In the Query Editor interface, click Settings and then Manage
    • Under Location of query result, click Browse S3 and select the fashion-logic-zone bucket
    • Click Save

  1. Return to the Editor section
    • On the left, select Data Source as AwsDataCatalog
    • Select the Database as fashion-clean-zone
    • You can now run SQL queries against the data. For example:
    SELECT * FROM "fashion-clean-zone"."clickstreams" LIMIT 10;
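
The same check can be run outside the console. The following is a sketch, assuming a boto3 Athena client is passed in and that query results go to the root of the fashion-logic-zone bucket (the exact prefix is an assumption). The helper names query_request and run_query are illustrative.

```python
import time

QUERY = 'SELECT * FROM "fashion-clean-zone"."clickstreams" LIMIT 10;'


def query_request(sql=QUERY, output="s3://fashion-logic-zone/"):
    """Build the keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {
            "Catalog": "AwsDataCatalog",
            "Database": "fashion-clean-zone",
        },
        # Assumption: results are written to the bucket root.
        "ResultConfiguration": {"OutputLocation": output},
    }


def run_query(athena, sql=QUERY, poll_seconds=1.0):
    """Run the query, wait for it to finish, and return rows as lists."""
    started = athena.start_query_execution(**query_request(sql))
    query_id = started["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if status["State"] != "SUCCEEDED":
        raise RuntimeError(status.get("StateChangeReason", status["State"]))
    result = athena.get_query_results(QueryExecutionId=query_id)
    # For SELECT queries the first row holds the column headers.
    return [[cell.get("VarCharValue") for cell in row["Data"]]
            for row in result["ResultSet"]["Rows"]]


# Usage (assumption: credentials and a default region are configured):
#   import boto3
#   for row in run_query(boto3.client("athena")):
#       print(row)
```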
