Access Snowflake Horizon Catalog data using catalog federation in the AWS Glue Data Catalog


This is a guest post by Andries Engelbrecht, Principal Partner Solutions Engineer at Snowflake, in partnership with AWS.

AWS announced a new catalog federation feature that lets you directly access data from Snowflake Horizon Catalog through the AWS Glue Data Catalog. This integration lets you discover and query Horizon Catalog data in Iceberg format through REST endpoints while applying fine-grained access controls using AWS Lake Formation. The new catalog federation, combined with Snowflake’s catalog-linked database feature, means users can access data stored across AWS and Snowflake from a single point of access, reducing data movement and associated costs by eliminating the need to duplicate data across platforms.

In this post, we show you how to connect the AWS Glue Data Catalog to Snowflake Horizon Catalog and query the data using AWS analytics services. We cover how to set up catalogs in Horizon Catalog and configure required permissions, create and configure the federation connection in AWS Glue, implement fine-grained access controls using AWS Lake Formation, and finally, query federated tables using Amazon Athena. This step-by-step approach guides you through the complete process of establishing an integration between your Snowflake and AWS data environments.

Business use cases and key benefits

Catalog federation enables several important business scenarios while delivering key operational and strategic benefits.

Common use cases

This federation capability addresses several key business scenarios:

  • Governed, cross-platform analytics: Query data across AWS and Snowflake environments to improve data-driven decision making without data movement or duplication
  • Data mesh implementation: Enable secure and federated data discovery while maintaining domain-oriented ownership
  • Compliance management: Enforce consistent access controls and auditing across platforms

Key benefits

  • Operational efficiency: Eliminate data duplication and reduce extract, transform, and load (ETL) workloads
  • Enhanced security: Centralize access control through AWS Lake Formation with fine-grained permissions
  • Cost optimization: Minimize data transfer and storage costs across platforms
  • Improved agility: Enable faster time to insights with direct query access
  • Simplified governance: Maintain a unified compliance and audit framework

Solution overview

The solution uses catalog federation in the AWS Glue Data Catalog to integrate with Snowflake Horizon Catalog. This integration supports both Snowflake Horizon, where the catalog is internal to Snowflake, and external catalogs such as Apache Polaris, Snowflake Open Catalog (a managed service that hosts Apache Polaris), and others.

The following diagram illustrates how AWS Glue Data Catalog federates with Snowflake Horizon Catalog, enabling customers to directly access Iceberg-format data managed by Snowflake Horizon Catalog through the Glue Data Catalog.

The integration works through three main components:

  1. Authentication: Uses OAuth2 credentials of the Snowflake principal
  2. Access control: AWS Lake Formation manages fine-grained permissions
  3. Query access: AWS analytics services like Amazon Athena can directly query the federated tables

Now, we walk through the step-by-step process of setting up this integration.

Prerequisites

Before you begin, confirm you have the following:

Configure Snowflake Horizon Catalog for Iceberg external access

Snowflake Horizon Catalog already supports managing Iceberg tables. For this walkthrough, you must create Snowflake-managed Iceberg tables with data stored in Amazon S3.

Follow these steps in order:

  1. Create an external volume for S3: First, create an external volume that points to the S3 bucket where your Iceberg table data is stored. Follow the instructions in Create External Volume(s) for the Iceberg Tables on S3.
  2. Create a database: Create a database to organize your tables. Refer to the Snowflake database creation documentation.
  3. Create a schema: Create a schema within your database following the Snowflake schema creation guide.
  4. Create an Iceberg table: Create your Iceberg table using the external volume. Follow the instructions to Create Iceberg Table.

After completing these steps, your Snowflake-managed Iceberg tables are ready to federate with AWS Glue Data Catalog.

Configure access control and authentication

To enable AWS Glue to access your Snowflake-managed Iceberg tables, you must configure access control and obtain authentication credentials.

Step 1: Configure access control

Create a dedicated Snowflake role for external engine access to establish clear governance boundaries. Follow the instructions in Configure Access Control for external engines and set up the appropriate permissions for your Iceberg tables.

Step 2: Obtain an access token

Generate an access token for authenticating AWS Glue to Snowflake Horizon Catalog. Snowflake supports three authentication mechanisms:

  • External OAuth
  • Key-pair authentication
  • Programmatic Access Token (PAT)

Choose the authentication method that best fits your security requirements and follow the corresponding Snowflake documentation to generate your credentials.

Catalog federation supports OAuth or custom authentication. For details on using OAuth, refer to Federate to Snowflake Iceberg Catalog.

For this post, we use custom authentication and generate an access token using a PAT. Replace role_name with the principal role and token_value with the principal’s Programmatic Access Token.

curl --location 'https://.snowflakecomputing.com/polaris/api/catalog/v1/oauth/tokens' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'grant_type=client_credentials' \
--data-urlencode 'scope=session:role:<role_name>' \
--data-urlencode 'client_secret=<token_value>'

Note down the access token that’s generated.
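As an illustration of what the curl command sends, the following Python sketch builds the same URL-encoded form body from the three fields shown above. The function name build_token_form and the sample values are hypothetical and only used for illustration; the sketch does not call Snowflake, it only shows how the role name and the PAT end up in the request body.

```python
from urllib.parse import parse_qs, urlencode


def build_token_form(role_name: str, token_value: str) -> str:
    """Return the URL-encoded form body for the token request.

    Mirrors the three --data-urlencode fields of the curl command:
    grant_type, scope (session:role:<role_name>), and client_secret (the PAT).
    """
    fields = {
        "grant_type": "client_credentials",
        "scope": f"session:role:{role_name}",
        "client_secret": token_value,
    }
    # urlencode percent-encodes reserved characters such as ":" in the scope.
    return urlencode(fields)


if __name__ == "__main__":
    # "ANALYST_ROLE" and "example-pat" are made-up sample values.
    body = build_token_form("ANALYST_ROLE", "example-pat")
    # Decode the body again to show the fields that would be sent.
    print(parse_qs(body))
```

The body would be sent as a POST with the Content-Type header application/x-www-form-urlencoded, as in the curl command. Keep the PAT out of source control and shell history where possible.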

Step 3: Enable catalog federation

With access control configured and authentication credentials in hand, AWS Glue catalog federation can now connect to and access Snowflake’s Horizon Catalog.

Optional: Snowflake Open Catalog configuration

If you prefer to use Snowflake Open Catalog for Iceberg external access instead, refer to Sync a Snowflake-managed table with Snowflake Open Catalog for alternative setup instructions.

Set up Glue catalog federation with Snowflake Horizon Catalog

Create a secret in AWS Secrets Manager

Log in to the AWS console using an IAM role that has access to AWS Secrets Manager. Open Secrets Manager:

  • Choose Store a new secret and select Other type of secret for the secret type.
  • Set the key-value pair:
    • Key: BEARER_TOKEN
    • Value: The access token noted earlier
  • Choose Next and provide the secret name as horizon-secret.
  • Complete the setup by choosing Store.

Alternatively, you can use the CLI to create the secret by running the following command.

Replace your-access-token and your-region with your actual values:

aws secretsmanager create-secret \
    --name horizon-secret \
    --description "Snowflake Horizon access token" \
    --secret-string '{
        "BEARER_TOKEN": "your-access-token"
    }' \
    --region your-region
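If you script this step, hand-written JSON inside shell quotes is easy to get wrong. The following Python sketch, using only the standard library, shows one way to produce the secret string with the BEARER_TOKEN key used above. The function name build_secret_string and the whitespace check are assumptions added for illustration, not part of the AWS tooling.

```python
import json


def build_secret_string(access_token: str) -> str:
    """Return the JSON secret string with the BEARER_TOKEN key.

    Rejects empty tokens and tokens with surrounding whitespace, which
    commonly sneak in when a token is copied from a terminal.
    """
    if not access_token or access_token != access_token.strip():
        raise ValueError("access token is empty or has surrounding whitespace")
    # json.dumps handles quoting and escaping of the token value.
    return json.dumps({"BEARER_TOKEN": access_token})


if __name__ == "__main__":
    # "your-access-token" is the placeholder used in the CLI command above.
    print(build_secret_string("your-access-token"))
```

The resulting string could be passed as the --secret-string value of the CLI command, or as the SecretString parameter if you create the secret through an AWS SDK.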

Create IAM role for catalog federation

As the catalog owner of a federated catalog in AWS Glue Data Catalog, you can use Lake Formation to implement comprehensive access controls for your data teams:

Access control options

You can implement access controls at different granularity levels depending on your governance needs:

  • Coarse-grained: Table-level permissions
  • Fine-grained: Column-level, row-level, and cell-level filtering
  • Tag-based: Dynamic access based on data classification tags

Lake Formation requires an IAM role with permissions to access the underlying S3 locations of your external catalog.

Create an IAM role that allows the Glue connection to access AWS Secrets Manager and VPC configurations (optional), and allows Lake Formation to manage credential vending for the S3 bucket/prefix.

Required permissions

  1. Secrets Manager access: The Glue connection requires permissions to retrieve secret values from Secrets Manager for OAuth tokens stored for your Snowflake service connection.
  2. Amazon Virtual Private Cloud (VPC) access (optional): When using VPC endpoints to restrict connectivity to your Snowflake Open Catalog account, the Glue connection needs permissions to describe and use VPC network interfaces. This configuration ensures secure, controlled access to both your stored credentials and network resources while maintaining proper isolation through VPC endpoints.
  3. S3 bucket and AWS Key Management Service (KMS) key permission: The Glue connection requires S3 permissions to read certificates if used in the connection setup. Additionally, Lake Formation requires read permissions on the bucket/prefix where the remote catalog table data resides. If the data is encrypted using a KMS key, additional KMS permissions are required.

Setup steps:

Run the following commands using the AWS CLI, replacing the placeholders with your setup information:

Create a JSON file (for example, trust-policy.json) with the following structure:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": ["glue.amazonaws.com","lakeformation.amazonaws.com"]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Use the aws iam create-role command, referencing the trust policy file:

aws iam create-role \
    --role-name LFDataAccessRole \
    --assume-role-policy-document file:///trust-policy.json

Next, create a JSON file (for example, permissions-policy.json) for the permissions. The third statement (s3:GetObject) is required only when you use a custom certificate to sign requests, and the fourth statement (KMS) is required only when you use a customer managed encryption key for S3. The empty Resource values and incomplete ARNs are placeholders for your own resources:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret"
            ],
            "Resource": [
                ""
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateNetworkInterface",
                "ec2:DeleteNetworkInterface",
                "ec2:DescribeNetworkInterfaces"
            ],
            "Resource": "*",
            "Condition": {
                "ArnEquals": {
                    "ec2:Vpc": "arn:aws:ec2:region:account-id:vpc/",
                    "ec2:Subnet": [
                        "arn:aws:ec2:region:account-id:subnet/"
                    ]
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::/"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt",
                "kms:Encrypt"
            ],
            "Resource": [
                ""
            ]
        }
    ]
}

Then, attach it to the role:

aws iam put-role-policy \
    --role-name LFDataAccessRole \
    --policy-name myaccesspolicies \
    --policy-document file:///permissions-policy.json
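Because the VPC, certificate, and KMS statements in the permissions policy are optional, it can help to generate the policy file instead of editing it by hand. The following Python sketch assembles the same statements and includes the optional ones only when you pass the matching ARN. The function name build_permissions_policy and its parameter names are hypothetical, and the ARNs in the example are made-up sample values.

```python
import json
from typing import Optional, Sequence


def build_permissions_policy(
    secret_arn: str,
    vpc_arn: Optional[str] = None,
    subnet_arns: Sequence[str] = (),
    cert_object_arn: Optional[str] = None,
    kms_key_arn: Optional[str] = None,
) -> dict:
    """Assemble the role's inline policy, adding optional statements on demand."""
    statements = [
        {
            # Always needed: the Glue connection reads the stored token.
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret",
            ],
            "Resource": [secret_arn],
        }
    ]
    if vpc_arn:
        # Only needed when the Glue connection uses VPC network interfaces.
        statements.append(
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:CreateNetworkInterface",
                    "ec2:DeleteNetworkInterface",
                    "ec2:DescribeNetworkInterfaces",
                ],
                "Resource": "*",
                "Condition": {
                    "ArnEquals": {
                        "ec2:Vpc": vpc_arn,
                        "ec2:Subnet": list(subnet_arns),
                    }
                },
            }
        )
    if cert_object_arn:
        # Only needed when a custom certificate is used to sign requests.
        statements.append(
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [cert_object_arn],
            }
        )
    if kms_key_arn:
        # Only needed when S3 data uses a customer managed encryption key.
        statements.append(
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:Encrypt"],
                "Resource": [kms_key_arn],
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Made-up sample ARN; replace with the ARN of your own secret.
    policy = build_permissions_policy(
        "arn:aws:secretsmanager:us-east-1:111122223333:secret:horizon-secret"
    )
    print(json.dumps(policy, indent=4))
```

Writing the printed output to permissions-policy.json gives a file without unused statements that can be passed to the put-role-policy command.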

Create federated catalog in Glue Data Catalog

AWS Glue supports the SNOWFLAKEICEBERGRESTCATALOG connection type for connecting Glue Data Catalog with Snowflake Horizon Catalog and Snowflake Open Catalog. This Glue connector supports OAuth2 authentication and includes additional configuration parameters like CASING_TYPE to customize how AWS Glue Data Catalog discovers metadata in the Snowflake Horizon Catalog accounts.

Log in to your AWS console as a data lake admin and open the AWS Lake Formation console.

  1. Choose Catalog in the left navigation pane and select Create catalog.
  2. Choose the data source as Snowflake Horizon Catalog.

    AWS Lake Formation console screenshot showing Step 1 of catalog creation wizard with five federation type options, Snowflake Horizon Catalog selected.
  3. Provide the following information:
    • Name: Name of the federated catalog in Glue Catalog. For this post, we use federated_lakehousedb
    • Catalog name in Snowflake: Name of the catalog that exists in Snowflake Horizon Catalog. This must match the exact name in Horizon Catalog. For this post, we use LAKEHOUSEDB
    • For Connection details, choose New connection configurations:
      • Connection name: Name for the Glue connection. For this post, we use federatedconnection1.
      • Workspace URL: Horizon IRC URL (format: https://.snowflakecomputing.com)
      • Casing type: choose Uppercase only
      • Authentication:
        • Authentication type: choose Custom. Alternatively, you can select OAuth2 authentication. For Custom authentication, an access token is created, refreshed, and managed by the customer’s application or system and stored using AWS Secrets Manager.
        • OAuth Secret: Provide the Secrets Manager ARN of the secret that was created in the previous step.
  • If you have AWS PrivateLink set up, a proxy set up, or both, you can provide network details under Settings for network configurations (optional).
  • For Register Glue connection with Lake Formation:
    • Choose the IAM role created earlier (LFDataAccessRole) to manage data access using Lake Formation.

To test the connection, choose Run test. After the connection information is validated, it shows as successful.

Green success banner displaying "Connection test successful" with checkmark icon, confirming valid AWS configuration.

You can now create the catalog by choosing Create catalog.

Alternatively, you can use the AWS CLI to create the connection and catalog using the following example commands:

aws glue create-connection \
    --connection-input '{
        "Name": "federatedconnection1",
        "ConnectionType": "SNOWFLAKEICEBERGRESTCATALOG",
        "ConnectionProperties": {
            "INSTANCE_URL": "",
            "ROLE_ARN": "<ARN_of_LFDataAccessRole>",
            "CATALOG_CASING_FILTER": "UPPERCASE_ONLY"
        },
        "AuthenticationConfiguration": {
            "AuthenticationType": "CUSTOM",
            "SecretArn": "arn:aws:secretsmanager:::secret:horizon-secret"
        }
    }' \
    --region <region>

aws lakeformation register-resource \
    --resource-arn <resource-arn> \
    --role-arn <role-arn> \
    --with-federation \
    --with-privileged-access \
    --region <region>

aws glue create-catalog \
    --name federated_lakehousedb \
    --catalog-input '{
        "FederatedCatalog": {
            "Identifier": "LAKEHOUSEDB",
            "ConnectionName": "federatedconnection1"
        },
        "CreateTableDefaultPermissions": [],
        "CreateDatabaseDefaultPermissions": []
    }'
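The two JSON payloads in the CLI commands above can also be generated programmatically, which avoids quoting mistakes in nested shell strings. The following Python sketch builds both payloads with the keys shown in the commands. The function names build_connection_input and build_catalog_input are hypothetical, and the URL and ARNs in the example are made-up sample values; the sketch only produces JSON and does not call AWS.

```python
import json


def build_connection_input(
    name: str,
    instance_url: str,
    role_arn: str,
    secret_arn: str,
    casing_filter: str = "UPPERCASE_ONLY",
) -> dict:
    """Build the --connection-input payload for a custom-auth connection."""
    return {
        "Name": name,
        "ConnectionType": "SNOWFLAKEICEBERGRESTCATALOG",
        "ConnectionProperties": {
            "INSTANCE_URL": instance_url,
            "ROLE_ARN": role_arn,
            "CATALOG_CASING_FILTER": casing_filter,
        },
        "AuthenticationConfiguration": {
            "AuthenticationType": "CUSTOM",
            "SecretArn": secret_arn,
        },
    }


def build_catalog_input(identifier: str, connection_name: str) -> dict:
    """Build the --catalog-input payload for the federated catalog."""
    return {
        "FederatedCatalog": {
            "Identifier": identifier,
            "ConnectionName": connection_name,
        },
        "CreateTableDefaultPermissions": [],
        "CreateDatabaseDefaultPermissions": [],
    }


if __name__ == "__main__":
    # All values below except the names used in this post are made-up samples.
    connection = build_connection_input(
        name="federatedconnection1",
        instance_url="https://example.invalid",
        role_arn="arn:aws:iam::111122223333:role/LFDataAccessRole",
        secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:horizon-secret",
    )
    catalog = build_catalog_input("LAKEHOUSEDB", connection["Name"])
    print(json.dumps(connection, indent=4))
    print(json.dumps(catalog, indent=4))
```

Deriving ConnectionName from the connection payload keeps the two commands consistent, so a typo in the connection name cannot leave the catalog pointing at a connection that does not exist.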

After the catalog is created, the Horizon databases and tables are listed under the federated catalog.

You can implement fine-grained access control on the tables by applying row and column filters using Lake Formation.

Query the data using the Athena query editor

Open the Amazon Athena console and run the following query to access the federated Horizon table:

SELECT * FROM "public"."customer" LIMIT 10;

Clean up

To clean up your resources, complete the following steps:

  1. Drop the Snowflake database with CASCADE.
  2. Drop the external volume created for Iceberg tables on S3.
  3. Drop the resources in Glue Data Catalog and Lake Formation created for this post.
  4. Delete the IAM roles and S3 buckets used for this post.
  5. Delete any VPC and KMS keys, if used for this post setup.

Conclusion

In this post, we demonstrated how to establish a secure connection between AWS analytics services and Snowflake Horizon Catalog, enabling you to access your data from a single connected and governed view. You learned how to:

  • Configure catalog federation between AWS Glue Data Catalog and Snowflake Horizon Catalog
  • Set up OAuth2 authentication for secure access
  • Grant access to Iceberg tables in Snowflake Horizon Catalog using AWS Lake Formation
  • Query federated tables using Amazon Athena

You can follow the same steps to establish a secure connection with open source catalog offerings such as Snowflake Open Catalog, a managed service for Apache Iceberg. Remember to clean up any resources you created while following this tutorial to avoid ongoing charges.

To further explore this solution in your environment, consider the following resources:

These resources can help you implement and optimize this integration pattern for your specific use case. As you begin this journey, remember to start small, validate your architecture with test data, and gradually scale your implementation based on your organization’s needs. Stay tuned for future workshops and resources.


About the authors

 

Andries Engelbrecht

Andries is a Principal Partner Solutions Engineer at Snowflake working with AWS. He supports product and service integrations, as well as the development of joint solutions with AWS. Andries has over 25 years of experience in the field of data and analytics.

Nidhi Gupta

Nidhi is a Senior Partner Solutions Architect at AWS, specializing in data analytics and AI. She helps customers and partners build and optimize Snowflake workloads on AWS. Nidhi has extensive experience leading development, production releases, and deployments, with a focus on data, AI, ML, generative AI, and advanced analytics.

Srividya Parthasarathy

Srividya is a Senior Big Data Architect on the AWS Lake Formation team. She works with the product team and customers to build robust features and solutions for their analytical data platform. She enjoys building data mesh solutions and sharing them with the community.

Pratik Das

Pratik is a Senior Product Manager with AWS Lake Formation. He’s passionate about all things data and works with customers to understand their requirements and build delightful experiences. He has a background in building data-driven solutions and machine learning systems.

 
