Seamlessly Integrate Data on Google BigQuery and ClickHouse Cloud with AWS Glue


Migrating from Google Cloud’s BigQuery to ClickHouse Cloud on AWS allows companies to leverage the speed and efficiency of ClickHouse for real-time analytics while benefiting from AWS’s scalable and secure environment. This article provides a comprehensive guide to executing a direct data migration using AWS Glue ETL, highlighting the advantages and best practices for a seamless transition.

AWS Glue ETL enables organizations to discover, prepare, and integrate data at scale without the burden of managing infrastructure. With its built-in connectivity, Glue can seamlessly read data from Google Cloud’s BigQuery and write it to ClickHouse Cloud on AWS, removing the need for custom connectors or complex integration scripts. Beyond connectivity, Glue also provides advanced capabilities such as a visual ETL authoring interface, automated job scheduling, and serverless scaling, allowing teams to design, monitor, and manage their pipelines more efficiently. Together, these features simplify data integration, reduce latency, and deliver significant cost savings, enabling faster and more reliable migrations.

Prerequisites

Before using AWS Glue to integrate data into ClickHouse Cloud, you must first set up the ClickHouse environment on AWS. This includes creating and configuring your ClickHouse Cloud service on AWS, making sure network access and security groups are properly defined, and verifying that the cluster endpoint is reachable. Once the ClickHouse environment is ready, you can use the AWS Glue built-in connector to seamlessly write data into ClickHouse Cloud from sources such as Google Cloud BigQuery. Follow the next section to complete the setup.

  1. Set up ClickHouse Cloud on AWS
    1. Follow the ClickHouse official website to set up the environment (remember to allow remote access in the config file if using ClickHouse OSS)
      https://clickhouse.com/docs/get-started/quick-start
  2. Subscribe to the ClickHouse Glue Marketplace connector
    1. Open Glue Connectors and choose Go to AWS Marketplace
    2. On the list of AWS Glue Marketplace connectors, enter ClickHouse in the search bar. Then choose ClickHouse Connector for AWS Glue
    3. Choose View purchase options at the top right of the view
    4. Review Terms and Conditions and choose Accept Terms
    5. Choose Continue to Configuration once it’s enabled
    6. On the Follow the vendor’s instructions part of the connector instructions shown below, choose the connector enabling link at step 3

Configure AWS Glue ETL Job for ClickHouse Integration

AWS Glue enables direct migration by connecting to ClickHouse Cloud on AWS through built-in connectors, allowing for seamless ETL operations. Within the Glue console, users can configure jobs to read data from S3 and write it directly to ClickHouse Cloud. Using the AWS Glue Data Catalog, data in S3 can be indexed for efficient processing, while Glue’s PySpark support allows for complex data transformations, including data type conversions, to ensure compatibility with ClickHouse’s schema.
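As a minimal illustration of such type conversions, the sketch below maps a few common BigQuery column types to ClickHouse equivalents. The mapping table and helper name are hypothetical conveniences for this article, not part of the Glue connector, and should be adjusted to your own schema:

```python
# Hypothetical mapping of common BigQuery types to ClickHouse types.
# This is an illustrative sketch, not an exhaustive or official mapping.
BQ_TO_CLICKHOUSE = {
    "STRING": "String",
    "INT64": "Int64",
    "FLOAT64": "Float64",
    "BOOL": "UInt8",
    "DATE": "Date",
    "TIMESTAMP": "DateTime64(3)",
}

def clickhouse_type(bq_type: str, nullable: bool = False) -> str:
    """Return a ClickHouse type for a BigQuery type, wrapped in Nullable if needed."""
    # Fall back to String for unmapped types; adjust as appropriate.
    ch = BQ_TO_CLICKHOUSE.get(bq_type.upper(), "String")
    return f"Nullable({ch})" if nullable else ch

if __name__ == "__main__":
    print(clickhouse_type("INT64"))            # Int64
    print(clickhouse_type("timestamp", True))  # Nullable(DateTime64(3))
```

In a Glue job, a table like this can drive a column-by-column cast of the BigQuery DataFrame before the JDBC write, so values land in the ClickHouse columns you intend.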

  1. Open AWS Glue in the AWS Management Console
    1. Navigate to Data Catalog and Connections
    2. Create a new connection
  2. Configure BigQuery Connection in Glue
    1. Prepare a Google Cloud BigQuery environment
    2. Create and store a Google Cloud service account key (JSON format) in AWS Secrets Manager; you can find the details in BigQuery connections.
    3. An example of the JSON content is as follows:
      {
        "type": "service_account",
        "project_id": "h*********g0",
        "private_key_id": "cc***************81",
        "private_key": "-----BEGIN PRIVATE KEY-----\nMI***zEc=\n-----END PRIVATE KEY-----\n",
        "client_email": "clickhouse-sa@h*********g0.iam.gserviceaccount.com",
        "client_id": "1*********8",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/clickhouse-sa%40h*********g0.iam.gserviceaccount.com",
        "universe_domain": "googleapis.com"
      }

      • type: service_account.
      • project_id: The ID of the GCP project.
      • private_key_id: A unique ID for the private key within the file.
      • private_key: The actual private key.
      • client_email: The email address of the service account.
      • client_id: A unique client ID associated with the service account.
      • auth_uri, token_uri, auth_provider_x509_cert_url, client_x509_cert_url: URLs for authentication and token exchange with Google’s identity and access management systems.
      • universe_domain: The domain name of GCP, googleapis.com
    4. Create a Google BigQuery connection in AWS Glue
    5. Grant the IAM role associated with your AWS Glue job permissions for S3, Secrets Manager, and Glue services, plus AmazonEC2ContainerRegistryReadOnly for accessing connectors purchased from AWS Marketplace (reference doc)
  3. Create ClickHouse connection in AWS Glue
    1. Enter clickhouse-connection as its connection name
    2. Choose Create connection and activate connector
  4. Create a Glue job
    1. On the Connectors view shown below, select clickhouse-connection and choose Create job
    2. Enter bq_to_clickhouse as its job name and configure gc_connector_role as its IAM role
    3. Add the BigQuery connection and clickhouse-connection to the Connections property
    4. Choose the Script tab and Edit script. Then choose Confirm on the Edit script popup view.
    5. Copy and paste the following code into the script editor; it can also be referenced from the ClickHouse official documentation
    6. The source code is as follows:
      import sys
      from pyspark.sql import SparkSession
      from awsglue.context import GlueContext
      from awsglue.job import Job
      from awsglue.utils import getResolvedOptions
      
      args = getResolvedOptions(sys.argv, ['JOB_NAME'])
      spark = SparkSession.builder.getOrCreate()
      glueContext = GlueContext(spark.sparkContext)
      job = Job(glueContext)
      job.init(args['JOB_NAME'], args)
      
      connection_options = {
          "connectionName": "Bigquery connection",
          "parentProject": "YOUR_GCP_PROJECT_ID",
          "query": "SELECT * FROM `YOUR_GCP_PROJECT_ID.bq_test_dataset.bq_test_table`",
          "viewsEnabled": "true",
          "materializationDataset": "bq_test_dataset"
      }
      jdbc_url = "jdbc:clickhouse://YOUR_CLICKHOUSE_CONNECTION.us-east-1.aws.clickhouse.cloud:8443/clickhouse_database?ssl=true"
      username = "default"
      password = "YOUR_PASSWORD"
      query = "select * from clickhouse_database.clickhouse_test_table"
      # Add this before writing to test the connection
      try:
          # Read from BigQuery with the Glue connection
          print("Reading data from BigQuery...")
          GoogleBigQuery_node1742453400261 = glueContext.create_dynamic_frame.from_options(
              connection_type="bigquery",
              connection_options=connection_options,
              transformation_ctx="GoogleBigQuery_node1742453400261"
          )
          # Convert to DataFrame
          bq_df = GoogleBigQuery_node1742453400261.toDF()
          print("Show data from BigQuery:")
          bq_df.show()
          
          # Write BigQuery data to ClickHouse with JDBC
          (bq_df.write
              .format("jdbc")
              .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
              .option("url", jdbc_url)
              .option("user", username)
              .option("password", password)
              .option("dbtable", "clickhouse_test_table")
              .mode("append")
              .save())
          
          print("Wrote BigQuery data to ClickHouse successfully")
          
          # Read back from ClickHouse with JDBC
          read_df = (spark.read.format("jdbc")
              .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
              .option("url", jdbc_url)
              .option("user", username)
              .option("password", password)
              .option("query", query)
              .option("ssl", "true")
              .load())
          
          print("Show data from ClickHouse:")
          read_df.show()
          
      except Exception as e:
          print(f"ClickHouse connection test failed: {str(e)}")
          raise
      finally:
          job.commit()

    7. Choose Save and Run at the top right of the current view
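One easy mistake in a script like the one above is stray whitespace inside the JDBC URL string or a missing ssl flag, either of which makes the Spark JDBC reader fail with an unhelpful error. A small helper such as the following (a hypothetical convenience, not part of the ClickHouse connector) assembles the URL from its parts:

```python
def build_clickhouse_jdbc_url(host: str, database: str,
                              port: int = 8443, ssl: bool = True) -> str:
    """Assemble a ClickHouse JDBC URL, stripping whitespace so Spark's
    JDBC reader does not receive an invalid URL."""
    url = f"jdbc:clickhouse://{host.strip()}:{port}/{database.strip()}"
    return (url + "?ssl=true") if ssl else url

if __name__ == "__main__":
    # Placeholder hostname, matching the script above.
    print(build_clickhouse_jdbc_url(
        "YOUR_CLICKHOUSE_CONNECTION.us-east-1.aws.clickhouse.cloud",
        "clickhouse_database"))
```

ClickHouse Cloud requires TLS on port 8443, so the helper defaults to ssl=true; for a self-managed OSS install you might instead target the plain HTTP port and pass ssl=False.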

Testing and Validation

Testing is critical to verify data accuracy and performance in the new environment. After the migration completes, run data integrity checks to confirm record counts and data quality in ClickHouse Cloud. Schema validation is essential, as each data field must align correctly with ClickHouse’s format. Running performance benchmarks, such as sample queries, will help confirm that ClickHouse’s setup delivers the desired speed and efficiency gains.
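The record-count and schema checks described above can be sketched as a plain Python helper. The function and its inputs are illustrative, not part of any AWS or ClickHouse API; in practice you would feed it counts and column lists fetched from BigQuery and from ClickHouse:

```python
def validate_migration(source_count: int, dest_count: int,
                       source_columns: list, dest_columns: list) -> list:
    """Return a list of human-readable problems found by basic integrity
    checks; an empty list means the checks passed."""
    problems = []
    # Record counts must match exactly after a full migration.
    if source_count != dest_count:
        problems.append(
            f"row count mismatch: source={source_count}, dest={dest_count}")
    # Every source column should exist in the destination schema.
    missing = set(source_columns) - set(dest_columns)
    if missing:
        problems.append(f"columns missing in destination: {sorted(missing)}")
    return problems

if __name__ == "__main__":
    print(validate_migration(100, 100, ["id", "name"], ["id", "name"]))  # []
```

The counts themselves can come from `SELECT count(*)` against the BigQuery table and against the ClickHouse table, and the column lists from each system’s schema metadata.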

  1. The schema and data in the source BigQuery and destination ClickHouse

  2. AWS Glue output logs

Clean Up

After completing the migration, it’s important to clean up unused resources, such as the BigQuery resources used for the sample data import and the database resources in ClickHouse Cloud, to avoid unnecessary costs. Regarding IAM permissions, adhering to the principle of least privilege is advisable. This involves granting users and roles only the permissions necessary for their tasks and removing unnecessary permissions when they are no longer required. This approach enhances security by minimizing potential threat surfaces. Additionally, reviewing AWS Glue job costs and configurations can help identify optimization opportunities for future migrations. Monitoring overall costs and analyzing usage can reveal areas where code or configuration improvements could lead to cost savings.

Conclusion

AWS Glue ETL offers a powerful and user-friendly solution for migrating data from BigQuery to ClickHouse Cloud on AWS. By using Glue’s serverless architecture, organizations can perform data migrations that are efficient, secure, and cost-effective. The direct integration with ClickHouse streamlines data transfer, supporting high performance and flexibility. This migration approach is particularly well-suited for companies looking to enhance their real-time analytics capabilities on AWS.


About the Authors

Ray Wang

Ray is a Senior Solutions Architect at AWS. With 12+ years of experience in the IT industry, Ray is dedicated to building modern solutions on the cloud, especially in NoSQL, big data, machine learning, and generative AI. As a hungry go-getter, he passed all 12 AWS certifications to make his technical domain not only deep but wide. He likes to read and watch sci-fi movies in his spare time.

Robert Chung

Robert is a Solutions Architect at AWS with expertise across infrastructure, data, AI, and modernization technologies. He has supported numerous financial services customers in driving cloud-native transformation, advancing data analytics, and accelerating mainframe modernization. His experience also extends to modern AI-DLC practices, enabling enterprises to innovate faster. With this background, Robert is well-equipped to address complex business challenges and deliver impactful solutions.

Tomohiro Tanaka

Tomohiro is a Senior Cloud Support Engineer at Amazon Web Services (AWS). He’s passionate about helping customers use Apache Iceberg for their data lakes on AWS. In his free time, he enjoys a coffee break with his colleagues and making coffee at home.

Stanley Chukwuemeke

Stanley is a Senior Partner Solutions Architect at AWS. He works with AWS technology partners to develop their business by creating joint go-to-market solutions using AWS data, analytics, and AI services. He’s worked with data most of his career and is passionate about database modernization and cloud adoption strategy to help drive business modernization initiatives across industries.
