Introducing the HubSpot connector for AWS Glue

December 3, 2024


Most companies have adopted a diverse set of software as a service (SaaS) platforms to support various applications. The rapid adoption has enabled them to quickly streamline operations, enhance collaboration, and gain more accessible, scalable solutions for managing their critical data and workflows.

More companies have realized there is an opportunity to integrate, enhance, and present this SaaS data to improve internal operations and gain valuable insights from it. Using AWS Glue, a serverless data integration service, companies can streamline this process, integrating data from internal and external sources into a centralized AWS data lake. From there, they can perform meaningful analytics, gain valuable insights, and optionally push enriched data back to external SaaS platforms.

This post introduces the new HubSpot managed connector for AWS Glue, and demonstrates how you can integrate HubSpot data into your existing data lake on AWS. By consolidating HubSpot data with data from your AWS accounts and from other SaaS services, you can enhance, analyze, and optionally write the data back to HubSpot, creating a seamless and integrated data experience.

Solution overview

In this example, we use AWS Glue to extract, transform, and load (ETL) data from your HubSpot account into a transactional data lake on Amazon Simple Storage Service (Amazon S3), using Apache Iceberg format. We register the schema in the AWS Glue Data Catalog to make your data discoverable. Subsequently, we use Amazon Athena to validate that the HubSpot data has been successfully loaded to Amazon S3. The following diagram illustrates the solution architecture.

The following are key components and steps in the integration:

Configure your HubSpot account and app to enable access to your HubSpot data.
Prepare for data movement by securely storing your HubSpot OAuth credentials in AWS Secrets Manager, creating an S3 bucket to store your ingested data, and creating an AWS Identity and Access Management (IAM) role for AWS Glue.
Create an AWS Glue job to extract and load data from HubSpot to Amazon S3. AWS Glue establishes a secure connection to HubSpot using OAuth for authorization and TLS for data encryption in transit. AWS Glue can also apply complex data transformations, enabling efficient data integration and preparation to meet your needs.
Schema and other metadata will be registered in the AWS Glue Data Catalog, a centralized metadata repository for all your data assets. This simplifies schema management and makes the data discoverable by other services.
Run the AWS Glue job to extract data from HubSpot and write it to Amazon S3 using Iceberg format. Apache Iceberg is an open source, high-performance open table format designed for large-scale analytics, providing transactional consistency and seamless schema evolution. Although we use Iceberg in this example, AWS Glue offers robust support for various data formats, including other transactional formats such as Apache Hudi and Delta Lake.
The data loaded to Amazon S3 will be organized into partitioned folders to optimize for query performance and management. Amazon S3 will also store the AWS Glue scripts, logs, and other temporary data required during the ETL process.
Finally, Amazon Athena will be used to query the data loaded from HubSpot to Amazon S3, validating that all changes in the source system have been captured successfully.
Optionally, you can regularly synchronize HubSpot data to Amazon S3 and analyze data updates over time.

Set up your HubSpot account

This example requires you to create a HubSpot public app for AWS Glue in a HubSpot developer account, and connect it to an associated HubSpot account. A HubSpot public app is a type of integration that can be installed on your HubSpot accounts or listed on the HubSpot Marketplace. In this example, you create a HubSpot app for the AWS Glue integration, and install it in a new test account. Although HubSpot calls it a public app, it will not be listed on their Marketplace and will only have access to your test account.

If you don’t already have one, sign up for a free HubSpot developer account.
Log in to your HubSpot developer account, where you’ll see options to create apps and test accounts.
Choose Create a test account and follow the instructions.

HubSpot test accounts have Enterprise versions of the HubSpot Marketing, Sales, and Service Hubs along with sample data, so you can test most HubSpot tools, create CRM data, and access it through APIs with AWS Glue. For more information about creating a test account, refer to Create a developer test account.

Create a HubSpot app

Complete the following steps to create a HubSpot app:

Switch back to your HubSpot developer account, and choose Create an app.
Fill in the App Info section with the name AWS Glue and a brief description.
Choose the Auth tab.
For Redirect URLs, enter the redirect URL for AWS Glue in the form: https://<aws-region>.console.aws.amazon.com/gluestudio/oauth.

Make sure you replace <aws-region> with the code for the AWS Region where you run AWS Glue. For instance, the code for the US East (N. Virginia) Region is us-east-1, so the AWS Glue redirect URL is https://us-east-1.console.aws.amazon.com/gluestudio/oauth.

In the Scopes section, choose Add new scope and select the following permissions:
- automation
- content
- crm.lists.read
- crm.lists.write
- crm.objects.companies.read
- crm.objects.companies.write
- crm.objects.contacts.read
- crm.objects.contacts.write
- crm.objects.custom.read
- crm.objects.custom.write
- crm.objects.deals.read
- crm.objects.deals.write
- crm.objects.owners.read
- crm.schemas.custom.read
- e-commerce
- forms
- oauth
- sales-email-read
- tickets
Review the Scopes and Redirect URL settings, then choose Create app.
Navigate back to your app Auth tab.
Note the values for Client ID, Client secret, and Install URL (OAuth). You will need these later to connect your AWS Glue instance.
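
The redirect URL entered on the Auth tab differs only in the Region code, so it can be derived rather than typed by hand. The following is a minimal sketch; the function name is hypothetical and the URL pattern is the one shown in the steps above.

```python
def glue_oauth_redirect_url(region: str) -> str:
    """Return the AWS Glue OAuth redirect URL for a Region code such as 'us-east-1'."""
    region = region.strip().lower()
    if not region:
        raise ValueError("region must be a non-empty Region code, for example 'us-east-1'")
    return f"https://{region}.console.aws.amazon.com/gluestudio/oauth"


if __name__ == "__main__":
    # Print the URL to paste into the HubSpot app's Redirect URLs field.
    print(glue_oauth_redirect_url("us-east-1"))
```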

Select or create an Amazon S3 bucket where your HubSpot data will reside

Select an existing Amazon S3 bucket in your account, or create a new bucket to store your HubSpot data, as well as scripts, logs, and so on. For this example, the bucket name follows the format aws-glue-hubspot-<account>-<region>, where <account> is the AWS account number and <region> is the operating Region. The bucket will be configured with all defaults: public access disabled, versioning disabled, and server-side encryption with Amazon S3 managed keys (SSE-S3).

If you use AWSGlueServiceRole in your IAM role as shown in this example, it will provide access to S3 buckets with names starting with aws-glue-.
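
If you prefer to script this step, the following sketch builds the bucket name from the account number and Region and creates the bucket with boto3. The helper names are hypothetical, and the sketch assumes boto3 is installed and credentials are configured; a new bucket picks up the default settings described above without extra arguments.

```python
def hubspot_bucket_name(account_id: str, region: str) -> str:
    """Build the bucket name used in this example from the account number and Region."""
    return f"aws-glue-hubspot-{account_id}-{region}"


def create_hubspot_bucket(account_id: str, region: str) -> str:
    """Create the bucket with default settings and return its name."""
    import boto3  # imported lazily so the naming helper works without the AWS SDK

    name = hubspot_bucket_name(account_id, region)
    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 is the default location and does not accept a LocationConstraint.
        s3.create_bucket(Bucket=name)
    else:
        s3.create_bucket(
            Bucket=name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    return name
```

The aws-glue- prefix matters here because the AWSGlueServiceRole managed policy only grants access to buckets whose names start with it.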

Create an IAM role for AWS Glue

Create an IAM role with permissions for the AWS Glue job. AWS Glue will assume this role when calling other services on your behalf.

On the IAM console, choose Roles in the navigation pane.
Choose Create role.
For Trusted entity type, choose AWS service.
For Use case, choose Glue.
Add the following AWS managed policies to the role:
1. AWSGlueServiceRole for accessing related services such as Amazon S3, Amazon Elastic Compute Cloud (Amazon EC2), Amazon CloudWatch, and IAM. This policy allows access to S3 buckets with names starting with aws-glue-.
2. SecretsManagerReadWrite for read/write access to AWS Secrets Manager.
Give the role a name, for instance AWSGlueServiceRole_blog.

For more information, see Getting started with AWS Glue and Create an IAM role for AWS Glue.
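
The same role can be created programmatically. The sketch below is one possible approach with boto3, not a required one: it defines a trust policy that lets the AWS Glue service assume the role, then attaches the two managed policies named above. The function name is hypothetical, and the code assumes boto3 is installed and your credentials are allowed to create IAM roles.

```python
import json

# Trust policy that lets the AWS Glue service assume the role.
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policies listed in the steps above.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
    "arn:aws:iam::aws:policy/SecretsManagerReadWrite",
]


def create_glue_role(role_name: str) -> str:
    """Create the role, attach the managed policies, and return the role ARN."""
    import boto3  # imported lazily; only needed when the role is actually created

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(GLUE_TRUST_POLICY),
        Description="Role assumed by AWS Glue for the HubSpot ingestion job",
    )
    for arn in MANAGED_POLICY_ARNS:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return role["Role"]["Arn"]
```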

Create an AWS Secrets Manager secret

AWS Secrets Manager is used to securely store your HubSpot OAuth credentials. Complete the following steps to create a secret:

On the AWS Secrets Manager console, choose Secrets in the navigation pane.
Choose Store a new secret.
For Secret type, select Other type of secret.
Under Key/value pairs, enter the HubSpot client secret with the key USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET.
Choose Next.

Enter the secret name, such as HubSpot-Blog, a description, and continue.
Leave the secret rotation as default, and choose Next.
Review the secret configuration, and choose Store.
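
As an alternative to the console, the secret can be stored with boto3. This is a minimal sketch with hypothetical helper names; it assumes boto3 is installed and that you pass in the client secret copied from the HubSpot app Auth tab. The only fixed part is the key name, which is the one given in the steps above.

```python
import json

# Key name expected for the HubSpot client secret, as given in the steps above.
SECRET_KEY = "USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET"


def hubspot_secret_string(client_secret: str) -> str:
    """Serialize the client secret as the JSON key/value pair stored in the secret."""
    return json.dumps({SECRET_KEY: client_secret})


def store_hubspot_secret(name: str, client_secret: str) -> str:
    """Create the secret in AWS Secrets Manager and return its ARN."""
    import boto3  # imported lazily; only needed when the secret is actually stored

    secrets = boto3.client("secretsmanager")
    response = secrets.create_secret(
        Name=name,
        Description="HubSpot OAuth client secret for AWS Glue",
        SecretString=hubspot_secret_string(client_secret),
    )
    return response["ARN"]
```

Avoid hardcoding the client secret in source files; reading it from an environment variable or an interactive prompt keeps it out of version control.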

Create an AWS Glue connection

Complete the following steps to create an AWS Glue connection to your HubSpot account:

On the AWS Glue console, choose Data connections in the navigation pane.
Choose Create connection.
For Data sources, search for and select HubSpot.
Choose Next.

On the Configure connection page, fill in the required information:
1. For IAM service role, choose the service role created previously. In this example, we use the role AWSGlueServiceRole_blog.
2. For Authentication URL, leave as default.
3. For User Managed Client Application ClientId, enter the OAuth client ID from HubSpot.
4. For AWS Secret, choose the name of the OAuth client secret configured previously in AWS Secrets Manager.
5. Choose Next.

Choose Test Connection to validate the connection to HubSpot.
This will bring up a new HubSpot connection window. Make sure you select your HubSpot test account (not your developer account) to test the connection.
If this is your first connection attempt, you will be redirected to another page where you are asked to confirm the access level granted to AWS Glue. Choose Connect App.

If successful, the HubSpot window will close and your AWS connection window will say Connection test successful.

Under Set properties, for Name, enter a name (for example, HubSpot_Connection_blog).
Choose Next.
Under Review and create, review your settings and then create the connection.

Create a database in AWS Glue Data Catalog

Complete the following steps to create a database in AWS Glue Data Catalog to organize your HubSpot data:

On the AWS Glue console, choose Databases in the navigation pane.
Create a new database.
Enter a name (for example, hubspot).
You can leave the location field blank.
Choose Create database.
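
The database can also be created with a short boto3 call. The following sketch uses hypothetical helper names and assumes boto3 is installed; it leaves the location unset, as in the console steps, and treats an already existing database as success so that it can be rerun safely.

```python
def database_input(name: str) -> dict:
    """Build the DatabaseInput structure; no location is set, matching the console steps."""
    return {"Name": name, "Description": "Data lake tables ingested from HubSpot"}


def create_catalog_database(name: str = "hubspot") -> None:
    """Create the Data Catalog database if it does not exist yet."""
    import boto3  # imported lazily; only needed when the call is actually made

    glue = boto3.client("glue")
    try:
        glue.create_database(DatabaseInput=database_input(name))
    except glue.exceptions.AlreadyExistsException:
        # The database is already present, so there is nothing to do.
        pass
```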

Create an AWS Glue ETL job

Now that you have an AWS Glue data connection to your HubSpot account, you can create an AWS Glue ETL job to ingest HubSpot data into your AWS data lake. AWS Glue provides both visual and code-based interfaces to simplify data integration, depending on your expertise. In this example, we use the Script interface to ingest HubSpot data into the Amazon S3 location. Complete the following steps:

On the AWS Glue console, choose ETL jobs in the navigation pane.
Choose the Script editor.
Choose Spark as the engine, and add the following script.

The AWS Glue Spark job reads the HubSpot data and merges it into the S3 bucket in Iceberg format.
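
The following is a minimal sketch of what such a job could look like; it is an illustration under stated assumptions, not the authors' script. It reads the job parameters used in this post (db_name, table_name, s3_bucket_name, connection_name), reads one HubSpot entity through the connection, and upserts the rows into an Iceberg table with MERGE. The merge key id, the API_VERSION value v3, the S3 folder layout, and the helper names are assumptions you should adapt to your data.

```python
import sys


def hubspot_read_options(connection_name: str, entity: str) -> dict:
    """Connection options for reading one HubSpot entity through the Glue connection."""
    return {
        "connectionName": connection_name,
        "ENTITY_NAME": entity,
        "API_VERSION": "v3",  # assumption: API version expected by the connector
    }


def merge_sql(db_name: str, table_name: str, key: str = "id") -> str:
    """Iceberg MERGE that upserts rows from the 'staged' view into the target table."""
    target = f"glue_catalog.{db_name}.{table_name}"
    return (
        f"MERGE INTO {target} AS t USING staged AS s ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def main(argv):
    # These libraries exist only inside the AWS Glue job runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(
        argv,
        ["JOB_NAME", "db_name", "table_name", "s3_bucket_name", "connection_name"],
    )
    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the requested HubSpot entity through the managed connector.
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="hubspot",
        connection_options=hubspot_read_options(
            args["connection_name"], args["table_name"]
        ),
    )
    frame.toDF().createOrReplaceTempView("staged")

    db, table = args["db_name"], args["table_name"]
    # Assumption: one folder per database and table inside the bucket.
    location = f"s3://{args['s3_bucket_name']}/{db}/{table}"

    # Create an empty Iceberg table with the staged schema on the first run only.
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS glue_catalog.{db}.{table} "
        f"USING iceberg LOCATION '{location}' "
        "AS SELECT * FROM staged WHERE 1 = 0"
    )
    # Upsert the staged rows so that reruns pick up new and changed records.
    spark.sql(merge_sql(db, table))
    job.commit()


# The guard keeps the helpers importable outside of a Glue job run.
if __name__ == "__main__" and "--JOB_NAME" in sys.argv:
    main(sys.argv)
```

The table is created with an explicit S3 location because the Iceberg warehouse setting in the job parameters points to a local temporary path. MERGE requires the Iceberg Spark session extensions, which the --conf job parameter enables.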

On the Job details tab, provide the following information:
For Name, enter a name, such as HubSpot_to_S3_blog.
For Description, enter a meaningful description of the job.
For IAM Role, choose the IAM role you created previously (for this post, AWSGlueServiceRole_blog).

Expand Advanced properties.
Under Connections, enter your HubSpot connection from the previous section (for this post, HubSpot_Connection_blog).

Under Job parameters, enter the following parameters:

- For --conf, enter spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO --conf spark.sql.catalog.glue_catalog.warehouse=file:///tmp/spark-warehouse
- For --datalake-formats, enter iceberg
- For --db_name, enter the AWS Glue database to store your data lake (for this post, hubspot)
- For --table_name, enter the HubSpot table to be ingested (for this post, company)
- For --s3_bucket_name, enter the bucket where the ingested Iceberg table is stored, in this case aws-glue-hubspot-<account>-<region>
- For --connection_name, enter the name of the AWS Glue connection you created, in this case HubSpot_Connection_blog

Choose Save to save the job, then choose Run.

Depending on the amount of data in your HubSpot account, the job can take a few minutes to complete. After a successful job run, you can choose Run details to see the job specifications and logs.
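
The job settings above can also be expressed in code. The sketch below builds the job parameters, including the chained --conf value, and registers and starts the job with boto3. The function names, the script location argument, the Glue version 4.0, and the worker settings are assumptions for illustration; the parameter names and Spark settings are the ones listed above.

```python
# Spark settings from the --conf job parameter, in the order listed above.
ICEBERG_CONF = {
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.glue_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.glue_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.glue_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.glue_catalog.warehouse": "file:///tmp/spark-warehouse",
}


def chained_conf(settings: dict) -> str:
    """Join several Spark settings into the single value of the --conf parameter."""
    return " --conf ".join(f"{key}={value}" for key, value in settings.items())


def job_arguments(db_name: str, table_name: str, s3_bucket_name: str,
                  connection_name: str) -> dict:
    """Job parameters as entered under Job parameters on the console."""
    return {
        "--conf": chained_conf(ICEBERG_CONF),
        "--datalake-formats": "iceberg",
        "--db_name": db_name,
        "--table_name": table_name,
        "--s3_bucket_name": s3_bucket_name,
        "--connection_name": connection_name,
    }


def create_and_run_job(name: str, role: str, script_s3_uri: str,
                       connection_name: str, arguments: dict) -> str:
    """Register the Spark job and start one run; returns the job run ID."""
    import boto3  # imported lazily; only needed when the job is actually created

    glue = boto3.client("glue")
    glue.create_job(
        Name=name,
        Role=role,
        Command={"Name": "glueetl", "ScriptLocation": script_s3_uri, "PythonVersion": "3"},
        DefaultArguments=arguments,
        Connections={"Connections": [connection_name]},
        GlueVersion="4.0",      # assumption: a Glue version with Iceberg support
        WorkerType="G.1X",      # assumption: small workers are enough for a test account
        NumberOfWorkers=2,
    )
    return glue.start_job_run(JobName=name)["JobRunId"]
```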

Use Athena to query data

Athena is an interactive, serverless query service that makes it easy to analyze data directly in Amazon S3 using standard SQL. In this example, we query the HubSpot data ingested into Amazon S3.

On the Athena console, choose Query editor.
For Database, choose hubspot, and you should see your company table.
Select entries from the hubspot.company table to view the data captured from HubSpot.

You can try various queries on the HubSpot data, such as:

-- get a sample of the dataset
SELECT * FROM "hubspot"."company" LIMIT 10;

-- get companies with revenue
SELECT * FROM "hubspot"."company" A
WHERE A.annualrevenue IS NOT NULL;

-- get the number of companies with revenue
SELECT COUNT(*) AS companies_count FROM "hubspot"."company" A
WHERE A.annualrevenue IS NOT NULL;
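
Queries like these can also be run outside the console, for example as an automated check after each job run. The following sketch submits a query with boto3, waits for it to finish, and returns the result rows. The function names and the output_location argument (an S3 URI where Athena writes query results) are assumptions; the sketch assumes boto3 is installed and credentials are configured.

```python
import time


def count_with_revenue_sql(database: str, table: str) -> str:
    """Build the count query from the examples above for a given database and table."""
    return (
        f'SELECT COUNT(*) AS companies_count FROM "{database}"."{table}" '
        "WHERE annualrevenue IS NOT NULL"
    )


def run_athena_query(sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0) -> list:
    """Run a query and return the first page of data rows as lists of strings."""
    import boto3  # imported lazily; only needed when the query is actually run

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # For SELECT queries the first row holds the column names, so skip it.
    return [[col.get("VarCharValue") for col in row["Data"]] for row in rows[1:]]
```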

Over time, your HubSpot data may change. You can rerun your ETL job periodically, and the Iceberg data lake table will capture your changes. You can verify this by adding, removing, and changing companies in your HubSpot database, and then rerunning the ETL job. Your data lake should match your latest HubSpot data. With this capability, you can schedule the ETL job to run as often as you need.

Extending the HubSpot connector with AWS services

The HubSpot connector for AWS Glue provides a powerful foundation for building comprehensive data pipelines and analytics workflows. By integrating HubSpot data into your AWS environment, you can use additional services like Amazon Redshift, Amazon QuickSight, and Amazon SageMaker to further process, transform, and analyze the data. This lets you construct sophisticated, end-to-end data architectures that unlock the full value of your HubSpot data, without the need to manage complex infrastructure. The seamless integration between these AWS services makes it easy to build scalable analytics pipelines tailored to your specific requirements.

Considerations

You can set up AWS Glue job triggers to run the ETL jobs on a schedule, so that the data is regularly synchronized between HubSpot and Amazon S3. You can also integrate the ETL jobs with other AWS services, including AWS Step Functions, Amazon MWAA (Amazon Managed Workflows for Apache Airflow), AWS Lambda, Amazon EventBridge, and Amazon Bedrock to create a more advanced data processing pipeline.
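
As an illustration of a scheduled trigger, the sketch below creates one with boto3 that starts a given job every day at 02:00 UTC. The trigger name, the schedule, and the function names are assumptions; AWS Glue schedules use a six-field cron expression (minutes, hours, day of month, month, day of week, year).

```python
def scheduled_trigger_request(job_name: str, cron: str = "cron(0 2 * * ? *)") -> dict:
    """Build the request for a trigger that starts the job on a cron schedule."""
    return {
        "Name": f"{job_name}_schedule",  # hypothetical naming convention
        "Type": "SCHEDULED",
        "Schedule": cron,
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,  # activate the trigger immediately
    }


def create_schedule(job_name: str, cron: str = "cron(0 2 * * ? *)") -> None:
    """Create the scheduled trigger in AWS Glue."""
    import boto3  # imported lazily; only needed when the trigger is actually created

    boto3.client("glue").create_trigger(**scheduled_trigger_request(job_name, cron))
```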

By default, the HubSpot connector doesn’t import deleted records. However, you can set the IMPORT_DELETED_RECORDS option to true to import all records, including the deleted ones.
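
In a job script, this option would typically be passed together with the other connection options when reading from HubSpot. The following sketch shows one way to add it; the helper name is hypothetical, and passing the value as the string "true" is an assumption, so check the connector documentation for the expected format.

```python
def with_deleted_records(connection_options: dict) -> dict:
    """Return a copy of the connection options that also imports deleted records."""
    options = dict(connection_options)  # copy so the caller's dictionary is not modified
    options["IMPORT_DELETED_RECORDS"] = "true"  # assumption: value is passed as a string
    return options
```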

Clean up

To avoid incurring charges, clean up the resources used in this post from your AWS account, including the AWS Glue jobs, HubSpot connection, AWS Secrets Manager secret, IAM role, and Amazon S3 bucket.
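
The cleanup can be scripted as well. The sketch below lists the delete calls for the resources named above in a safe order and then executes them with boto3. The function names are hypothetical; the code permanently deletes the listed resources, including every object in the bucket, so review the names you pass in before running it.

```python
# AWS managed policies attached to the role in this post.
POLICY_ARNS = [
    "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
    "arn:aws:iam::aws:policy/SecretsManagerReadWrite",
]


def cleanup_plan(job_name: str, connection_name: str, secret_name: str,
                 role_name: str) -> list:
    """Ordered (service, operation, arguments) steps for the non-S3 resources."""
    plan = [
        ("glue", "delete_job", {"JobName": job_name}),
        ("glue", "delete_connection", {"ConnectionName": connection_name}),
        ("secretsmanager", "delete_secret",
         {"SecretId": secret_name, "ForceDeleteWithoutRecovery": True}),
    ]
    # Managed policies must be detached before the role can be deleted.
    for arn in POLICY_ARNS:
        plan.append(("iam", "detach_role_policy", {"RoleName": role_name, "PolicyArn": arn}))
    plan.append(("iam", "delete_role", {"RoleName": role_name}))
    return plan


def run_cleanup(plan: list, bucket_name: str) -> None:
    """Execute the plan, then empty and delete the S3 bucket."""
    import boto3  # imported lazily; only needed when resources are actually deleted

    clients = {}
    for service, operation, kwargs in plan:
        if service not in clients:
            clients[service] = boto3.client(service)
        getattr(clients[service], operation)(**kwargs)

    # A bucket must be empty before it can be deleted. Versioning is disabled
    # in this example, so deleting the current objects is sufficient.
    bucket = boto3.resource("s3").Bucket(bucket_name)
    bucket.objects.all().delete()
    bucket.delete()
```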

Conclusion

With the introduction of the AWS Glue connector for HubSpot, integrating HubSpot data with data from other sources has become more streamlined than ever. This feature lets you set up ongoing data integration from HubSpot to AWS, providing a unified view of data from across platforms and enabling more comprehensive analytics. The serverless nature of AWS Glue means there is no infrastructure to manage, and you only pay for the resources consumed. By following the steps outlined in this post, you can make sure that up-to-date data from HubSpot is captured in your data lake, allowing teams to make faster data-driven decisions and uncover complex insights from across data sources.

To learn more about the AWS Glue connector for HubSpot, refer to Connecting to HubSpot in AWS Glue. This guide walks through the entire process, from establishing the connection to running the data transfer flow. For more information on AWS Glue, visit AWS Glue.

About the Authors

Eric Bomarsi is a Senior Solutions Architect in the ISV team at AWS, where he focuses on building scalable solutions for large customers. As a member of the AWS analytics community, he helps customers get strategic insights from their data. Outside of work, he enjoys playing ice hockey and traveling with his family.

Annie Nelson is a Senior Solutions Architect at AWS. She is a data enthusiast who enjoys problem solving and tackling complex architectural challenges with customers.

Kartikay Khator is a Solutions Architect within Global Life Sciences at AWS, where he dedicates his efforts to developing innovative and scalable solutions that cater to the evolving needs of customers. His expertise lies in harnessing the capabilities of AWS analytics services. Extending beyond his professional pursuits, he finds joy and fulfillment in the world of running and hiking. Having already completed several marathons, he is currently preparing for his next marathon challenge.

Kamen Sharlandjiev is a Sr. Big Data and ETL Solutions Architect, Amazon MWAA and AWS Glue ETL expert. He’s on a mission to make life easier for customers who are facing complex data integration and orchestration challenges. His secret weapon? Fully managed AWS services that can get the job done with minimal effort. Follow Kamen on LinkedIn to keep up to date with the latest Amazon MWAA and AWS Glue features and news!