Processing large volumes of data efficiently is critical for businesses, and so data engineers, data scientists, and business analysts need reliable and scalable ways to run data processing workloads. The next generation of Amazon SageMaker is the center for all of your data, analytics, and AI. Amazon SageMaker Unified Studio is a single data and AI development environment where you can find and access all of the data in your organization and act on it using the best tools across any use case.
We’re excited to announce a new data processing job experience for Amazon SageMaker. Jobs are a common concept widely used in existing AWS services such as Amazon EMR and AWS Glue. With this launch, you can now build jobs in SageMaker to process large volumes of data. Jobs can be built using your preferred tool. For example, you can create jobs from extract, transform, and load (ETL) scripts coded in the Unified Studio code editor, code interactively in Unified Studio notebooks, or create jobs visually using the Unified Studio Visual ETL editor. After being created, data processing jobs can be set to run on demand, scheduled using the built-in scheduler, or orchestrated with SageMaker workflows. You can monitor the status of your data processing jobs and view run history showing status, logs, and performance metrics. When jobs encounter failures, you can use generative AI troubleshooting to automatically analyze errors and receive detailed recommendations to resolve issues quickly. Together, you can use these capabilities to author, manage, operate, and monitor data processing workloads across your organization. The new experience is consistent with other AWS analytics services such as AWS Glue.
This post demonstrates how the new jobs experience works in SageMaker Unified Studio.
Prerequisites
To get started, you must have the following prerequisites in place:
- An AWS account
- A SageMaker Unified Studio domain
- A SageMaker Unified Studio project with a Data analytics and AI-ML model development project profile
Example use case
A global apparel ecommerce retailer processes thousands of customer reviews daily across multiple marketplaces. They need to transform their raw review data into actionable insights to improve their product offerings and customer experience. Using the SageMaker Unified Studio visual ETL editor, we’ll demonstrate how to transform raw review data into structured analytical datasets that enable market-specific performance analysis and product quality monitoring.
Create and run a visual job
In this section, you’ll create a Visual ETL job that processes the review data from a Parquet file in Amazon Simple Storage Service (Amazon S3). The job transforms the data using SQL queries and saves the results back to S3 buckets. Complete the following steps to create a Visual ETL job:
- On the SageMaker Unified Studio console, on the top menu, choose Build.
- Under DATA ANALYSIS & INTEGRATION, choose Data processing jobs.
- Choose Create Visual ETL Job.
You’ll be directed to the Visual ETL editor, where you can create ETL jobs. You can use this editor to design data transformation pipelines by connecting source nodes, transformation nodes, and target nodes.
- On the top left, choose the plus (+) icon in the circle. Under Data sources, select Amazon S3.
- Select the Amazon S3 source node and enter the following values:
  - S3 URI: s3://aws-bigdata-blog/generated_synthetic_reviews/data/product_category=Apparel/
  - Format: Parquet
- Select Update node.
- Choose the plus (+) icon in the circle to the right of the Amazon S3 source node. Under Transforms, select SQL query.
- Enter the following query statement and select Update node.
- Choose the plus (+) icon to the right of the SQL query node. Under Data target, select Amazon S3.
- Select the Amazon S3 target node and enter the following values:
  - S3 URI: Choose the Amazon S3 location from the project overview page and add the suffix “/output/rating_analysis/”. For example, s3:/// / /output/rating_analysis/
  - Format: Parquet
  - Compression: Snappy
  - Partition keys: review_date
  - Mode: Append
- Select Update node.
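The target settings above decide where the output lands. As a minimal sketch (an illustration only, not part of the walkthrough), the following Python helpers show how the target URI is formed from the project's S3 location plus the suffix, and what a partition folder for review_date could look like under it. The bucket and project path are hypothetical placeholders, and the key=value folder layout is an assumption based on how Spark-style partitioned writes usually organize data.

```python
def build_output_uri(project_s3_location: str, suffix: str) -> str:
    """Join the project's S3 location and a suffix into a folder URI ending in '/'."""
    return project_s3_location.rstrip("/") + "/" + suffix.strip("/") + "/"


def partition_folder(output_uri: str, key: str, value: str) -> str:
    """Return the key=value folder that a partitioned write would typically create."""
    return f"{output_uri}{key}={value}/"


# Hypothetical project location; copy the real one from the project overview page.
project_location = "s3://example-bucket/example-project"
rating_uri = build_output_uri(project_location, "/output/rating_analysis/")
print(rating_uri)
print(partition_folder(rating_uri, "review_date", "2024-01-01"))
```

Because the mode is Append, rerunning the job adds new files under these folders rather than replacing what is already there.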

Next, add another SQL query node connected to the same Amazon S3 data source. This node performs a SQL query transformation and outputs the results to a separate S3 location.
- On the top left, choose the plus (+) icon in the circle. Under Transforms, select SQL query, and connect the Amazon S3 source node.
- Enter the following query statement and select Update node.
- Choose the plus (+) icon to the right of the SQL query node. Under Data target, select Amazon S3.
- Select the Amazon S3 target node and enter the following values:
  - S3 URI: Choose the Amazon S3 location from the project overview page and add the suffix “/output/product_analysis/”. For example, s3:/// / /output/product_analysis/
  - Format: Parquet
  - Compression: Snappy
  - Partition keys: market
  - Mode: Append
- Select Update node.
At this point, your end-to-end visual job should look like the following image. The next step is to save this job to the project and run the job.

- On the top right, choose Save to project to save the draft job. You can optionally change the name and add a description.
- Choose Save.
- On the top right, choose Run.
This will start running your Visual ETL job. You can monitor the list of job runs by selecting View runs in the top middle of the screen.

Create and run a code-based job
In addition to creating jobs through the Visual ETL editor, you can create jobs using a code-based approach by specifying Python script or notebook files. When you specify a notebook file, it is automatically converted to a Python script to create the job. Here, you’ll create a notebook in JupyterLab within SageMaker Unified Studio, save it to the project repository, and then create a code-based job from that notebook. First, create a notebook.
- On the SageMaker Unified Studio console, on the top menu, choose Build.
- Under IDE & APPLICATIONS, select JupyterLab.
- Select Python 3 under Notebook.

- For the first cell, select Local Python, python, and enter the following code:
- For the second cell, select PySpark, project.spark.compatibility, and enter the following code. This performs the same processing as the Visual ETL job you created above. Replace the S3 bucket and folder names for output_path.
- Choose the File icon to save the notebook file. Enter the name of your notebook.
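As a rough sketch of what the PySpark cell could contain (an illustration, not the post's own code), the function below mirrors one branch of the visual job: read Parquet, run a SQL query over it, and append Snappy-compressed Parquet partitioned by a key. The query, the temporary view name reviews, and the example paths are hypothetical. The only column assumed is review_date, because it is the partition key used earlier. The sketch also assumes a SparkSession named spark is available in the cell.

```python
# Hypothetical query: assumes the source data has a review_date column.
RATING_QUERY = """
SELECT review_date, COUNT(*) AS review_count
FROM reviews
GROUP BY review_date
"""


def run_sql_branch(spark, input_path, query, output_path, partition_key):
    """Read Parquet, run one SQL query over it, and append partitioned Parquet."""
    source_df = spark.read.parquet(input_path)
    # Expose the source under the table name that the query references.
    source_df.createOrReplaceTempView("reviews")
    result_df = spark.sql(query)
    (
        result_df.write
        .mode("append")                    # Mode: Append
        .partitionBy(partition_key)        # Partition keys
        .option("compression", "snappy")   # Compression: Snappy
        .parquet(output_path)              # Format: Parquet
    )
    return result_df


# Example call inside a PySpark cell (paths are placeholders):
# run_sql_branch(spark, "s3://example-bucket/input/reviews/", RATING_QUERY,
#                "s3://example-bucket/example-project/output/rating_analysis/",
#                "review_date")
```

Passing the session in as a parameter keeps the function easy to reuse for the second branch with a different query, output path, and partition key.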

Save the notebook to the project’s repository.
- Choose the Git icon in the left navigation. This opens a panel where you can view the commit history and perform Git operations.
- Choose the plus (+) icon next to the files you want to commit.
- Enter a brief summary of the commit in the Summary text entry field. Optionally, enter a longer description of the commit in the Description text entry field.
- Choose Commit.
- Choose the Push committed changes icon to do a git push.

Create the code-based job from the notebook file in the project repository.
- On the SageMaker Unified Studio console, on the top menu, choose Build.
- Under DATA ANALYSIS & INTEGRATION, choose Data processing jobs.
- Choose Create job from files.
- Choose Choose project files and choose Browse files.
- Select the notebook file you created and choose Select.
Here, the Python script automatically converted from your notebook file will be displayed. Review the content.

- Choose Next.
- For Job name, enter the name of your job.
- Choose Submit to create your job.
- Choose the job you created.
- Choose Run job.
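To build intuition for the notebook-to-script conversion mentioned in these steps, here is a minimal sketch that concatenates a notebook's code cells into one Python script. It relies only on the standard notebook JSON layout (a cells list whose entries have cell_type and source). It is an approximation for illustration and not the converter that SageMaker Unified Studio uses. Commenting out lines that start with % or ! is an assumption about how cell magics might be handled, and the sample notebook is hypothetical.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Concatenate a notebook's code cells into one Python script string."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format allows source as one string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        # Comment out magics and shell escapes, which are not valid Python.
        lines = [
            "# " + line if line.lstrip().startswith(("%", "!")) else line
            for line in text.splitlines()
        ]
        if lines:
            chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


# Hypothetical three-cell notebook used only to exercise the function.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["%%pyspark\n", "x = 1\n"]},
        {"cell_type": "code", "source": "print(x)"},
    ]
})
print(notebook_to_script(example))
```

Reviewing the generated script before submitting the job, as the steps suggest, is the place to catch anything that behaves differently outside the notebook.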
Convert existing Visual ETL flows to jobs
You can convert an existing Visual ETL flow to a job by saving it to the project repository. Use the following steps to create a job from your existing Visual ETL flow:
- On the SageMaker Unified Studio console, on the top menu, choose Build.
- Under DATA ANALYSIS & INTEGRATION, select Visual ETL editor.
- Select the existing Visual ETL flow.
- On the top right, choose Save to project to save the draft flow. You can optionally change the name and add a description.
- Choose Save.
View jobs
You can view the list of jobs in your project on the Data processing jobs page. Jobs can be filtered by mode (Visual ETL or Code).

Monitor job runs
On each job’s detail page, you can view a list of job runs in the Job runs tab. You can filter job runs by job run ID, status, start time, and end time. The Job runs list shows basic attributes such as duration, resources consumed, and instance type, along with log group names and various job parameters. You can list, compare, and explore job run history based on various attributes.

On the individual job run details page, you can view job properties and output logs from the run. When a job fails because of an error, you can see the error message at the top of the page and examine detailed error information in the output logs.

Intelligent troubleshooting with generative AI: When jobs fail, you can take advantage of generative AI troubleshooting to resolve issues quickly. SageMaker Unified Studio’s AI-powered troubleshooting automatically analyzes job metadata, Spark event logs, error stack traces, and runtime metrics to identify root causes and provide actionable solutions. It handles both simple scenarios, like missing S3 buckets, and complex performance issues, such as out-of-memory exceptions. The analysis explains not just what failed, but why it failed and how to fix it, reducing troubleshooting time from hours or days to minutes.
To start the analysis, choose Troubleshoot with AI at the top right. The troubleshooting analysis provides Root Cause Analysis identifying the specific issue, Analysis Insights explaining the error context and failure patterns, and Recommendations with step-by-step remediation actions. This expert-level analysis makes complex Spark debugging accessible to all team members, regardless of their Spark expertise.

Clean up
To avoid incurring future charges, delete the resources you created during this walkthrough:
- Delete Visual ETL flows in the Visual ETL editor.
- Delete data processing jobs, including Visual ETL and code-based jobs.
- Delete output files in the S3 bucket.
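The output files can also be removed with a script. The following is a minimal sketch, assuming boto3 is installed and your credentials allow listing and deleting objects in the bucket. The function names are hypothetical, and the URI you pass should be the output folder you chose for the S3 target nodes. It defaults to a dry run that only lists what would be deleted.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/prefix URI into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def delete_output_files(output_uri: str, dry_run: bool = True) -> List[str]:
    """List the objects under output_uri and delete them unless dry_run is set."""
    import boto3  # imported here so split_s3_uri works without boto3 installed

    bucket_name, prefix = split_s3_uri(output_uri)
    if not prefix:
        # Guard against wiping a whole bucket by accident.
        raise ValueError("refusing to delete an entire bucket; pass a folder URI")
    bucket = boto3.resource("s3").Bucket(bucket_name)
    matching = bucket.objects.filter(Prefix=prefix)
    keys = [obj.key for obj in matching]
    if not dry_run:
        matching.delete()
    return keys


# Example (placeholder URI): review the keys first, then rerun with dry_run=False.
# print(delete_output_files("s3://example-bucket/example-project/output/"))
```

Running it once with the default dry_run=True gives you a chance to confirm the prefix matches only the walkthrough's output before anything is removed.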
Conclusion
In this post, we explored the new job experience in Amazon SageMaker Unified Studio, which brings a familiar and consistent experience for data processing and data integration tasks. This new capability streamlines your workflows by providing enhanced visibility, cost management, and seamless migration paths from AWS Glue.
With the ability to create both visual and code-based jobs, monitor job runs, and set up scheduling, the new jobs experience helps you build and manage data processing and data integration tasks efficiently. Whether you’re a data engineer working on ETL processes or a data scientist preparing datasets for machine learning, the job experience in SageMaker Unified Studio provides the tools you need in a unified environment.
Start exploring the new job experience today to simplify your data processing workflows and make the most of your data in Amazon SageMaker Unified Studio.
About the authors
Chiho Sugimoto is a Cloud Support Engineer on the AWS Big Data Support team. She is passionate about helping customers build data lakes using ETL workloads. She loves planetary science and enjoys studying the asteroid Ryugu on weekends.
Noritaka Sekiyama is a Principal Big Data Architect on the AWS Analytics product team. He’s responsible for designing new features in AWS products, building software artifacts, and providing architecture guidance to customers. In his spare time, he enjoys cycling on his road bike.
Matt Su is a Senior Product Manager on the AWS Glue team. He enjoys helping customers uncover insights and make better decisions using their data with AWS Analytics services. In his spare time, he enjoys snowboarding and gardening.
