Automate deployment of data and AI applications with Amazon SageMaker Unified Studio CI/CD CLI


Organizations building data and AI applications in Amazon SageMaker Unified Studio combine multiple AWS services, including AWS Glue, Amazon Athena, Amazon Managed Workflows for Apache Airflow (Amazon MWAA), Amazon SageMaker AI, and Amazon Quick Sight, into single applications. Promoting these applications from development to test and production stages requires substituting service-specific configurations for each stage and provisioning resources in the correct order.

Data teams understand which services their applications need but lack continuous integration and continuous delivery (CI/CD) expertise, while DevOps teams understand deployment automation but must learn each AWS service’s provisioning requirements.

The CI/CD CLI for Amazon SageMaker Unified Studio (aws-smus-cicd-cli) is an open source command line tool that automates deployment of multi-service data and AI applications across pipeline stages. Data teams define their application once in a YAML manifest, DevOps teams deploy with a single command, and the CLI handles configuration substitution, dependency ordering, and resource provisioning automatically. For details, see the CI/CD CLI documentation.

In this post, we walk through how the CI/CD CLI works, show you how to deploy a real application across environments, and demonstrate how it fits into your existing CI/CD workflows.

Customer spotlight

Bureau Veritas, a global leader in testing, inspection, and certification, operates across multiple SageMaker Unified Studio environments to support its data and AI teams. With its data and DevOps teams working on different parts of the application lifecycle, Bureau Veritas needed a controlled way to promote workloads from development through test to production while preserving clear ownership boundaries between the two teams.

“We need to promote data and AI applications across SageMaker Unified Studio environments in a controlled way that respects the boundaries between our data teams and our DevOps teams. The CI/CD CLI does exactly that: a single manifest from the data team, a single deploy command from DevOps, and full control over what goes to production.”

- Gilles Kempf, Architecture Manager, Bureau Veritas

How the CI/CD CLI works

The CI/CD CLI introduces a clean separation of concerns between data teams and DevOps teams.

Data teams define what to deploy in a declarative YAML manifest (manifest.yaml). The manifest describes the application’s resources, including AWS Glue extract, transform, and load (ETL) jobs, Athena queries, Airflow directed acyclic graphs (DAGs), Quick Sight dashboards, and SageMaker training jobs, along with stage-specific configurations for each environment.

DevOps teams define how and when to deploy using their existing CI/CD systems. They maintain full control over their deployment methodology. They choose whether to promote content through git branches, a bundle artifactory, or both; they decide the shape of the pipeline, including which stages to include (dev, staging, pre-prod, prod) and which manual approvals or security gates are required. They run aws-smus-cicd-cli deploy inside GitHub Actions, Jenkins, or GitLab CI workflows without needing to understand which AWS services the application uses or how SageMaker Unified Studio projects are structured. The CLI is a utility for AWS analytics service deployment, not a CI/CD methodology. Your team’s existing conventions for branches, approvals, and pipeline shape stay exactly as they are.

The CLI is the abstraction layer between the two. It reads the manifest, substitutes stage-specific configurations (S3 paths, AWS Identity and Access Management (IAM) roles, account IDs, and connection strings), provisions resources in dependency order, and handles all AWS service interactions.

The following diagram illustrates this separation:

Key concepts

Application manifest

Each stage maps to a dedicated SageMaker Unified Studio project. This one-stage-to-one-project mapping is the foundation of CI/CD isolation: each project has its own domain, IAM boundaries, connections, and data, so changes in dev can never affect prod. For stronger isolation, projects can span different AWS accounts and AWS Regions, for example dev in a sandbox account and prod in a production account in a different Region. Because each stage is a real SageMaker Unified Studio project, teams can open it in the console at any time to monitor workflows, inspect resources, and troubleshoot deployments. Project membership is managed per project, so you control exactly who has access to each stage, for example developers in dev and a release team in prod.

The manifest file is the single source of truth for your application. It declares:

  • Content: application code from git repositories, data files from S3, Quick Sight dashboards, and workflow definitions.
  • Stages: environment-specific project mappings (dev, test, prod, etc.), each isolated as described earlier.
  • Configuration: stage-specific settings that are substituted automatically at deploy time.

Here is an example manifest for an analytics application with AWS Glue ETL and Quick Sight:

applicationName: SalesAnalyticsDashboard

content:
  storage:
    - name: etl-code
      include: ["*.py"]
    - name: workflows
      include: ["*.yaml"]
  quicksight:
    - name: SalesDashboard
      type: dashboard
  workflows:
    - workflowName: sales_etl_pipeline
      connectionName: default.workflow_serverless

stages:
  dev:
    domain:
      region: us-east-1
    project:
      name: analytics-dev
    deployment_configuration:
      storage:
        - name: etl-code
          connectionName: default.s3_shared
          targetDirectory: sales/bundle/etl
        - name: workflows
          connectionName: default.s3_shared
          targetDirectory: sales/bundle/workflows

  prod:
    domain:
      region: us-west-2
    project:
      name: analytics-prod
    deployment_configuration:
      storage:
        - name: etl-code
          connectionName: default.s3_shared
          targetDirectory: sales/bundle/etl
        - name: workflows
          connectionName: default.s3_shared
          targetDirectory: sales/bundle/workflows
      quicksight:
        assets:
          - name: SalesDashboard
            owners:
              - arn:aws:quicksight:${AWS_REGION}:${AWS_ACCOUNT_ID}:user/default/Admin/*

Each stage must map to a separate SageMaker Unified Studio project, providing full isolation between environments. The CLI substitutes variables like ${AWS_ACCOUNT_ID} and ${AWS_REGION} at deploy time based on the target environment.
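To make the substitution step concrete, here is a minimal Python sketch of the general idea: placeholders of the form ${NAME} are resolved against a per-stage set of values. This is an illustration only, not the CLI's implementation; the substitute helper, the prod_vars dictionary, and the placeholder account ID are assumptions made for the example.

```python
from string import Template


def substitute(value, variables):
    """Recursively replace ${NAME} placeholders in strings, lists, and dicts."""
    if isinstance(value, str):
        # Template.substitute raises KeyError if a placeholder has no value,
        # which surfaces a missing stage setting instead of deploying it as-is.
        return Template(value).substitute(variables)
    if isinstance(value, list):
        return [substitute(item, variables) for item in value]
    if isinstance(value, dict):
        return {key: substitute(item, variables) for key, item in value.items()}
    return value


# Hypothetical values for a prod stage; the account ID is a placeholder.
prod_vars = {"AWS_REGION": "us-west-2", "AWS_ACCOUNT_ID": "123456789012"}
owners = ["arn:aws:quicksight:${AWS_REGION}:${AWS_ACCOUNT_ID}:user/default/Admin/*"]
resolved = substitute(owners, prod_vars)
```

Failing on an undefined variable is a design choice in this sketch; whether the CLI behaves the same way is not stated in this post.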

Bundles

A bundle is an immutable, versioned archive of your application. The bundle command reads from a source stage (typically dev) and packages the application code, workflow definitions, and resolved configurations into a self-contained artifact. The deploy command then applies that artifact to one or more target stages (test or prod).

This stage-to-bundle-to-stage promotion model supports controlled rollout through quality gates:

# Bundle from dev
aws-smus-cicd-cli bundle --manifest manifest.yaml

# Deploy to test
aws-smus-cicd-cli deploy --manifest app.tar.gz --targets test

# Validate the test deployment
aws-smus-cicd-cli test --manifest manifest.yaml --targets test

# Promote the same bundle to prod
aws-smus-cicd-cli deploy --manifest app.tar.gz --targets prod

The same artifact is deployed at every stage without rebuilding, providing audit trails and reproducible deployments for regulated industries.
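One way a pipeline could check that the artifact promoted to prod is byte-for-byte the one validated in test is to record a digest when the bundle is built and compare it before each deploy. The following Python sketch illustrates that idea with an in-memory archive. The helper names and file contents are hypothetical, and the post does not say whether the CLI performs such a check itself.

```python
import hashlib
import io
import tarfile


def build_bundle(files):
    """Pack {path: bytes} into an in-memory tar.gz archive and return its bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path in sorted(files):
            data = files[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def digest(bundle_bytes):
    """Return the SHA-256 hex digest that identifies this exact archive."""
    return hashlib.sha256(bundle_bytes).hexdigest()


def verify(bundle_bytes, expected_digest):
    """True only if the archive is byte-identical to the one that was recorded."""
    return digest(bundle_bytes) == expected_digest


def list_members(bundle_bytes):
    """Return the file paths stored in the archive."""
    with tarfile.open(fileobj=io.BytesIO(bundle_bytes), mode="r:gz") as archive:
        return archive.getnames()


# Hypothetical application content, bundled once and identified by its digest.
bundle = build_bundle({"etl/job.py": b"print('etl')\n", "workflows/dag.yaml": b"name: sales\n"})
recorded = digest(bundle)
```

Rebuilding the archive from the same files can produce different bytes, because the gzip header records a timestamp. That is one practical reason to promote the stored artifact rather than rebuild it per stage.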

SageMaker Catalog integration

The CLI manages Amazon SageMaker Catalog resources as part of the deployment process. You can define catalog assets, glossaries, glossary terms, form types, asset types, and metadata forms in your manifest. During deployment, the CLI searches for assets in the catalog, creates subscription requests for required data access, and waits for approval before proceeding. This automates the data governance workflow that teams previously handled manually.
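The "wait for approval before proceeding" behavior can be pictured as a polling loop with a timeout. The sketch below is a generic Python illustration, not the CLI's code: wait_for_approval, the get_status callable, and the status strings PENDING, ACCEPTED, and REJECTED are all assumptions made for the example.

```python
import time


def wait_for_approval(get_status, timeout_s=60.0, interval_s=1.0, sleep=time.sleep):
    """Poll get_status() until the request is decided or the timeout passes.

    get_status is any callable returning a status string (hypothetical values:
    "PENDING", "ACCEPTED", "REJECTED"). Returns the final status, or
    "TIMED_OUT" if no decision arrives in time.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in ("ACCEPTED", "REJECTED"):
            return status
        if time.monotonic() >= deadline:
            return "TIMED_OUT"
        sleep(interval_s)


# Simulated subscription request that is approved on the third poll.
responses = iter(["PENDING", "PENDING", "ACCEPTED"])
outcome = wait_for_approval(lambda: next(responses), sleep=lambda seconds: None)
```

Passing sleep as a parameter keeps the loop testable without real delays. A deployment would proceed only on an accepted request and stop on a rejection or timeout.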

CLI commands

The CI/CD CLI provides commands that cover the full deployment lifecycle:

| Command | Description |
| --- | --- |
| describe | Validates the manifest, checks that target projects exist, and confirms the execution role has required permissions. Use --connect to validate against live AWS environments. |
| bundle | Reads from a source stage and packages application code, workflow definitions, and configurations into an immutable, versioned archive. |
| deploy | Applies bundle contents to one or more target stages. Provisions resources in dependency order. |
| test | Runs post-deployment validation to confirm services are running and ready for workloads. |
| create | Generates a starter manifest from an existing SageMaker Unified Studio project. |
| run | Triggers Airflow workflow execution on MWAA or Airflow Serverless connections. |
| monitor | Monitors workflow execution status in real time. |
| logs | Fetches and streams workflow execution logs. |
| destroy | Removes deployed resources and projects for cleanup or failure recovery. |
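If you drive the CLI from a Python script rather than directly from a pipeline definition, the promotion sequence can be expressed as a list of commands that stops at the first failure. The sketch below uses only subcommands and flags that appear in this post. The promotion_commands and run_all helpers, and the default file and stage names, are assumptions made for the example.

```python
CLI = "aws-smus-cicd-cli"


def promotion_commands(manifest="manifest.yaml", bundle="app.tar.gz",
                       test_stage="test", prod_stage="prod"):
    """Return argv lists for validate, bundle, deploy to test, test, deploy to prod."""
    return [
        [CLI, "describe", "--manifest", manifest, "--connect"],
        [CLI, "bundle", "--manifest", manifest],
        [CLI, "deploy", "--manifest", bundle, "--targets", test_stage],
        [CLI, "test", "--manifest", manifest, "--targets", test_stage],
        [CLI, "deploy", "--manifest", bundle, "--targets", prod_stage],
    ]


def run_all(commands, run):
    """Run commands in order; return the first non-zero exit code, or 0."""
    for argv in commands:
        exit_code = run(argv)
        if exit_code != 0:
            return exit_code
    return 0


commands = promotion_commands()
```

To execute for real, pass a runner such as lambda argv: subprocess.run(argv).returncode. Keeping the runner injectable means the ordering logic can be tested without AWS access.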

Walkthrough: deploying a Quick Sight dashboard with AWS Glue ETL

In this section, we walk through deploying an analytics application that uses AWS Glue for ETL, Athena for queries, and Quick Sight for dashboards. This example is available in the GitHub repository.

Use case

An analytics team owns a Sales Analytics Dashboard built on AWS Glue ETL, Athena, and Quick Sight. They want to promote changes from a development environment to production with reproducible builds, automated validation, and a clear approval gate between stages, without writing custom deployment scripts or exposing data engineers to AWS provisioning details.

Solution overview

We use a sample application from the CI/CD CLI GitHub repository that includes AWS Glue ETL scripts, an Airflow workflow definition, a Quick Sight dashboard bundle, and integration tests. A single manifest.yaml describes the application and its dev and prod stages. The CLI handles the full lifecycle: bundle the app from dev, deploy it to test, run validation, and promote the same immutable artifact to prod.

Prerequisites

Before you begin, make sure you have the following:

Solution architecture

Each stage in the manifest maps to a dedicated SageMaker Unified Studio project (see the separation-of-concerns diagram in “How the CI/CD CLI works” earlier in this post). At deploy time, the CLI uploads ETL scripts and workflow definitions to the project’s S3 storage connection, provisions the Airflow workflow in MWAA Serverless, runs the workflow to create AWS Glue jobs and databases, and imports the Quick Sight dashboard. The same bundle artifact is applied to every downstream stage, ensuring dev, test, and prod stay in sync while remaining fully isolated.
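Because isolation depends on no two stages sharing a project, a small pre-flight check on the manifest can catch an accidental overlap before anything is deployed. The Python sketch below assumes the stage layout of the example manifest (domain.region and project.name under each stage) and treats the pair of Region and project name as the project's identity. Both the helper and that identity rule are assumptions for illustration.

```python
from collections import defaultdict


def shared_projects(stages):
    """Return {(region, project_name): [stage, ...]} for projects used by 2+ stages."""
    by_project = defaultdict(list)
    for stage_name, stage in stages.items():
        key = (stage["domain"]["region"], stage["project"]["name"])
        by_project[key].append(stage_name)
    return {key: names for key, names in by_project.items() if len(names) > 1}


# Stages as they would look after loading the example manifest into a dict.
stages = {
    "dev": {"domain": {"region": "us-east-1"}, "project": {"name": "analytics-dev"}},
    "prod": {"domain": {"region": "us-west-2"}, "project": {"name": "analytics-prod"}},
}
conflicts = shared_projects(stages)
```

An empty result means every stage has its own project. A non-empty result names the stages that would overwrite each other's resources.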

Solution implementation

Step 1: Install the CLI

Install the CLI from PyPI:

pip install aws-smus-cicd-cli

Step 2: Create or customize a manifest

Clone the repository and start from the analytics example:

git clone https://github.com/aws/CICD-for-SageMakerUnifiedStudio.git
cd CICD-for-SageMakerUnifiedStudio/examples/analytic-workflow/dashboard-glue-quick

The example includes AWS Glue ETL scripts, an Airflow workflow definition, a Quick Sight dashboard bundle, and integration tests. Open manifest.yaml and update the project, domain, and deployment_configuration values under each stage so that they match your own SageMaker Unified Studio projects and connection names.

Alternatively, generate a manifest from an existing project:

aws-smus-cicd-cli create --domain-id --dev-project-id

Step 3: Validate your configuration

Run the describe command with --connect to verify your environment is ready. This connects to your AWS environment and validates that target projects exist, the execution role has the required permissions, and connections are reachable. Fix any issues before deploying.

aws-smus-cicd-cli describe --manifest manifest.yaml --connect

Step 4: Deploy

Run the deployment:

aws-smus-cicd-cli deploy --targets test --manifest manifest.yaml

During deployment, the CLI:

  1. Uploads ETL scripts and workflow definitions to S3 using the project’s storage connection.
  2. Creates the Airflow workflow in MWAA Serverless.
  3. Runs the workflow, which provisions AWS Glue jobs, creates databases, and runs ETL transformations.
  4. Imports the Quick Sight dashboard and refreshes datasets with the latest data.
  5. Processes any catalog asset subscriptions defined in the manifest.
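The ordering of these steps follows from their dependencies: a workflow cannot be created before its definition is uploaded, and a dashboard cannot be refreshed before the ETL has produced data. Dependency-ordered provisioning in general can be sketched with a topological sort. The Python example below uses the standard library's graphlib (Python 3.9 or later). The step names and the dependency edges are assumptions drawn from the list above, not the CLI's internal graph.

```python
from graphlib import CycleError, TopologicalSorter

# Each step maps to the set of steps that must finish before it (assumed edges).
DEPENDS_ON = {
    "upload_to_s3": set(),
    "create_airflow_workflow": {"upload_to_s3"},
    "run_workflow": {"create_airflow_workflow"},
    "import_dashboard": {"run_workflow"},
}


def provisioning_order(depends_on):
    """Return step names so that every dependency comes before its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def has_cycle(depends_on):
    """True if the dependencies are circular and no valid order exists."""
    try:
        provisioning_order(depends_on)
    except CycleError:
        return True
    return False


order = provisioning_order(DEPENDS_ON)
```

Declaring dependencies rather than a fixed sequence lets independent steps be reordered safely, and a circular definition is rejected before any resource is touched.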

Step 5: Validate

Run post-deployment validation to confirm services are running and ready for workloads:

aws-smus-cicd-cli test --manifest manifest.yaml --targets test

Step 6: Promote to production

Promote the same bundle artifact that was validated in the test stage to production. This ensures the exact same artifact runs in prod:

# Promote the same bundle that was validated in test to prod
aws-smus-cicd-cli deploy --manifest app.tar.gz --targets prod

Integrating with GitHub Actions

The CLI works with existing CI/CD solutions. The GitHub repository includes reusable workflow templates that DevOps teams can adopt directly.

The following is an example of a GitHub Actions workflow that implements a full bundle-based deployment pipeline:

name: Deploy Analytics Application
on:
  push:
    branches: [main]

jobs:
  deploy-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install CLI
        run: pip install aws-smus-cicd-cli

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1

      - name: Validate
        run: aws-smus-cicd-cli describe --manifest manifest.yaml --connect

      - name: Bundle
        run: aws-smus-cicd-cli bundle --manifest manifest.yaml

      - name: Deploy to test
        run: aws-smus-cicd-cli deploy --targets test --manifest manifest.yaml

      - name: Run tests
        run: aws-smus-cicd-cli test --manifest manifest.yaml --targets test

  deploy-prod:
    needs: deploy-test
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4

      - name: Install CLI
        run: pip install aws-smus-cicd-cli

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_PROD_ROLE_ARN }}
          aws-region: us-west-2

      - name: Deploy to production
        run: aws-smus-cicd-cli deploy --targets prod --manifest manifest.yaml

The CLI also works with Jenkins, GitLab CI, and Azure DevOps. See the CI/CD integration guide for more examples.

In the next section, we cover which AWS services and workload types the CLI supports.

Supported workloads

The CLI deploys applications that span the following AWS services through Airflow workflow definitions:

  • Analytics and BI: AWS Glue ETL jobs and crawlers, Amazon Athena queries, Amazon Quick Sight dashboards, Amazon EMR jobs, Amazon Redshift queries.
  • Machine learning: SageMaker training jobs, ML model endpoints, SageMaker AI Pipelines.
  • Code and workflows: Jupyter notebooks, Python scripts, Airflow DAGs (MWAA and MWAA Serverless).
  • Data and storage: S3 data files, Git repositories, SageMaker Catalog resources (glossaries, glossary terms, form types, asset types, assets, data products, metadata forms).

The examples directory includes working applications for each of these patterns, with manifests, workflow definitions, and integration tests.

Failure recovery

If a deployment fails, the CLI stops at the point of failure and reports the error with a detailed stack trace. To recover:

  1. Run aws-smus-cicd-cli describe --connect to check which resources exist and which permissions are missing.
  2. Fix the issue and rerun aws-smus-cicd-cli deploy.
  3. For bundle-based deployments, redeploy a previous bundle version.
  4. Use aws-smus-cicd-cli destroy --targets --force to clean up a failed deployment.

For detailed rollback procedures, see the Rollback Guide.
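For bundle-based deployments, the "redeploy a previous bundle version" step can be automated in a wrapper script. The following Python sketch shows one possible shape, using the deploy invocation shown earlier in this post. The deploy_with_fallback helper, the injectable run callable, and the bundle file names are hypothetical, and this is not a substitute for the procedures in the Rollback Guide.

```python
def deploy_with_fallback(run, target, bundle, previous_bundle=None):
    """Deploy `bundle`; if that fails and a previous bundle is given, redeploy it.

    `run` is any callable that executes an argv list and returns its exit code.
    Returns "deployed", "rolled_back", or "failed".
    """
    def deploy(path):
        return run(["aws-smus-cicd-cli", "deploy", "--manifest", path, "--targets", target])

    if deploy(bundle) == 0:
        return "deployed"
    if previous_bundle is not None and deploy(previous_bundle) == 0:
        return "rolled_back"
    return "failed"


# Simulated run: the new bundle fails, the previous one deploys cleanly.
attempted = []


def simulated_run(argv):
    attempted.append(argv[3])
    return 1 if argv[3] == "app-v2.tar.gz" else 0


result = deploy_with_fallback(simulated_run, "prod", "app-v2.tar.gz", "app-v1.tar.gz")
```

In a real pipeline, run could be lambda argv: subprocess.run(argv).returncode, and a "failed" result would be the point to investigate with describe --connect before retrying.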

Conclusion

In this post, you learned how the Amazon SageMaker Unified Studio CI/CD CLI gives data and DevOps teams a clean separation of concerns: data teams describe their application once in a YAML manifest, and DevOps teams deploy it with a single command through their existing CI/CD pipelines. You saw how stages map to isolated SageMaker Unified Studio projects (optionally spanning AWS accounts and Regions), how bundles provide immutable, reproducible promotion through test and production, and how the CLI integrates with GitHub Actions, Jenkins, GitLab CI, and Azure DevOps. You also walked through deploying a Glue-and-Quick-Sight analytics application from dev through to prod.

Get started

The CI/CD CLI is available at no additional cost in all AWS Regions where Amazon SageMaker Unified Studio is available. You pay only for the underlying AWS resources provisioned during deployment.

Use the following steps to try it out:

  1. Install the CLI:

    pip install aws-smus-cicd-cli

  2. Browse the example applications for analytics and ML patterns.
  3. Follow the CI/CD CLI documentation to deploy your first application in 10 minutes.
  4. Review the Admin Guide for infrastructure setup.

For feedback and bug reports, open an issue on the GitHub repository.


About the authors

Ramesh H Singh

Ramesh H Singh is a Senior Product Manager Technical (External Services) at AWS in Seattle, Washington, currently with the Amazon SageMaker team. He’s passionate about building high-performance ML/AI and analytics products that help enterprise customers achieve their critical goals using cutting-edge technology.

Vasudevan Venkataramanan

Vasudevan Venkataramanan is a Senior Software Engineer on the Amazon SageMaker Unified Studio team. He’s responsible for technical direction of scheduling and orchestration within SageMaker Unified Studio. Outside of his professional work, he enjoys spending time with his kid, and playing pickleball and cricket.

Amir Bar Or

Amir Bar Or is a Principal Engineer on the Amazon SageMaker Unified Studio team.

Nikita Arbuzov

Nikita is a Software Engineer on the Amazon SageMaker Unified Studio team. He’s responsible for building support for CI/CD solutions within SageMaker Unified Studio.

Saurabh Bhutyani

Saurabh Bhutyani is a Principal Analytics Specialist Solutions Architect at AWS. He’s passionate about new technologies. He joined AWS in 2019 and works with customers to provide architectural guidance for running generative AI use cases, scalable analytics solutions, and data mesh architectures using AWS services like Amazon Bedrock, Amazon SageMaker Unified Studio, Amazon EMR, Amazon Athena, AWS Glue, AWS Lake Formation, and Amazon DataZone.
