Organizations building data and AI applications in Amazon SageMaker Unified Studio combine multiple AWS services, including AWS Glue, Amazon Athena, Amazon Managed Workflows for Apache Airflow (Amazon MWAA), Amazon SageMaker AI, and Amazon Quick Sight, into single applications. Promoting these applications from development to test and production stages requires substituting service-specific configurations for each stage and provisioning resources in the correct order.
Data teams understand which services their applications need but lack continuous integration and continuous delivery (CI/CD) expertise, while DevOps teams understand deployment automation but must learn each AWS service’s provisioning requirements.
The CI/CD CLI for Amazon SageMaker Unified Studio (aws-smus-cicd-cli) is an open source command line tool that automates deployment of multi-service data and AI applications across pipeline stages. Data teams define their application once in a YAML manifest, DevOps teams deploy with a single command, and the CLI handles configuration substitution, dependency ordering, and resource provisioning automatically. For details, see the CI/CD CLI documentation.
In this post, we walk through how the CI/CD CLI works, show you how to deploy a real application across environments, and demonstrate how it fits into your existing CI/CD workflows.
Customer spotlight
Bureau Veritas, a world leader in testing, inspection, and certification, operates across multiple SageMaker Unified Studio environments to support its data and AI teams. With its data and DevOps teams working on different parts of the application lifecycle, Bureau Veritas needed a controlled way to promote workloads from development through test to production while preserving clear ownership boundaries between the two teams.
“We need to promote data and AI applications across SageMaker Unified Studio environments in a controlled way that respects the boundaries between our data teams and our DevOps teams. The CI/CD CLI does exactly that: a single manifest from the data team, a single deploy command from DevOps, and full control over what goes to production.”
Gilles Kempf, Architecture Manager, Bureau Veritas
How the CI/CD CLI works
The CI/CD CLI introduces a clean separation of concerns between data teams and DevOps teams.
Data teams define what to deploy in a declarative YAML manifest (manifest.yaml). The manifest describes the application’s resources, including AWS Glue extract, transform, and load (ETL) jobs, Athena queries, Airflow directed acyclic graphs (DAGs), Quick Sight dashboards, and SageMaker training jobs, along with stage-specific configurations for each environment.
DevOps teams define how and when to deploy using their existing CI/CD systems. They maintain full control over their deployment methodology. They choose whether to promote content through git branches, a bundle artifactory, or both; they decide the shape of the pipeline, including which stages to include (dev, staging, pre-prod, prod) and which manual approvals or security gates are required. They run aws-smus-cicd-cli deploy within GitHub Actions, Jenkins, or GitLab CI workflows without needing to understand which AWS services the application uses or how SageMaker Unified Studio projects are structured. The CLI is a utility for AWS analytics service deployment, not a CI/CD methodology. Your team’s existing conventions for branches, approvals, and pipeline shape stay exactly as they are.
The CLI is the abstraction layer between the two. It reads the manifest, substitutes stage-specific configurations (S3 paths, AWS Identity and Access Management (IAM) roles, account IDs, and connection strings), provisions resources in dependency order, and handles all AWS service interactions.
The following diagram illustrates this separation:
Key concepts
Application manifest
Each stage maps to a dedicated SageMaker Unified Studio project. This one-stage-to-one-project mapping is the foundation of CI/CD isolation: each project has its own domain, IAM boundaries, connections, and data, so changes in dev can never affect prod. For stronger isolation, projects can span different AWS accounts and AWS Regions, for example, dev in a sandbox account and prod in a production account in a different Region. Because each stage is a real SageMaker Unified Studio project, teams can open it in the console at any time to view workflows, inspect resources, and troubleshoot deployments. Project membership is managed per project, so you control exactly who has access to each stage, for example, developers in dev and a release team in prod.
The manifest file is the single source of truth for your application. It declares:
- Content: application code from git repositories, data files from S3, Quick Sight dashboards, and workflow definitions.
- Stages: environment-specific project mappings (dev, test, prod, and so on), each isolated as described earlier.
- Configuration: stage-specific settings that are substituted automatically at deploy time.
Here is an example manifest for an analytics application with AWS Glue ETL and Quick Sight:
applicationName: SalesAnalyticsDashboard
Each stage must map to a separate SageMaker Unified Studio project, providing full isolation between environments. The CLI substitutes variables like ${AWS_ACCOUNT_ID} and ${AWS_REGION} at deploy time based on the target environment.
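To make the substitution step concrete, here is a minimal Python sketch of resolving ${...} placeholders per stage. It is not the CLI's implementation: the account IDs, Regions, the `STAGE_VALUES` dictionary, the `resolve` helper, and the S3 path are all hypothetical values chosen for illustration. It relies on Python's standard `string.Template`, which happens to use the same ${NAME} syntax.

```python
from string import Template

# Hypothetical per-stage values. A real deployment would resolve these from
# the target environment rather than from a hard-coded dictionary.
STAGE_VALUES = {
    "dev": {"AWS_ACCOUNT_ID": "111111111111", "AWS_REGION": "us-east-1"},
    "prod": {"AWS_ACCOUNT_ID": "222222222222", "AWS_REGION": "eu-west-1"},
}


def resolve(template_text: str, stage: str) -> str:
    """Replace ${NAME} placeholders with the values for the given stage."""
    # substitute() raises KeyError for an unknown placeholder, so a typo
    # fails loudly instead of producing a half-resolved value.
    return Template(template_text).substitute(STAGE_VALUES[stage])


# Illustrative setting only; this path is not taken from the CLI or its examples.
setting = "s3://sales-analytics-${AWS_ACCOUNT_ID}-${AWS_REGION}/etl/"

print(resolve(setting, "dev"))   # s3://sales-analytics-111111111111-us-east-1/etl/
print(resolve(setting, "prod"))  # s3://sales-analytics-222222222222-eu-west-1/etl/
```

The same template text yields a different concrete value for each stage, which is what lets one manifest describe every environment.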
Bundles
A bundle is an immutable, versioned archive of your application. The bundle command reads from a source stage (typically dev) and packages the application code, workflow definitions, and resolved configurations into a self-contained artifact. The deploy command then applies that artifact to one or more target stages (test or prod).
This stage-to-bundle-to-stage promotion model supports controlled rollout through quality gates.
The same artifact is deployed at every stage without rebuilding, providing audit trails and reproducible deployments for regulated industries.
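The idea of promoting one unchanged artifact can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how the CLI builds or verifies bundles: the zip layout, the SHA-256 check, and the `build_bundle`, `digest`, and `deploy` helpers are all hypothetical.

```python
import hashlib
import io
import zipfile


def build_bundle(files: dict[str, bytes]) -> bytes:
    """Pack files into a zip archive whose bytes depend only on the content."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for name in sorted(files):
            # A fixed timestamp keeps the archive byte-for-byte reproducible.
            info = zipfile.ZipInfo(name, date_time=(1980, 1, 1, 0, 0, 0))
            archive.writestr(info, files[name])
    return buffer.getvalue()


def digest(bundle: bytes) -> str:
    """Return the SHA-256 fingerprint that identifies this exact artifact."""
    return hashlib.sha256(bundle).hexdigest()


def deploy(bundle: bytes, expected_digest: str, stage: str) -> str:
    """Refuse to deploy if the artifact differs from the approved one."""
    if digest(bundle) != expected_digest:
        raise ValueError(f"bundle changed before reaching {stage}")
    return f"deployed {expected_digest[:12]} to {stage}"


files = {
    "manifest.yaml": b"applicationName: SalesAnalyticsDashboard\n",
    "etl/job.py": b"print('etl')\n",  # placeholder script content
}
bundle = build_bundle(files)
approved = digest(bundle)  # recorded once, when the bundle is created

for stage in ("test", "prod"):
    print(deploy(bundle, approved, stage))
```

Recording the digest once and checking it before each stage is one simple way to obtain the audit trail described above: any stage can prove it received the artifact that was validated earlier.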
SageMaker Catalog integration
The CLI manages Amazon SageMaker Catalog resources as part of the deployment process. You can define catalog assets, glossaries, glossary terms, form types, asset types, and metadata forms in your manifest. During deployment, the CLI searches for assets in the catalog, creates subscription requests for required data access, and waits for approval before proceeding. This automates the data governance workflow that teams previously handled manually.
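The "wait for approval before proceeding" behavior is a standard polling pattern. The sketch below shows that pattern in Python under loud assumptions: the status strings, the `wait_for_approval` helper, and the injected `get_status` callable are hypothetical and do not reflect the CLI's internals or any SageMaker Catalog API.

```python
import time
from typing import Callable


def wait_for_approval(
    get_status: Callable[[], str],
    timeout_seconds: float = 600.0,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until approved; return False if rejected or timed out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()  # hypothetical status values, see lead-in
        if status == "APPROVED":
            return True
        if status == "REJECTED":
            return False
        if time.monotonic() >= deadline:
            return False
        sleep(poll_seconds)


# Simulated subscription request that is approved on the third check.
statuses = iter(["PENDING", "PENDING", "APPROVED"])
approved = wait_for_approval(lambda: next(statuses), sleep=lambda _: None)
print(approved)  # True
```

Passing `sleep` as a parameter keeps the loop testable without real delays; a deployment would only continue to the next step when the function returns True.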
CLI commands
The CI/CD CLI provides commands that cover the full deployment lifecycle:
| Command | Description |
| --- | --- |
| describe | Validates the manifest, checks that target projects exist, and confirms the execution role has required permissions. Use --connect to validate against live AWS environments. |
| bundle | Reads from a source stage and packages application code, workflow definitions, and configurations into an immutable, versioned archive. |
| deploy | Applies bundle contents to one or more target stages. Provisions resources in dependency order. |
| test | Runs post-deployment validation to confirm services are running and ready for workloads. |
| create | Generates a starter manifest from an existing SageMaker Unified Studio project. |
| run | Triggers Airflow workflow execution on MWAA or Airflow Serverless connections. |
| monitor | Monitors workflow execution status in real time. |
| logs | Fetches and streams workflow execution logs. |
| destroy | Removes deployed resources and projects for cleanup or failure recovery. |
Walkthrough: deploying a Quick Sight dashboard with AWS Glue ETL
In this section, we walk through deploying an analytics application that uses AWS Glue for ETL, Athena for queries, and Quick Sight for dashboards. This example is available in the GitHub repository.
Use case
An analytics team owns a Sales Analytics Dashboard built on AWS Glue ETL, Athena, and Quick Sight. They want to promote changes from a development environment to production with reproducible builds, automated validation, and a clear approval gate between stages, without writing custom deployment scripts or exposing data engineers to AWS provisioning details.
Solution overview
We use a sample application from the CI/CD CLI GitHub repository that includes AWS Glue ETL scripts, an Airflow workflow definition, a Quick Sight dashboard bundle, and integration tests. A single manifest.yaml describes the application and its dev and prod stages. The CLI handles the full lifecycle: bundle the app from dev, deploy it to test, run validation, and promote the same immutable artifact to prod.
Prerequisites
Before you begin, make sure you have the following:
Solution architecture
Each stage in the manifest maps to a dedicated SageMaker Unified Studio project (see the separation-of-concerns diagram in “How the CI/CD CLI works” earlier in this post). At deploy time, the CLI uploads ETL scripts and workflow definitions to the project’s S3 storage connection, provisions the Airflow workflow in MWAA Serverless, runs the workflow to create AWS Glue jobs and databases, and imports the Quick Sight dashboard. The same bundle artifact is applied to every downstream stage, ensuring dev, test, and prod stay in sync while remaining fully isolated.
Solution implementation
Step 1: Install the CLI
Install the CLI from PyPI by running pip install aws-smus-cicd-cli.
Step 2: Create or customize a manifest
Clone the repository and start from the analytics example:
The example includes AWS Glue ETL scripts, an Airflow workflow definition, a Quick Sight dashboard bundle, and integration tests. Open manifest.yaml and update the project, domain, and deployment_configuration values under each stage so that they match your own SageMaker Unified Studio projects and connection names.
Alternatively, generate a manifest from an existing project: aws-smus-cicd-cli create --domain-id
Step 3: Validate your configuration
Run the describe command with --connect to verify your environment is ready. This connects to your AWS environment and validates that target projects exist, the execution role has the required permissions, and connections are reachable. Fix any issues before deploying.
Step 4: Deploy
Run the deployment with the aws-smus-cicd-cli deploy command.
Throughout deployment, the CLI:
- Uploads ETL scripts and workflow definitions to S3 using the project’s storage connection.
- Creates the Airflow workflow in MWAA Serverless.
- Runs the workflow, which provisions AWS Glue jobs, creates databases, and runs ETL transformations.
- Imports the Quick Sight dashboard and refreshes datasets with the latest data.
- Processes any catalog asset subscriptions defined in the manifest.
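Provisioning in dependency order amounts to a topological sort of the deployment steps. The following Python sketch uses the standard library's `graphlib.TopologicalSorter` to illustrate the idea; the step names and the edges between them are assumptions loosely based on the steps listed above, not the CLI's actual dependency graph.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
# Step names and edges are illustrative assumptions, not the CLI's own graph.
DEPENDS_ON = {
    "upload_etl_scripts": set(),
    "upload_workflow_definitions": set(),
    "create_workflow": {"upload_etl_scripts", "upload_workflow_definitions"},
    "run_workflow": {"create_workflow"},
    "import_dashboard": {"run_workflow"},
}


def provisioning_order(graph: dict[str, set[str]]) -> list[str]:
    """Return the steps in an order that respects every dependency."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(graph).static_order())


for step in provisioning_order(DEPENDS_ON):
    print(step)
```

The two upload steps have no dependency on each other, so their relative order is not guaranteed; what the sort does guarantee is that the workflow is created only after both uploads and that the dashboard is imported only after the workflow has run.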
Step 5: Validate
Run post-deployment validation with the test command to confirm services are running and ready for workloads.
Step 6: Promote to production
Promote the same bundle artifact that was validated in the test stage to production. This ensures the exact same artifact runs in prod.
Integrating with GitHub Actions
The CLI works with existing CI/CD solutions. The GitHub repository includes reusable workflow templates that DevOps teams can adopt directly.
The following is an example of a GitHub Actions workflow that implements a full bundle-based deployment pipeline:
The CLI also works with Jenkins, GitLab CI, and Azure DevOps. See the CI/CD integration guide for more examples.
In the next section, we cover which AWS services and workload types the CLI supports.
Supported workloads
The CLI deploys applications that span the following AWS services through Airflow workflow definitions:
- Analytics and BI: AWS Glue ETL jobs and crawlers, Amazon Athena queries, Amazon Quick Sight dashboards, Amazon EMR jobs, Amazon Redshift queries.
- Machine learning: SageMaker training jobs, ML model endpoints, SageMaker AI Pipelines.
- Code and workflows: Jupyter notebooks, Python scripts, Airflow DAGs (MWAA and MWAA Serverless).
- Data and storage: S3 data files, Git repositories, SageMaker Catalog resources (glossaries, glossary terms, form types, asset types, assets, data products, metadata forms).
The examples directory includes working applications for each of these patterns, with manifests, workflow definitions, and integration tests.
Failure recovery
If a deployment fails, the CLI stops at the point of failure and reports the error with a detailed stack trace. To recover:
- Run aws-smus-cicd-cli describe --connect to check which resources exist and which permissions are missing.
- Fix the issue and rerun aws-smus-cicd-cli deploy.
- For bundle-based deployments, redeploy a previous bundle version.
- Use aws-smus-cicd-cli destroy with the --targets and --force options to clean up a failed deployment.
For detailed rollback procedures, see the Rollback Guide.
Conclusion
In this post, you learned how the Amazon SageMaker Unified Studio CI/CD CLI gives data and DevOps teams a clean separation of concerns: data teams describe their application once in a YAML manifest, and DevOps teams deploy it with a single command through their existing CI/CD pipelines. You saw how stages map to isolated SageMaker Unified Studio projects (optionally spanning AWS accounts and Regions), how bundles provide immutable, reproducible promotion through test and production, and how the CLI integrates with GitHub Actions, Jenkins, GitLab CI, and Azure DevOps. You also walked through deploying a Glue-and-Quick-Sight analytics application from dev through to prod.
Get started
The CI/CD CLI is available at no additional cost in all AWS Regions where Amazon SageMaker Unified Studio is available. You pay only for the underlying AWS resources provisioned during deployment.
Use the following steps to try it out:
- Install the CLI: pip install aws-smus-cicd-cli
- Browse the example applications for analytics and ML patterns.
- Follow the CI/CD CLI documentation to deploy your first application in 10 minutes.
- Review the Admin Guide for infrastructure setup.
For feedback and bug reports, open an issue on the GitHub repository.
About the authors
