In July 2025, Amazon SageMaker introduced support for Amazon Simple Storage Service (Amazon S3) general purpose buckets and prefixes in Amazon SageMaker Catalog, delivering fine-grained access control and permissions through S3 Access Grants. This integration addresses the challenge data teams face when manually managing data discovery and Amazon S3 permissions as separate workflows. Data consumers, such as data scientists, engineers, and business analysts, can now discover and access S3 bucket or prefix data assets through SageMaker Catalog, while administrators can maintain granular access controls using S3 Access Grants permissions.
Building upon existing SageMaker support for structured data in Amazon S3 Tables buckets, the added support for S3 general purpose buckets makes it simple for teams to find, access, and collaborate on different types of data, including unstructured data such as documents, images, audio, and video, while providing access management. Data administrators and data stewards can now implement fine-grained access permissions for a bucket or a prefix using S3 Access Grants, supporting secure and appropriate data usage across their organization.
In this post, we explore how this integration addresses key challenges our customers have shared with us, and how data producers, such as administrators and data engineers, can share and govern S3 buckets and prefixes using S3 Access Grants while making them readily discoverable for data consumers. We walk you through a practical example of bringing Amazon S3 data into your projects and implementing effective governance for both analytics and generative AI workflows.
Challenges in working with unstructured data
Organizations face challenges in maximizing the value of their unstructured data assets. Although customers want to incorporate insights derived from unstructured data for comprehensive analysis, they often resort to building bespoke integrations to extract structured information from unstructured sources, leading to inefficient and fragmented solutions. Three critical roadblocks have historically hindered enterprises:
- Organizations struggle to maintain a catalog that provides equal discoverability for both structured and unstructured data, often resulting in separate systems for different data types.
- Data consumers throughout organizations want to analyze unstructured data using familiar tools like notebooks, just as they do with structured data, but are forced to use separate interfaces and workflows instead.
- Working with unstructured data lacks streamlined access management: users who discover relevant data can't readily request access from owners, load information into analytics tools, or collaborate with colleagues directly from their workspaces or projects.
Amazon S3 unstructured data as a managed asset in Amazon SageMaker
SageMaker Catalog now supports S3 general purpose buckets. Data producers can publish S3 buckets and prefixes as S3 Object Collection assets, making these assets searchable and discoverable. For managed S3 Object Collection assets in SageMaker Catalog, access permissions are automatically handled using S3 Access Grants when data consumer teams subscribe to cataloged datasets, replacing bespoke data discovery and permission management workflows. Data producers can add business context to technical metadata, including glossary terms and descriptions. Data consumers can search, review, and request access to data assets through a unified workflow. Teams can then collaborate in SageMaker projects, incorporating datasets and conducting analysis while maintaining security and governance standards.
The key benefits of simplified discoverability and access to S3 data in SageMaker Catalog include:
- Seamless S3 data integration – You can use existing Amazon S3 data in SageMaker without migration or restructuring
- Enhanced cataloging and governance – SageMaker Catalog facilitates data publishing, discovery, and subscription with business metadata and security controls
- Improved data sharing – Cataloged Amazon S3 data becomes discoverable organization-wide, accelerating insights and collaboration
- Self-service data access – SageMaker provides tools for data preparation, ETL (extract, transform, and load), and connectivity from various sources, supporting faster analytics and AI solution development
With these benefits, you can accelerate time-to-insight and unlock the full potential of organizational data assets across teams.
Customer spotlight
Across industries, the true power of data emerges when organizations can seamlessly connect and analyze different types of information across their operations. Bayer, a leading pharmaceutical and biotechnology company, has vast sets of unstructured data organized across multiple S3 buckets and prefixes.
“Bringing a new drug to market is widely known across the industry to be a lengthy and expensive process, often taking 10–15 years and costing $1–2 billion on average, with a low overall success rate ranging from around 8% to 12%. SageMaker now allows us to easily discover and securely access data, structured and unstructured, while maintaining governance controls using S3 Access Grants. With SageMaker Catalog, we now have a streamlined approach to data management that enables us to combine datasets, both structured and unstructured, reducing research time and increasing productivity throughout the drug development lifecycle,” said Avinash Erupaka, Principal Engineer Lead, Bayer Pharma Drug Innovation Platform.
Solution overview
In life sciences organizations, unstructured and semi-structured data files are prevalent in research, development, bio-manufacturing, and diagnostics divisions. These might include digital pathology images, genetic sequence data, microwell plate readouts, analytical spectra, and chromatograms. Along with unstructured and semi-structured data, data engineers collect various business metadata, including study, project, laboratory protocol, and assay information, and operational metadata, including algorithmic steps, compute tasks, and process outputs.
Scientists and business users can use SageMaker Catalog to search for data assets using keywords found in the associated business metadata and operational metadata, which are captured as metadata forms. For example, there might be searches for sample ID, experiment ID, group, platform, file names, dates, or keywords within the experimental description. These searches return a list of data assets associated with those keywords, which are collections of S3 objects. Scientists and business users are given access to those collections of S3 objects.
In the following sections, we walk through the setup step by step. We use a digital pathology images use case from the life sciences industry to demonstrate how researchers discover and get access to S3 objects using SageMaker.
Prerequisites
If you're new to SageMaker, refer to the Amazon SageMaker User Guide to get started.
To follow along with this post, refer to Setting up Amazon SageMaker to set up a domain and create projects. This domain setup and project creation is a prerequisite for the other tasks in SageMaker.
Get data ready in Amazon S3
To store digital pathology images, create an S3 bucket (for example, researchdatafordigitalpathology), create a folder (for example, dpimages) under it, and upload digital pathology images. Ideally, you will have a set of images under a given prefix, but for this example, we have chosen just one image file (dp_cancer.jpg). For instructions to create a bucket, refer to Creating a general purpose bucket.
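If you prefer to script the upload instead of using the console, a minimal sketch along the following lines could work. The helper names `build_object_key` and `upload_image` are hypothetical and introduced only for illustration; the sketch assumes the bucket already exists and that the client passed in is a Boto3 S3 client with permission to write to it.

```python
from pathlib import Path


def build_object_key(prefix: str, local_path: str) -> str:
    """Join a prefix such as 'dpimages' with the file name of local_path."""
    return f"{prefix.strip('/')}/{Path(local_path).name}"


def upload_image(s3_client, bucket: str, prefix: str, local_path: str) -> str:
    """Upload one local file under the given prefix and return its S3 URI.

    s3_client is expected to be a Boto3 S3 client (boto3.client("s3")).
    """
    key = build_object_key(prefix, local_path)
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

With the example names from this post, a call such as `upload_image(boto3.client("s3"), "researchdatafordigitalpathology", "dpimages", "dp_cancer.jpg")` would place the image at `s3://researchdatafordigitalpathology/dpimages/dp_cancer.jpg`.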
Set up a data producer project
For data engineers, create a producer project in Amazon SageMaker Unified Studio to register digital pathology images as data assets. For more details on how to create projects, refer to Create a project. Add data engineers as members of the project. For instructions to add members, refer to Add project members.

Add an Amazon S3 location
To add the collection of digital pathology images (that is, to bring your own S3 buckets), complete the following steps:
- In SageMaker Unified Studio, go to the project where you want to add Amazon S3.
- Choose Data in the navigation pane, then choose the plus sign.
- On the Add data page, choose Add S3 location, then choose Next.

To obtain the details to create a connection, you can choose from two options:
- Using the project role:
  - You, the project user, retrieve the project role and share it with the AWS Management Console admin.
  - The admin opens the AWS Identity and Access Management (IAM) console to update the project role with permissions.
  - The admin opens the Amazon S3 console and adds a CORS policy to each bucket.
- Using an access role Amazon Resource Name (ARN), which is required for cross-account access:
  - You, the project user, share the project ID and project role with the admin and request access to the S3 bucket.
  - The admin creates an access role (or uses an existing role) with permissions, adds a trust policy for the project, and tags it with the project ID.
  - The admin opens the Amazon S3 console and adds a CORS policy to the bucket.
  - The admin sends the Amazon S3 URI and access role details back to you.
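The CORS step in either option could also be scripted with Boto3 rather than done in the console. The sketch below is illustrative only: the helper names, the allowed methods and headers, and the origin value are assumptions, not the policy that SageMaker Unified Studio requires, so take the exact rules from the Adding Amazon S3 data documentation.

```python
def build_cors_configuration(allowed_origin: str) -> dict:
    """Build a CORS configuration for a single origin.

    The methods and headers below are placeholders (assumptions);
    replace them with the rules the documentation specifies.
    """
    return {
        "CORSRules": [
            {
                "AllowedOrigins": [allowed_origin],
                "AllowedMethods": ["GET", "PUT", "POST", "DELETE", "HEAD"],
                "AllowedHeaders": ["*"],
            }
        ]
    }


def apply_cors(s3_client, bucket: str, allowed_origin: str) -> None:
    """Attach the CORS configuration to the bucket.

    s3_client is expected to be a Boto3 S3 client (boto3.client("s3")).
    """
    s3_client.put_bucket_cors(
        Bucket=bucket,
        CORSConfiguration=build_cors_configuration(allowed_origin),
    )
```

Keeping the configuration builder separate from the API call lets an admin review or version the policy before applying it to each bucket.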
After you have the necessary permissions configured for the Amazon S3 location and project role, proceed with the remaining steps.
- On the Add S3 location page, enter the following details:
  - Enter a name for the location path.
  - (Optional) Add a description of the location path.
  - Use the S3 URI and AWS Region provided by your admin.
  - If your admin granted you access using an access role instead of the project role, enter the access role ARN obtained from your admin.
- Choose Add S3 location.
For more details, see Adding Amazon S3 data.

Publish data to SageMaker Catalog to make it discoverable
After you add the Amazon S3 location, complete the following steps to publish the data:
- In SageMaker Unified Studio, go to your project.
- Choose Data in the navigation pane and choose the Amazon S3 location.
- On the Actions dropdown menu, choose Publish to Catalog.

After you publish the assets, you can find them on the Published tab of the Assets page, under Project catalog in the navigation pane.

Create a consumer project
Create a consumer project where researchers can collaborate and bring in the assets they need for their analysis, and add the researchers as members of the project. Consumers can search for available (published) data assets on digital pathology images for cancer research and then subscribe to work with them using JupyterLab notebooks in SageMaker. For more details on how to create projects, refer to Create a project. For instructions to add members, refer to Add project members.

Find relevant assets and request access
Researchers can search the SageMaker Catalog for available (published) data assets using the string digitalpathology. Complete the following steps:
- In SageMaker Unified Studio, on the Discover dropdown menu, choose Data Catalog.
- Find the asset you want to subscribe to by browsing or entering the name of the asset into the search bar.

- Choose Subscribe.

- Provide the following information:
  - The project to which you want to subscribe the asset.
  - A short justification for your subscription request. The data producer uses this information to validate the request before granting access.
- Choose Request.

After your request is approved, the project is subscribed to the asset and access is granted automatically. To provide access, SageMaker Catalog uses S3 Access Grants to grant read permission to the subscribing project for the specific S3 bucket or prefix.
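To illustrate the underlying mechanism, the following sketch shows how a grantee could request temporary read credentials from S3 Access Grants with the Boto3 `s3control` client's `get_data_access` call. SageMaker Unified Studio handles access for you, so this is not a required step; the helper name `request_read_credentials`, the one-hour duration, and the account ID and target shown in the usage note are assumptions for illustration.

```python
def request_read_credentials(
    s3control_client,
    account_id: str,
    target: str,
    duration_seconds: int = 3600,
) -> dict:
    """Ask S3 Access Grants for temporary READ credentials on a target.

    s3control_client is expected to be boto3.client("s3control"), and the
    caller must be an identity that holds a matching grant. The returned
    dict uses the keyword names that boto3.client accepts for credentials.
    """
    response = s3control_client.get_data_access(
        AccountId=account_id,
        Target=target,
        Permission="READ",
        DurationSeconds=duration_seconds,
    )
    credentials = response["Credentials"]
    return {
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }
```

The result can be unpacked into a new client, for example `boto3.client("s3", **request_read_credentials(boto3.client("s3control"), "111122223333", "s3://researchdatafordigitalpathology/dpimages/*"))`, where the account ID is a placeholder.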
To view the status of the subscription request, go to the project with which you subscribed to the asset. Choose Subscription requests in the navigation pane, then choose the Outgoing requests tab. This page lists the assets to which the project has requested access. You can filter the list by the status of the request.
Review and approve the subscription request
The data producer or engineer of the publishing project receives the request from the researcher and must approve it. After the request is approved, the researcher has access to the objects in the S3 bucket (or prefix).

Before approving, the data producer can view the details of the subscription request to confirm who will get access to the data they own.

After they approve the request, data producers can audit the different requests made for the assets they own.

Access the subscribed data in notebooks
After the access request is approved, the researcher can open a JupyterLab notebook from SageMaker Unified Studio and access S3 objects to work on their research.
To navigate to the JupyterLab notebook, complete the following steps:
- In SageMaker Unified Studio, open your project.
- On the Build dropdown menu, choose JupyterLab.
The following is sample Python code to access subscribed data. This sample code retrieves the S3 object that the researcher has been given access to and uses Matplotlib (a comprehensive 2D plotting library for Python) to display the image in the notebook. In a real-world use case, a researcher typically uses these images for display, for training machine learning models, or for multimodal analysis.
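A minimal sketch of such code might look like the following. It assumes the notebook's credentials already allow reading the subscribed prefix and that Matplotlib and Pillow are available in the kernel; the helper names `parse_s3_uri`, `fetch_object_bytes`, and `show_image` are hypothetical and not part of any AWS library.

```python
import io
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> tuple:
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def fetch_object_bytes(s3_client, uri: str) -> bytes:
    """Download one object and return its raw bytes.

    s3_client is expected to be a Boto3 S3 client (boto3.client("s3")).
    """
    bucket, key = parse_s3_uri(uri)
    return s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()


def show_image(image_bytes: bytes, title: str = "") -> None:
    """Render JPEG bytes inline in the notebook with Matplotlib."""
    # Imported here so the download helpers work without Matplotlib.
    import matplotlib.pyplot as plt

    image = plt.imread(io.BytesIO(image_bytes), format="jpg")
    plt.imshow(image)
    plt.title(title)
    plt.axis("off")
    plt.show()
```

In a notebook cell, the example image from this post could then be displayed with `show_image(fetch_object_bytes(boto3.client("s3"), "s3://researchdatafordigitalpathology/dpimages/dp_cancer.jpg"), "dp_cancer.jpg")`.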

SageMaker and S3 Access Grants integrations
The SageMaker Catalog integration with S3 Access Grants facilitates secure data access across Amazon EMR Serverless, AWS Glue, Amazon EMR on Amazon EC2, and JupyterLab notebooks through simple configuration settings. By enabling S3 Access Grants with two properties ('fs.s3.s3AccessGrants.enabled': 'true' and 'fs.s3.s3AccessGrants.fallbackToIAM': 'true'), users gain streamlined access control while maintaining IAM as a fallback option. These configurations are automated in SageMaker Unified Studio. To learn more about S3 Access Grants integrations, see S3 Access Grants integrations, and for Boto3 S3 Access Grants support, refer to the following GitHub repo.
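Outside SageMaker Unified Studio, where these settings are not applied for you, the two properties could be expressed as follows. This is a sketch under assumptions: the `emrfs-site` classification for Amazon EMR on Amazon EC2 and the `spark.hadoop.` prefix for Spark submit parameters reflect our reading of the S3 Access Grants integration documentation, and the helper names are hypothetical. Confirm both against the documentation for your service and release.

```python
# The two properties named in this post.
S3_ACCESS_GRANTS_PROPERTIES = {
    "fs.s3.s3AccessGrants.enabled": "true",
    "fs.s3.s3AccessGrants.fallbackToIAM": "true",
}


def as_emr_classification(properties: dict) -> list:
    """Wrap the properties in an EMR configuration classification.

    The 'emrfs-site' classification name is an assumption.
    """
    return [{"Classification": "emrfs-site", "Properties": dict(properties)}]


def as_spark_conf_flags(properties: dict, prefix: str = "spark.hadoop.") -> list:
    """Render the properties as --conf flags for Spark submit parameters.

    The 'spark.hadoop.' prefix is an assumption.
    """
    return [f"--conf {prefix}{key}={value}" for key, value in sorted(properties.items())]
```

Setting fallbackToIAM to true means a request with no matching grant falls back to the role's IAM permissions instead of failing outright.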
Conclusion
In this post, we discussed the added support for S3 general purpose buckets in SageMaker, and how they can be cataloged in SageMaker Catalog to help users quickly discover data and securely manage access when sharing with other teams.
To learn more about SageMaker and how to get started, refer to the Amazon SageMaker User Guide and Amazon S3 data in Amazon SageMaker Unified Studio.
About the authors
Priya Tiruthani is a Senior Technical Product Manager with Amazon DataZone at AWS. She focuses on improving data discovery and curation required for data analytics. She is passionate about building innovative products to simplify customers' end-to-end data journey, especially around data governance and analytics. Outside of work, she enjoys being outdoors to hike, capture nature's beauty, and recently play pickleball.
Subrat Das is a Principal Solutions Architect and part of the Global Healthcare and Life Sciences industry division at AWS. He is passionate about modernizing and architecting complex customer workloads. When he's not working on technology solutions, he enjoys long hikes and traveling around the world.
Santhosh Padmanabhan is a Software Development Manager at AWS, leading the Amazon SageMaker Catalog engineering team. His team designs, builds, and operates services focusing on data, machine learning, and AI governance. With deep expertise in building distributed data systems at scale, Santhosh plays a key role in advancing AWS's data governance capabilities.
Yuhang Huang is a Software Development Manager on the Amazon SageMaker Unified Studio team. He leads the engineering team to design, build, and operate scheduling and orchestration capabilities in SageMaker Unified Studio. In his free time, he enjoys playing tennis.
