You might have data in Amazon Simple Storage Service (Amazon S3) buckets in different AWS Regions that you want accessible in a single Amazon OpenSearch Service domain or collection. Consolidating data across Regions provides unified analytics and search, reduces operational complexity, and streamlines your search infrastructure. We’re happy to announce that Amazon OpenSearch Ingestion pipelines can now read from S3 buckets in different Regions to ingest and consolidate data into a single OpenSearch Service domain or collection.
To consolidate this data across AWS Regions, you previously had to build your own solution. Now Amazon OpenSearch Ingestion can help you accomplish this. In this post, I’ll show you how to use the new cross-Region support to ingest data from S3 buckets across multiple AWS Regions into a single OpenSearch Service domain or collection.
Amazon OpenSearch Ingestion (OSI) is a feature-rich data ingestion pipeline that you can use for many different purposes: observability, analytics, and zero-ETL search. Many customers use OpenSearch Ingestion to ingest data from Amazon S3 into OpenSearch Service domains and Amazon OpenSearch Serverless collections. Until now, you could only ingest from a single AWS Region at a time. Now that you can use OpenSearch Ingestion for cross-Region S3 ingestion, I’ll show you how to use it in two scenarios: batch processing using S3 scan, and streaming ingestion using Amazon Simple Queue Service (Amazon SQS) queues for AWS vended logs such as Amazon Virtual Private Cloud (Amazon VPC) Flow Logs and AWS CloudTrail.
Prerequisites
Complete the following prerequisite steps:
- Deploy an OpenSearch Service domain or OpenSearch Serverless collection in the Region where you want to perform your search or analytics.
- You need S3 buckets in at least two different Regions. You can use existing buckets or create new ones. One can be in the same AWS Region as your OpenSearch Service domain or collection, or you can use two entirely different Regions.
- Upload objects with data into your S3 buckets. The data can be in JSON, ND-JSON, Parquet, CSV, or plaintext format.
- Configure the AWS Identity and Access Management (IAM) permissions needed for OSI. For instructions, see Amazon S3 as a source.
- For cross-Region ingestion, you must now also include the s3:GetBucketLocation permission. This gives the pipeline the ability to determine which AWS Region each bucket is located in.
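As a rough sketch, the S3 portion of your pipeline role’s policy might look like the following (the bucket names and Regions are placeholders for your own; adjust the resource list to every source bucket your pipeline reads from):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadCrossRegionSourceBuckets",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::example-logs-us-east-1",
        "arn:aws:s3:::example-logs-us-east-1/*",
        "arn:aws:s3:::example-logs-eu-west-1",
        "arn:aws:s3:::example-logs-eu-west-1/*"
      ]
    }
  ]
}
```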
After you complete these steps, you can set up your Amazon OpenSearch Ingestion pipelines for either batch or streaming scenarios. In the following sections, I’ll give you guidance on when to choose which approach, and I outline the steps for creating your pipeline.
Batch scenarios
You can use the OpenSearch Ingestion S3 scan capability to read batch data from S3. You might find this approach useful when your data is written to S3 on a schedule. To perform a cross-Region S3 scan, you only specify the buckets that you’re reading from when you create the OpenSearch Ingestion pipeline.
The following diagram shows the design for an OpenSearch Ingestion pipeline in us-west-2 reading from S3 buckets in us-east-1 and eu-west-1 and writing that data into an OpenSearch Service domain in us-west-2.
Next, you’ll create an OpenSearch Ingestion pipeline. You must create this pipeline in the same Region as your OpenSearch Service domain or collection.
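A batch pipeline for this design might look like the following sketch, which uses the S3 scan source with buckets in two Regions. The bucket names, account ID, role ARN, and domain endpoint are placeholders; check the option names against the OpenSearch Ingestion documentation for your pipeline version:

```yaml
version: "2"
cross-region-s3-scan-pipeline:
  source:
    s3:
      codec:
        json: {}
      scan:
        buckets:
          # Neither bucket has to be in the pipeline's Region; the
          # pipeline determines each bucket's Region automatically.
          - bucket:
              name: example-logs-us-east-1
          - bucket:
              name: example-logs-eu-west-1
      aws:
        region: us-west-2
        sts_role_arn: arn:aws:iam::123456789012:role/example-osi-pipeline-role
  sink:
    - opensearch:
        hosts:
          - https://search-example-domain.us-west-2.es.amazonaws.com
        index: consolidated-data
        aws:
          region: us-west-2
          sts_role_arn: arn:aws:iam::123456789012:role/example-osi-pipeline-role
```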
The preceding pipeline configuration uses the JSON codec. You might want to configure a different codec if your data isn’t a large JSON object.
You can now query your OpenSearch Service domain or collection to see the data that you ingested.
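For example, you could run a quick search from the OpenSearch Dashboards Dev Tools console; the index name `consolidated-data` here is a placeholder for whatever index your pipeline writes to:

```
GET consolidated-data/_search
{
  "size": 5,
  "query": { "match_all": {} }
}
```

If the pipeline is working, the response includes documents from objects in both source buckets.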
Streaming scenarios: AWS vended logs
Like many of our customers, you might want to ingest S3 data from different AWS Regions into OpenSearch Service. A common reason is to consolidate AWS vended logs, for example VPC Flow Logs, CloudTrail data, and load balancer logs. For these scenarios, you can configure OpenSearch Ingestion pipelines to read from an Amazon SQS queue to stream data into your OpenSearch Service domain or collection.
These AWS vended logs write to Amazon S3 in the same AWS Region as the service producing them. For example, VPC Flow Logs will be in the same AWS Region as your Amazon VPC. You can use OpenSearch Ingestion to consolidate these logs into one AWS Region. In the VPC Flow Logs example, you can consolidate your VPC Flow Logs from multiple AWS Regions into a single OpenSearch Service domain or collection to analyze network patterns across your different Amazon VPCs.
The following diagram outlines the overall setup. It shows an example of sending AWS vended logs from us-east-1 and eu-west-1 to an OpenSearch Service domain in us-west-2. You can change the AWS Regions depending on your specific needs.

1. Configure your vended logs to write log events to Amazon S3 buckets in their respective AWS Regions. Using VPC Flow Logs as our example, configure VPC Flow Logs for your VPCs.
2. Create an Amazon SQS queue in the same AWS Region as your OpenSearch Service domain.
3. Amazon S3 doesn’t send notifications to cross-Region Amazon SQS queues, so you’ll use intermediate Amazon Simple Notification Service (Amazon SNS) topics to consolidate the notifications from multiple Regions into one queue. For each S3 bucket, create an SNS topic.
4. Configure S3 Event Notifications for SNS. Do this for each S3 bucket and each SNS topic.
5. SNS can send cross-Region notifications to SQS. Create a subscription from each SNS topic that you created in step 3 to the single SQS queue that you created in step 2.
6. Configure your pipeline role to read from SQS and from the associated S3 buckets.
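For the S3-to-SNS wiring in the steps above, each topic’s access policy must allow its bucket to publish. A sketch of such a policy follows; the topic ARN, bucket name, and account ID are placeholders for your own resources:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3ToPublishEvents",
      "Effect": "Allow",
      "Principal": { "Service": "s3.amazonaws.com" },
      "Action": "sns:Publish",
      "Resource": "arn:aws:sns:us-east-1:123456789012:example-s3-events-topic",
      "Condition": {
        "ArnLike": { "aws:SourceArn": "arn:aws:s3:::example-logs-us-east-1" }
      }
    }
  ]
}
```

When you create the SNS-to-SQS subscription, consider enabling raw message delivery so the queue receives the original S3 event notification rather than an SNS envelope.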
Now create an OpenSearch Ingestion pipeline in the same AWS Region as your OpenSearch Service domain.
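A streaming pipeline of this shape might look like the following sketch, which uses the S3 source in SQS mode. The queue URL, role ARN, index name, and domain endpoint are placeholders, and you should verify the option names against the OpenSearch Ingestion documentation for your pipeline version:

```yaml
version: "2"
cross-region-s3-sqs-pipeline:
  source:
    s3:
      codec:
        json: {}
      # Vended logs such as VPC Flow Logs are typically gzip-compressed in S3.
      compression: gzip
      sqs:
        # Single queue in the pipeline's Region; it receives notifications
        # forwarded by the per-Region SNS topics.
        queue_url: https://sqs.us-west-2.amazonaws.com/123456789012/example-s3-events-queue
      aws:
        region: us-west-2
        sts_role_arn: arn:aws:iam::123456789012:role/example-osi-pipeline-role
  sink:
    - opensearch:
        hosts:
          - https://search-example-domain.us-west-2.es.amazonaws.com
        index: vended-logs
        aws:
          region: us-west-2
          sts_role_arn: arn:aws:iam::123456789012:role/example-osi-pipeline-role
```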
The preceding pipeline configuration uses the JSON codec. You might want to configure a different codec if your data isn’t a large JSON object.
Next, upload objects with data into your S3 buckets. When you upload data, S3 sends notifications to SNS and then on to the SQS queue.
You can now query your OpenSearch Service domain or collection to see the data that you ingested.
Here’s what makes this possible and what’s different. The SQS queue receives the event notifications for the buckets. Before the cross-Region feature of OpenSearch Ingestion, the pipeline could see these events but couldn’t access the S3 buckets, even when the permissions were granted. Now, the pipeline determines the AWS Region that the bucket is in and obtains an AWS Security Token Service (AWS STS) token for that Region. Using an STS token from the same Region as the S3 bucket allows the pipeline to access and read the data.
Using the AWS Management Console
When you create the pipeline using the OpenSearch Ingestion console, you’ll have the option to select a blueprint for your use case. These blueprints help you create pipelines for various vended log types simply by selecting your SQS queue and OpenSearch domain. The blueprint handles the data type mappings for you by including appropriate processors. You can use these blueprints as a starting point and modify the processors for your specific requirements.
Clean up resources
When you’re done testing this out, use the following steps to delete the resources that you created.
If you set up a batch pipeline:
- Delete the OpenSearch Ingestion pipeline.
If you set up a streaming pipeline:
- Delete the OpenSearch Ingestion pipeline.
- Delete the S3 Event Notifications and the SNS topics that you configured.
- Delete the SQS queue.
For both pipelines, these steps help you delete the common resources.
Conclusion
In this post, I showed you how to use Amazon OpenSearch Ingestion to ingest data from Amazon S3 buckets in different AWS Regions, and that this works for both batch scan and streaming scenarios. The feature gives you a straightforward way to consolidate your data from other Regions into one OpenSearch Service domain or collection.
To get started with the cross-Region S3 source, refer to the OpenSearch Ingestion documentation, or try creating a pipeline from one of our blueprints using the OpenSearch Ingestion console. You can read about the codecs that OpenSearch Ingestion offers for parsing your S3 objects. You can also learn about the various processors that OpenSearch Ingestion offers, so you can transform and enrich your data to meet your needs.
You can also use OpenSearch Ingestion for cross-Region and cross-account ingestion. To do this, you must grant cross-account permissions on your S3 bucket and make some changes to your pipeline configuration. Combining what I showed you in this post with the existing cross-account features greatly expands your ingestion options.
If you’re ready to take your streaming ingestion analytics to the next level, you can explore how to generate metrics from logs and even how to send those derived metrics to Amazon Managed Service for Prometheus.
Have you tried out the cross-Region capabilities of OpenSearch Ingestion? Share your use cases and questions in the comments.
About the authors
