Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances offer significant cost savings of up to 90% compared with On-Demand pricing, making them attractive for cost-conscious workloads. However, when you use Spot Instances within AWS Auto Scaling groups (ASGs), their unpredictable interruptions create operational challenges. Without proper visibility into interruption patterns, teams struggle to optimize capacity planning, implement effective fallback mechanisms, and make informed decisions about workload placement across Availability Zones and instance types.
This challenge can be addressed through a custom event-driven monitoring and analytics dashboard that provides near real-time visibility into Spot Instance interruptions specifically for ASG-managed instances. For the remainder of this post, we refer to this custom solution as “Spot Interruption Insights” for Auto Scaling groups.
In this post, you’ll learn how to build this comprehensive monitoring solution step by step. You’ll gain practical experience designing an event-driven pipeline, implementing data processing workflows, and creating insightful dashboards that help you track interruption trends, optimize ASG configurations, and improve the resilience of your Spot Instance workloads.
Solution overview
The architecture uses an event-driven approach built on AWS native services for robust Spot Instance interruption monitoring.
The solution uses Amazon EventBridge to capture interruption events, Amazon Simple Queue Service (Amazon SQS) for reliable message queuing, AWS Lambda for data processing, and Amazon OpenSearch Service for storage and visualization of interruption patterns.
- EC2 Spot interruption notices are captured through an Amazon EventBridge rule.
- The notices are routed to an SQS queue for reliable message handling.
- A Lambda function processes the events, fetching EC2 instance metadata and Auto Scaling group (ASG) details by making optimized batch calls to the EC2 and Auto Scaling APIs. This design minimizes throttling risks on the control plane APIs, which supports scalability. The Lambda function is configured with batching and concurrency limits to prevent overwhelming the API endpoints and the OpenSearch Service bulk indexing process.
- After processing, events are bulk-indexed into Amazon OpenSearch Service, enabling near real-time visibility and analytics.
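The flow in the preceding list can be sketched in a few lines of Python. This is a minimal illustration, not the solution's actual Lambda code: the function names and the chunk size are assumptions, while the event shape (`detail.instance-id` inside an EventBridge event delivered as the SQS message body) and the newline-delimited bulk format follow the documented EventBridge, SQS, and OpenSearch conventions.

```python
import json
from typing import Iterable, Iterator


def extract_instance_ids(sqs_event: dict) -> list[str]:
    """Collect unique instance IDs from an SQS batch of EventBridge notices."""
    instance_ids: list[str] = []
    for record in sqs_event.get("Records", []):
        # EventBridge delivers the full event as the SQS message body (JSON text).
        notice = json.loads(record["body"])
        instance_id = notice.get("detail", {}).get("instance-id")
        if instance_id and instance_id not in instance_ids:
            instance_ids.append(instance_id)
    return instance_ids


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Split IDs into groups so each group can go into one batched API call."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_bulk_body(index_name: str, documents: Iterable[dict]) -> str:
    """Render documents as an OpenSearch _bulk payload (action line + source line)."""
    lines: list[str] = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

Grouping instance IDs before calling the EC2 and Auto Scaling APIs is what keeps the number of control plane calls proportional to the number of batches rather than the number of interrupted instances.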
A Dead Letter Queue (DLQ) ensures that no data is lost in case of failures, while AWS Identity and Access Management (IAM) roles enforce least-privilege access between all components.
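One common way to pair an SQS-triggered Lambda function with a DLQ is to report partial batch failures, so that only the messages that failed are retried and eventually moved to the DLQ. The sketch below assumes the event source mapping has `ReportBatchItemFailures` enabled; whether this solution's function does so is not stated in this post, and `handle_batch` and `process` are hypothetical names.

```python
import json
from typing import Callable


def handle_batch(sqs_event: dict, process: Callable[[dict], None]) -> dict:
    """Process each SQS record and report only the failed ones back to Lambda."""
    failures = []
    for record in sqs_event.get("Records", []):
        try:
            # Parsing happens inside the try block so malformed bodies are
            # also reported as failures instead of failing the whole batch.
            process(json.loads(record["body"]))
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    # Lambda retries only the listed messages; after the queue's maximum
    # receive count is exceeded, SQS moves them to the dead-letter queue.
    return {"batchItemFailures": failures}
```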
The OpenSearch Service domain is deployed within the private subnets of an Amazon VPC, so it is not publicly accessible.
- Access to OpenSearch Dashboards is routed through an Application Load Balancer (ALB) configured with an HTTPS listener.
- The ALB forwards traffic to an NGINX proxy running on EC2 instances in an Auto Scaling group. This setup provides secure and scalable access.
- Authentication and authorization are enforced using the OpenSearch Service internal user database, so that only authorized users can access the dashboards.
OpenSearch Dashboards visualize interruption metrics, delivering actionable insights to support effective capacity planning and workload placement.
Extensibility and alternative analytics tools
While this solution uses Amazon OpenSearch Service for storing and visualizing Spot interruption data, the architecture is flexible and can be extended to support other analytics and observability platforms. You can modify the Lambda function to forward data to tools such as Amazon QuickSight, Amazon Timestream, Amazon Redshift, or external services, depending on your analytics and compliance needs. This allows teams to use their preferred tooling for building visualizations, setting alerts, or integrating with existing dashboards.
What you’ll build
By the end of this post, you’ll have a complete Spot interruption monitoring system, as shown in the following screenshot, that automatically captures EC2 Spot Instance interruption events from your Auto Scaling groups and presents them through interactive dashboards. Your solution will include near real-time visualizations showing interruption patterns by Availability Zone, instance type, and time period, along with ASG-specific metrics that help you identify optimization opportunities.

The sections of this post walk you through the step-by-step implementation of this solution, from deployment to setting up the event-driven architecture to configuring the analytics dashboards. Keep in mind that you can deploy and customize this solution in your environment.
Prerequisites
You must have access to an AWS account with sufficient privileges to create and manage the AWS resources discussed in this blog post. You must also have the following software/components installed on your machine:
Note: This application uses several AWS services, and there are associated costs beyond Free Tier usage. Refer to the AWS Pricing page for specific details. You are responsible for any incurred AWS costs. This example solution does not imply any warranty.
Deployment instructions
Create a new directory, navigate to that directory in a terminal, and clone the GitHub repository:
Change directory to the solution directory:
Checklist for deployment
This section lists the setup and configurations that are required before you deploy the solution stack by using AWS SAM.
If you don’t have a VPC, subnets, and a NAT gateway already created and configured, you can follow the steps in the Amazon VPC documentation to create the required resources.
- VPC Created – Ensure a VPC exists with DNS hostnames and DNS resolution enabled. You will need the VPC ID during deployment.
- Public Subnets (2 or more) – Configure two or more public subnet IDs from different Availability Zones.
- Private Subnets (2 or more) – Configure two or more private subnet IDs from different Availability Zones.
- Outbound Internet Access for Private Subnets – Ensure NAT gateway access, because the NGINX proxy will be installed on EC2 instances in a private subnet. Refer to Example: VPC with servers in private subnets and NAT for more information on setting up NAT for instances in private subnets.
- ALB Access – CIDR IP range allowed to access the ALB (such as `1.2.3.4/32`). This is for accessing the dashboard.
- Certificate ARN for ALB HTTPS Listener – Required to configure the HTTPS listener. The certificate (which can be self-signed) is used for the HTTPS port of the load balancer. Refer to Prerequisites for importing ACM certificates for more information on importing self-signed certificates into AWS Certificate Manager (ACM).
- OpenSearch Service-Linked Role – Before deploying this template, ensure the Amazon OpenSearch Service service-linked role exists in your account by running:
Note:
- This command only needs to be run once per AWS account.
- If the role already exists, you’ll see an error message that can be safely ignored.
- This role allows Amazon OpenSearch Service to manage network interfaces in your VPC.
- Without this role, deployments that place OpenSearch Service domains in a VPC will fail with the error: “Before you can proceed, you must enable a service-linked role to give Amazon OpenSearch Service permissions to access your VPC.”
- The service-linked role is named "AWSServiceRoleForAmazonOpenSearchService" and is managed by AWS.
- AMIId – Valid EC2 AMI ID for the Region. Note: This solution is designed to work only with AMIs that use the DNF package manager. Use the latest Amazon Linux 2023 AMI for optimal compatibility and security.
The following AMIs are confirmed compatible with this solution:
- Amazon Linux 2023
- Fedora (35 and newer)
- RHEL 8 and newer
- CentOS Stream 8 and newer
- Oracle Linux 8 and newer
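For the service-linked role item in the preceding checklist, the following is a minimal Python sketch of an idempotent check. It is an illustration rather than the command the post refers to: it assumes you pass in a boto3 IAM client (boto3 is not imported here so that the helper stays standard-library only), and it assumes the "already exists" case surfaces as an `InvalidInput` error whose message says the role name "has been taken", which is how IAM typically reports it.

```python
OPENSEARCH_SERVICE_PRINCIPAL = "opensearchservice.amazonaws.com"


def ensure_opensearch_service_linked_role(iam_client) -> bool:
    """Create the OpenSearch service-linked role if it is missing.

    Returns True when the role was created and False when it already existed.
    `iam_client` is expected to behave like boto3.client("iam").
    """
    try:
        iam_client.create_service_linked_role(
            AWSServiceName=OPENSEARCH_SERVICE_PRINCIPAL
        )
        return True
    except Exception as exc:
        # botocore's ClientError exposes the service error under `.response`.
        error = getattr(exc, "response", {}).get("Error", {})
        already_exists = (
            error.get("Code") == "InvalidInput"
            and "has been taken" in error.get("Message", "")
        )
        if already_exists:
            return False  # Safe to ignore: the role is already in the account.
        raise


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   ensure_opensearch_service_linked_role(boto3.client("iam"))
```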
Build and deploy the solution – From the command line, use AWS SAM to build and deploy the AWS resources as specified in the template.yml file (typically `sam build` followed by `sam deploy --guided`).
During the prompts, fill out the following parameters:
- Stack Name: {Enter your preferred stack name}
- AWS Region: {Enter your preferred Region code}
- Parameter DomainName: {Enter the name for your new OpenSearch Service domain where the index will be created and data will be pushed for analytics. This creates a new OpenSearch Service domain with the name you specify. Preferably keep the domain name short}
- MasterUsername: {Admin username to log in to OpenSearch Dashboards}
- MasterUserPassword: {Must contain lowercase, uppercase, numbers, and special characters (!@#$%^&*). Minimum 12 characters recommended. Avoid common passwords (such as Password123! or Admin@2024) because these may cause deployment failures due to security validation checks.}
- IndexName: {OpenSearch index name where Spot interruption event data will be pushed}
- EventRuleName: {Amazon EventBridge rule name to capture EC2 Spot interruption notices}
- CustomEventRuleName: {Amazon EventBridge custom rule name to capture EC2 Spot interruption notices. This will be used for verifying the solution}
- TargetQueueName: {Name of the SQS queue that is the EventBridge rule target}
- SQSDLQQueueName: {Name of the Dead Letter Queue for the target SQS queue}
- LambdaDLQQueueName: {Name of the Lambda Dead Letter Queue}
- VPCId: {Enter the ID of the VPC where the resources will be deployed}
- PublicSubnetIds: {Enter 2 or more public subnet IDs separated by commas}
- PrivateSubnetIds: {Enter 2 or more private subnet IDs separated by commas}
- RestrictedIPCidr: {IP address/CIDR for restricting ALB access, in CIDR format (such as 10.2.3.4/32)}
- CertificateArn: {Certificate ARN for configuring the ALB HTTPS listener}
- AMIId: {Valid EC2 AMI ID for the Region}
- Confirm changes before deploy: Y
- Allow SAM CLI IAM role creation: Y
- Disable rollback: N
- Save arguments to configuration file: Y
- SAM configuration file: {Press enter to use the default name}
- SAM configuration environment: {Press enter to use the default name}
Note: The entire solution may take approximately 15-20 minutes to deploy. After the deployment is complete, there are a few manual steps that must be performed to ensure the solution functions as expected.
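Because a failed deployment costs 15-20 minutes, it can help to sanity-check a few parameter values before you start. The following is a small, standard-library-only sketch that mirrors the rules stated for MasterUserPassword, RestrictedIPCidr, and the subnet ID lists. The function names are illustrative, and the 12-character minimum is the recommendation from this post rather than a documented service limit.

```python
import ipaddress
import re

SPECIAL_CHARACTERS = "!@#$%^&*"


def validate_password(password: str, min_length: int = 12) -> list[str]:
    """Return a list of problems; an empty list means the password passes."""
    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if not re.search(r"[a-z]", password):
        problems.append("no lowercase letter")
    if not re.search(r"[A-Z]", password):
        problems.append("no uppercase letter")
    if not re.search(r"\d", password):
        problems.append("no digit")
    if not any(ch in SPECIAL_CHARACTERS for ch in password):
        problems.append(f"no special character from {SPECIAL_CHARACTERS}")
    return problems


def validate_cidr(cidr: str) -> bool:
    """Accept only explicit CIDR notation with no host bits set."""
    if "/" not in cidr:
        return False
    try:
        ipaddress.ip_network(cidr, strict=True)
        return True
    except ValueError:
        return False


def validate_subnet_list(value: str, minimum: int = 2) -> bool:
    """Check that a comma-separated list holds at least `minimum` distinct IDs."""
    subnets = {item.strip() for item in value.split(",") if item.strip()}
    return len(subnets) >= minimum
```

These checks do not verify that the subnets sit in different Availability Zones or that the password passes the service's own validation; they only catch formatting mistakes early.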
Post deployment instructions
The following steps must be performed in OpenSearch Dashboards after logging in. Get the DNS name of the Application Load Balancer endpoint from the deployment output section of the CloudFormation stack or from the ALB console. Access OpenSearch Dashboards using the ALB DNS name as follows:
You’ll be redirected to the OpenSearch Dashboards login page. Log in using the MasterUsername and MasterUserPassword you specified during deployment.
If this is the first time you are logging in, you may see a Welcome screen.
- Choose ‘Explore on my own’ on the Welcome screen.
- Choose ‘Dismiss’ on the next screen.
- If the ‘Select your tenant’ dialog appears with ‘Global’ preselected, choose ‘Confirm’. Otherwise, select ‘Global’ first and then choose ‘Confirm’.
Create index and attribute mapping
This section lists the required steps to create the index and attribute mapping.
- On the Home screen, select the hamburger menu icon on the top left.
- Select ‘Dev Tools’ at the bottom of the menu.
- In the Dev Tools console, paste the following PUT command and run the request by choosing ‘Click to send request’.
Note: The index name must match what you entered during the deployment. Change the index name accordingly before creating the index.
The following is a screenshot of this command in Dev Tools.

- Confirm that the index was created successfully.
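To illustrate the shape of a Dev Tools index-creation request, the following sketch renders one from Python. The mapping shown here is hypothetical: only the `timestamp` date field is implied by this post (it is used as the time field later), and the other field names are placeholders chosen to match the dashboard dimensions. Use the mapping that ships with the solution for the actual deployment.

```python
import json


def build_index_request(index_name: str) -> str:
    """Render a Dev Tools style PUT request that creates an index with a mapping."""
    mapping = {
        "mappings": {
            "properties": {
                # `timestamp` is later selected as the index pattern time field.
                "timestamp": {"type": "date"},
                # The fields below are illustrative assumptions, not the
                # solution's real schema.
                "instance_id": {"type": "keyword"},
                "instance_type": {"type": "keyword"},
                "availability_zone": {"type": "keyword"},
                "region": {"type": "keyword"},
                "asg_name": {"type": "keyword"},
            }
        }
    }
    return f"PUT /{index_name}\n{json.dumps(mapping, indent=2)}"


if __name__ == "__main__":
    print(build_index_request("spot-interruptions"))
```

Mapping the dimension fields as `keyword` rather than `text` is what makes them usable in terms aggregations, which is the kind of query the dashboard visualizations rely on.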

Create index pattern
This section lists the required steps to create the index pattern.
- Select the hamburger menu icon on the top left.
- Select ‘Dashboard Management’ from the bottom of the menu.
- Choose ‘Index Patterns’.
- Choose “Create Index Pattern”.

- Enter the index pattern name and choose “Next step”.
The index pattern name should be the index name you entered during the deployment followed by an asterisk. See the following screenshot for reference.
- Select ‘timestamp’ as the primary Time field and choose ‘Create index pattern’.

- Choose the star icon to make the index pattern the default.

Configure Lambda with required access for new index
In this section, you’ll create a role in OpenSearch Dashboards and map the Lambda execution role to it so that the function can perform operations on the new index.
- Navigate to the Lambda console.
- Search for the function beginning with your OpenSearch Service domain name.
- In the function details, go to Configuration > Permissions.
- Choose the Role Name in the Execution Role section.
- Copy the Lambda execution role ARN from this function, which handles Spot interruption events.
- In OpenSearch Dashboards, select the hamburger menu icon on the top left and select ‘Security’ from the bottom of the menu.
- Select the ‘Roles’ menu option under the ‘Security’ menu and then select ‘Create Role’.
- Enter a role name and set Cluster Permissions to “cluster_composite_ops_ro”.
- For Index Permissions, select the index pattern name you created earlier.
See the following screenshot for reference.

- Set the Tenant Permissions to “global_tenant” as shown in the image and choose “Create”.

- After the role is created, on the same screen, select the ‘Mapped Users’ tab and choose ‘Manage Mapping’.

- Choose ‘Manage Mapping’.
- In ‘Backend roles’, add the Lambda execution role ARN copied earlier and choose ‘Map’.

You can create additional users in the internal database and grant appropriate access to the visualizations and dashboards. The following steps show how to create a read-only role, create an internal user, and grant that user read-only access.
Manage users and roles
In this section, you’ll create a new user and a role with read-only access, then assign the role to the user to grant them read-only access to the Spot Interruption dashboard and visualizations.
- Select the hamburger menu icon on the top left.
- Select ‘Security’ from the bottom of the menu.
- Select ‘Internal Users’ and then select ‘Create Internal user’.

- Enter a username and set a password, then choose “Create”.

- Select the ‘Roles’ menu option under the ‘Security’ menu and then select ‘Create Role’.
- Enter the role name and set Cluster Permissions to “cluster_composite_ops_ro”.
- For Index Permissions, select the index pattern name you created earlier.
See the following screenshot for reference.

- Set the Tenant Permissions to “global_tenant” as shown in the image and choose “Create”.

- After the role is created, on the same screen, select the ‘Mapped Users’ tab and choose ‘Manage Mapping’.

- Select the user created above in ‘Users’ and choose ‘Map’.
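The same read-only role and user mapping can also be expressed as OpenSearch Security REST API requests, which is convenient if you want to script the setup. The sketch below only builds the request text. The `read` and `kibana_all_read` action groups are assumptions about what "read-only" should mean here (this post does not list the allowed actions), and the role and user names are placeholders.

```python
import json


def read_only_role(index_pattern: str) -> dict:
    """Body for PUT _plugins/_security/api/roles/<role-name>."""
    return {
        "cluster_permissions": ["cluster_composite_ops_ro"],
        "index_permissions": [
            # Assumption: the built-in `read` action group is sufficient.
            {"index_patterns": [index_pattern], "allowed_actions": ["read"]}
        ],
        "tenant_permissions": [
            # Assumption: read-only access to the global tenant.
            {"tenant_patterns": ["global_tenant"],
             "allowed_actions": ["kibana_all_read"]}
        ],
    }


def role_mapping(users=(), backend_roles=()) -> dict:
    """Body for PUT _plugins/_security/api/rolesmapping/<role-name>."""
    return {"users": list(users), "backend_roles": list(backend_roles)}


def dev_tools_request(path: str, body: dict) -> str:
    """Render a request in the form accepted by the Dev Tools console."""
    return f"PUT {path}\n{json.dumps(body, indent=2)}"


if __name__ == "__main__":
    role_name = "spot-dashboard-read-only"  # hypothetical role name
    print(dev_tools_request(f"_plugins/_security/api/roles/{role_name}",
                            read_only_role("spot-interruptions*")))
    print(dev_tools_request(f"_plugins/_security/api/rolesmapping/{role_name}",
                            role_mapping(users=["dashboard-viewer"])))
```

The same `role_mapping` helper covers the Lambda case from the previous section by passing the execution role ARN in `backend_roles` instead of a user name.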

Configure and deploy sample visualizations and dashboard
Sample visualizations and a starter dashboard are provided under the information folder of the git repo you cloned earlier. Look for the file named spot-interruption-dashboard-visualisations.ndjson. To import the visualizations:
- Navigate to Saved Objects under Dashboard Management in OpenSearch Dashboards.
- Import the spot-interruption-dashboard-visualisations.ndjson file.
- During the import, you may encounter index pattern conflicts. Select the index pattern you created from the dropdown and choose “Confirm all changes”.

Once imported, the sample visualizations and dashboard linked to your index pattern will be available under Dashboards in the left-side hamburger menu. You can view the Spot Interruption Dashboard, which includes visualizations based on Availability Zones, Regions, instance types, Auto Scaling groups (ASGs), and interruptions over time. You can customize further by creating your own visualizations using the attributes available in the index or by modifying or creating dashboards. The dashboard displays empty views until Spot interruption data is available to visualize.
Test the solution
A temporary event rule was created during deployment to simulate matching Amazon EC2 Spot interruption notices. The rule name is the name you specified during deployment for the CustomEventRuleName parameter.
To verify the solution, you can send sample events from the EventBridge console as depicted below. In the AWS console,
After the event is sent successfully, you can log in to OpenSearch Dashboards and view the Spot Interruption Dashboard, which has been prebuilt and displays the indexed event data. This dashboard provides insights across key dimensions such as Availability Zones, Regions, instance types, Auto Scaling groups, and interruption trends over time. Use the dashboard as a starting point to understand the kinds of insights possible, and customize or create new visualizations based on your needs and the fields available in the index.
Alternatively, you can navigate to the Discover section in the menu to view the raw event details. Make sure that you select the index pattern you created earlier in this walkthrough, and adjust the time range if necessary (such as the last 15 minutes) to view the latest data.
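If you prefer to script the test instead of using the console, a sample event can be built as a PutEvents entry. The sketch below is an assumption-heavy illustration: the custom source name is a placeholder, and the detail type and detail fields simply mirror the real EC2 Spot interruption notice, so check the pattern on the rule named by CustomEventRuleName and adjust the values to match. The one firm constraint encoded here is that custom events cannot use a source that starts with `aws.`.

```python
import json


def build_test_entry(instance_id: str,
                     source: str = "custom.spot.interruption.test",
                     action: str = "terminate") -> dict:
    """Build one PutEvents entry that imitates a Spot interruption notice."""
    if source.startswith("aws."):
        # EventBridge reserves the `aws.` prefix for AWS service events.
        raise ValueError("custom events cannot use a source starting with 'aws.'")
    return {
        "Source": source,  # hypothetical; must match the custom rule's pattern
        "DetailType": "EC2 Spot Instance Interruption Warning",
        "Detail": json.dumps({
            "instance-id": instance_id,
            "instance-action": action,
        }),
    }


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("events").put_events(
#       Entries=[build_test_entry("i-0123456789abcdef0")])
```

Use the ID of an instance that currently runs in one of your Auto Scaling groups if you want the ASG-related fields in the dashboard to be populated.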
Security and cost optimizations
This solution is designed to be secure and cost-efficient by default, but there are some additional optimizations you can apply to further reduce cost and improve security:
Security best practices
- Amazon Cognito Authentication: Integrate Amazon Cognito with OpenSearch Dashboards to manage user authentication, enable multi-factor authentication, and avoid hardcoding admin credentials. More information: Configuring Amazon Cognito authentication for OpenSearch Dashboards
- Lambda Layer Versioning: Ensure pinned versions of Lambda layers are used to avoid unexpected changes. More information: Managing Lambda dependencies with layers
- Logging and Threat Detection: Enable AWS CloudTrail and Amazon GuardDuty to monitor for unauthorized activity or anomalies. More information: Monitoring Amazon OpenSearch Service API calls with AWS CloudTrail
Cost optimizations
- Bulk Indexing with Throttling Controls: Lambda processes batches and respects throttling limits to avoid excessive OpenSearch Service usage.
- Short Retention for CloudWatch Logs: Tune log retention periods to avoid unnecessary storage costs.
- Optimize Visualizations: Design saved visualizations to avoid expensive queries (such as wide time ranges and large aggregations). More information: Optimizing query performance for Amazon OpenSearch Service data sources
- Index State Management (ISM): Configure ISM policies in OpenSearch to delete or archive older interruption data. More information: Index State Management in Amazon OpenSearch Service
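As an illustration of the ISM item above, the following sketch builds a simple two-state policy that deletes an index once it reaches a given age. The 90-day retention, the state names, and the index pattern are assumptions for the example. Note also that an age-based delete removes the whole index, so this approach fits best if you write interruption data to time-based indexes rather than to a single long-lived index.

```python
import json


def build_retention_policy(index_pattern: str, retention_days: int = 90) -> dict:
    """Body for PUT _plugins/_ism/policies/<policy-id>."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return {
        "policy": {
            "description": f"Delete indexes older than {retention_days} days",
            "default_state": "active",
            "states": [
                {
                    "name": "active",
                    "actions": [],
                    "transitions": [
                        {
                            "state_name": "delete",
                            "conditions": {"min_index_age": f"{retention_days}d"},
                        }
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            # Attaches the policy automatically to newly created matching
            # indexes; existing indexes must be attached separately.
            "ism_template": {"index_patterns": [index_pattern], "priority": 100},
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_retention_policy("spot-interruptions-*"), indent=2))
```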
Cleanup
Run the following command to delete the resources deployed earlier.
After deleting the stack, make sure to also remove any post-deployment configurations you created within OpenSearch Dashboards. While these configurations won’t incur additional costs, it’s considered a best practice to clean up your environment by deleting any resources that are no longer needed. Take some time to review OpenSearch Dashboards and identify any custom settings, dashboards, or visualizations you set up during the deployment process. Then, delete these individual configurations to ensure your environment is fully cleaned up.
Conclusion
In this post, you learned how to build and deploy a comprehensive Spot Instance interruption monitoring solution for Auto Scaling groups by using EventBridge, Amazon SQS, Lambda, and OpenSearch Service. You implemented an event-driven pipeline to capture and process Amazon EC2 Spot Instance interruption events, created secure analytics dashboards, and established near real-time visibility into interruption patterns across your Auto Scaling group–managed workloads.
This solution gives your teams the visibility and agility needed to operate confidently with Amazon EC2 Spot Instances. By combining event-driven architecture with secure, scalable analytics, you can now proactively monitor interruption events, identify interruption trends, and optimize workload strategies for resilience and cost-efficiency.
With near real-time data at your fingertips, you’re equipped to make smarter infrastructure decisions and maximize the benefits of Spot Instance capacity while minimizing disruption risks.
About the author
