This post is co-written with Sean Zou, Terry Quan, and Audrey Yuan from MuleSoft.
In our previous thought leadership blog post, Why a Cloud Operating Model, we defined a COE Framework, showed why MuleSoft implemented it, and described the benefits they gained from it. In this post, we dive into the technical implementation and describe how MuleSoft used Amazon EventBridge, Amazon Redshift, Amazon Redshift Spectrum, Amazon S3, and AWS Glue to implement it.
Solution overview
MuleSoft’s solution was to build a lakehouse on top of AWS services to support a portal, as illustrated in the following diagram. To provide near real-time analytics, we used an event-driven strategy that triggers AWS Glue jobs and refreshes materialized views. We also implemented a layered approach that included collection, preparation, and enrichment, which makes it straightforward to identify the areas that affect data accuracy.

For MuleSoft’s end-to-end lakehouse solution, the following phases are key:
- Preparation phase
- Enrichment phase
- Action phase
In the following sections, we discuss these phases in more detail.
Preparation phase
Using the COE Framework, we engaged with the stakeholders in the preparation phase to determine the business goals and identify the data sources to ingest. Examples of data sources were the cloud asset inventory, AWS Cost and Usage Reports (CUR), and AWS Trusted Advisor data. The ingested data is processed in the lakehouse to implement checks and measures for the Well-Architected pillars, utilization, security, and compliance status.
How do you configure the CUR data and the Trusted Advisor data to land in Amazon S3?
The configuration process involves several components for both CUR and Trusted Advisor data storage. For the CUR setup, customers need to configure an S3 bucket where the CUR report will be delivered, either by selecting an existing bucket or by creating a new one. A bucket policy must be applied to the S3 bucket, and customers must specify an S3 path prefix, which creates a subfolder for CUR file delivery.
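For illustration, the following Python sketch builds the kind of bucket policy the CUR setup asks for. It follows the documented pattern of granting the `billingreports.amazonaws.com` service principal permission to read the bucket ACL and policy and to put report objects. The bucket name and account ID are placeholders, and the `aws:SourceAccount` condition is one way to scope the grant; check the current AWS Billing documentation for the exact policy before applying it.

```python
import json


def cur_bucket_policy(bucket: str, account_id: str) -> dict:
    """Build an S3 bucket policy that lets the CUR service deliver reports.

    Placeholder inputs; verify the statements against current AWS docs.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    principal = {"Service": "billingreports.amazonaws.com"}
    # Restrict the grant to deliveries made on behalf of this account.
    condition = {"StringEquals": {"aws:SourceAccount": account_id}}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # CUR checks the bucket's ACL and policy before delivering.
                "Sid": "AllowCURBucketChecks",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetBucketAcl", "s3:GetBucketPolicy"],
                "Resource": bucket_arn,
                "Condition": condition,
            },
            {
                # Report files are written as objects under the bucket.
                "Sid": "AllowCURDelivery",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": condition,
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name and account ID.
    print(json.dumps(cur_bucket_policy("example-cur-bucket", "111122223333"), indent=2))
```

The resulting JSON is what you would attach to the bucket; the S3 path prefix is chosen separately in the CUR report definition.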
Trusted Advisor data is configured to use Amazon Kinesis Data Firehose to deliver customer summary data to the Support Data Lake S3 bucket. The data ingestion process uses Firehose buffer parameters (1 MB buffer size and 60-second buffer interval) to manage the data flow to the S3 bucket.
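To make the buffer parameters concrete, here is a small Python model of how they interact: Firehose delivers a batch when either the size threshold or the time threshold is reached, whichever comes first. The `BufferingHints` keys mirror the shape used in a Firehose S3 destination configuration; the `Buffer` class and `should_flush` function are hypothetical helpers for illustration, not part of any AWS SDK.

```python
from dataclasses import dataclass

# Buffering hints in the shape Firehose uses for an S3 destination
# (for example, "BufferingHints" in ExtendedS3DestinationConfiguration).
BUFFERING_HINTS = {"SizeInMBs": 1, "IntervalInSeconds": 60}


@dataclass
class Buffer:
    """Hypothetical snapshot of what is currently buffered."""
    bytes_buffered: int = 0
    seconds_since_first_record: float = 0.0


def should_flush(buf: Buffer, hints: dict = BUFFERING_HINTS) -> bool:
    """Return True when either buffering threshold has been reached.

    This toy model treats 1 MB as 1 MiB, which is a simplification.
    """
    size_limit = hints["SizeInMBs"] * 1024 * 1024
    return (
        buf.bytes_buffered >= size_limit
        or buf.seconds_since_first_record >= hints["IntervalInSeconds"]
    )


if __name__ == "__main__":
    print(should_flush(Buffer(200_000, 12)))  # False: small and recent
    print(should_flush(Buffer(200_000, 60)))  # True: interval reached
```

With these settings, a trickle of Trusted Advisor records still reaches S3 within about a minute, while a burst is delivered as soon as roughly 1 MB accumulates.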
The Trusted Advisor data is stored as GZIP-compressed JSON, following a specific folder structure with hourly partitions in the “YYYY-MM-DD-HH” format.
The S3 partition structure for Trusted Advisor customer summary data includes separate paths for success and error data, and the data is encrypted using an AWS KMS key specific to Trusted Advisor data.
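The following Python sketch shows how an hourly “YYYY-MM-DD-HH” partition and a success/error split could be combined into an S3 prefix. Only the hourly format comes from the description above; the base prefix and the `success`/`error` folder names are assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def ta_object_prefix(base: str, ts: datetime, succeeded: bool) -> str:
    """Build a hypothetical S3 prefix for Trusted Advisor summary data.

    Assumed layout: <base>/<success|error>/YYYY-MM-DD-HH/
    `ts` must be timezone-aware; it is normalized to UTC before formatting.
    """
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    status = "success" if succeeded else "error"
    hour_partition = ts.astimezone(timezone.utc).strftime("%Y-%m-%d-%H")
    return f"{base}/{status}/{hour_partition}/"


if __name__ == "__main__":
    ts = datetime(2024, 3, 5, 7, 30, tzinfo=timezone.utc)
    print(ta_object_prefix("trusted-advisor", ts, succeeded=True))
    # trusted-advisor/success/2024-03-05-07/
```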
MuleSoft used AWS managed services and data ingestion tools that can pull from multiple data sources and support customizations. CloudQuery is used to gather cloud infrastructure information; it can connect to many infrastructure data sources out of the box and land the data in an Amazon S3 bucket. The MuleSoft Anypoint Platform provides an integration layer for infrastructure tools and accommodates many data sources, such as on-premises systems, SaaS applications, and commercial off-the-shelf (COTS) software. Cloud Custodian was used for its ability to manage cloud resources and perform auto-remediation with customizations.
Enrichment phase
The enrichment phase involves ingesting raw data that aligns with our business goals into the lakehouse through our pipelines, and consolidating that data to create a single pane of glass.
The pipelines adopt an event-driven architecture consisting of EventBridge, Amazon Simple Queue Service (Amazon SQS), and Amazon S3 Event Notifications to provide near real-time data for analysis. When new data arrives in the source bucket, the object creation is captured by an EventBridge rule, which invokes the AWS Glue workflow, consisting of an AWS Glue crawler and AWS Glue extract, transform, and load (ETL) jobs. We also configured S3 Event Notifications to send messages to the SQS queue to make sure the pipeline processes only the new data.
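The rule side of this flow can be sketched as an event pattern. The Python below shows a pattern that matches S3 “Object Created” events for one bucket, plus a deliberately simplified matcher to show how nested fields are compared. In a real deployment, the pattern would be attached to an EventBridge rule (for example, through the `PutRule` API) whose target starts the Glue workflow, and the bucket would need EventBridge notifications turned on. The bucket name and object key are placeholders.

```python
# Event pattern for a rule that fires on new objects in the source bucket.
# "example-source-bucket" is a placeholder.
OBJECT_CREATED_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["example-source-bucket"]}},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: supports nested fields and exact-value lists only.

    Real EventBridge patterns also support prefix, numeric, exists, and
    other operators, which are not modeled here.
    """
    for field, expected in pattern.items():
        if field not in event:
            return False
        actual = event[field]
        if isinstance(expected, dict):
            # Recurse into nested objects such as detail.bucket.name.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


if __name__ == "__main__":
    event = {
        "source": "aws.s3",
        "detail-type": "Object Created",
        "detail": {
            "bucket": {"name": "example-source-bucket"},
            "object": {"key": "inventory/year=2024/month=01/part-0000.json"},
        },
    }
    print(matches(OBJECT_CREATED_PATTERN, event))  # True
```

Fields present in the event but absent from the pattern (such as `detail.object`) are ignored, which is how EventBridge patterns behave as well.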
The AWS Glue ETL job cleanses and standardizes the data so that it’s ready to be analyzed using Amazon Redshift. To handle data with complex structures, additional processing flattens the nested data formats into a relational model. The flattening step also extracts the tags of AWS assets from the nested JSON objects and pivots them into individual columns, enabling tagging enforcement controls and ownership attribution. Ownership attribution of the infrastructure data provides accountability and holds teams responsible for the costs, utilization, security, compliance, and remediation of their cloud assets. One important tag is asset ownership: using the tags extracted in the flattening step, SQL scripts attribute this data to the corresponding owners.
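The flattening and tag-pivoting step can be sketched in plain Python. This is not MuleSoft’s Glue job; it is a minimal illustration, assuming tags arrive either as a list of `{"Key": ..., "Value": ...}` objects or as a simple map, and using a hypothetical `tag_<key>` column naming convention.

```python
def flatten(record: dict, parent: str = "", sep: str = "_") -> dict:
    """Recursively flatten nested dicts: {"a": {"b": 1}} -> {"a_b": 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value  # lists and scalars are kept as-is
    return flat


def pivot_tags(record: dict, tag_field: str = "Tags") -> dict:
    """Move each tag into its own tag_<key> column, then flatten the rest.

    Tag keys are lowercased, so "Owner" and "owner" would collide; a real
    pipeline should decide how to handle that.
    """
    tags = record.get(tag_field) or []
    if isinstance(tags, dict):
        pairs = tags.items()
    else:
        pairs = ((t["Key"], t["Value"]) for t in tags)
    row = {k: v for k, v in record.items() if k != tag_field}
    for key, value in pairs:
        row[f"tag_{key.lower()}"] = value
    return flatten(row)


if __name__ == "__main__":
    record = {
        "ResourceId": "i-123",
        "Placement": {"Region": "us-west-2"},
        "Tags": [{"Key": "Owner", "Value": "team-a"}, {"Key": "Env", "Value": "prod"}],
    }
    print(pivot_tags(record))
```

Once every asset carries a `tag_owner` column, attributing costs or findings to owners becomes an ordinary join.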
When the workflow is complete, the raw data from different sources and with various structures is centralized in the data warehouse. From there, disjointed data collected for different purposes is ready to be consolidated and, by coding out the business logic, translated into actionable intelligence across the Well-Architected pillars.
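As an example of that consolidation, the Python sketch below joins cost line items to an asset inventory by resource ID and sums cost per owner, the equivalent of a SQL `LEFT JOIN` followed by `GROUP BY owner`. MuleSoft does this in SQL; the column names here (`resource_id`, `unblended_cost`, `tag_owner`) are simplified, hypothetical stand-ins rather than the exact CUR schema.

```python
from collections import defaultdict


def cost_by_owner(cost_rows, inventory, default_owner="unattributed"):
    """Sum cost per owner, sending untagged or unknown resources to a bucket."""
    # Lookup table: resource ID -> owner tag (missing or empty -> default).
    owner_of = {r["resource_id"]: r.get("tag_owner") or default_owner for r in inventory}
    totals = defaultdict(float)
    for row in cost_rows:
        # Left-join semantics: cost rows with no inventory match still count.
        owner = owner_of.get(row["resource_id"], default_owner)
        totals[owner] += row["unblended_cost"]
    return dict(totals)


if __name__ == "__main__":
    costs = [
        {"resource_id": "i-1", "unblended_cost": 1.5},
        {"resource_id": "i-2", "unblended_cost": 2.25},
        {"resource_id": "i-9", "unblended_cost": 0.25},
    ]
    inventory = [
        {"resource_id": "i-1", "tag_owner": "team-a"},
        {"resource_id": "i-2", "tag_owner": None},
    ]
    print(cost_by_owner(costs, inventory))
```

Keeping an explicit “unattributed” total makes the size of the tagging gap visible instead of silently dropping that spend.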
Solutions for the enrichment phase
In the enrichment phase, we faced numerous storage, efficiency, and scalability challenges given the sheer volume of data. We used three techniques (file partitioning, Redshift Spectrum, and materialized views) to address these issues and scale without compromising performance.
File partitioning
MuleSoft’s infrastructure data is stored in an S3 bucket in a folder structure of year, month, day, hour, account, and Region, so AWS Glue crawlers can automatically identify partitions and add them to the tables in the AWS Glue Data Catalog. Partitioning can improve query performance significantly because it enables parallel processing for queries. The amount of data scanned by each query is limited based on the partition keys, which helps reduce overall data transfer, processing time, and computation costs. Although partitioning is an optimization technique that helps improve query efficiency, it’s important to keep two key points in mind when using it:
- The Data Catalog has a maximum of 10 million partitions per table
- Query performance degrades as the number of partitions grows rapidly
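The Python sketch below shows what such a partitioned layout could look like and why the partition cap matters. It assumes Hive-style `key=value` folder names (so crawlers can name the partition columns) and treats every hour × account × Region combination as a partition, which is an upper bound. The account and Region counts in the usage note are hypothetical.

```python
from datetime import datetime

GLUE_PARTITIONS_PER_TABLE_LIMIT = 10_000_000  # Data Catalog cap cited above


def partition_prefix(ts: datetime, account: str, region: str) -> str:
    """Hive-style S3 prefix for one hourly, per-account, per-Region partition."""
    return (
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/"
        f"account={account}/region={region}/"
    )


def estimated_partitions(retention_days: int, accounts: int, regions: int) -> int:
    """Upper bound: one partition per hour x account x Region kept in S3."""
    return retention_days * 24 * accounts * regions


if __name__ == "__main__":
    print(partition_prefix(datetime(2024, 1, 2, 3), "111122223333", "us-east-1"))
    for days in (90, 365):
        n = estimated_partitions(days, accounts=200, regions=17)
        print(days, n, n <= GLUE_PARTITIONS_PER_TABLE_LIMIT)
```

With a hypothetical 200 accounts and 17 Regions, 90 days of hourly data yields up to 7,344,000 partitions, under the cap, while a full year would yield up to 29,784,000, nearly three times the limit. This is the kind of arithmetic that motivates a retention policy.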
Therefore, balancing the number of partitions in the Data Catalog tables against query efficiency is crucial. We chose a data retention policy of three months and configured an S3 Lifecycle rule to expire any data older than that.
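A retention rule like this can be expressed as an S3 Lifecycle configuration. The Python sketch below builds the dictionary in the shape accepted by boto3’s `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`. “Three months” is approximated as 90 days, and the prefix and rule ID are placeholders.

```python
import json

RETENTION_DAYS = 90  # "three months" approximated as 90 days (assumption)


def lifecycle_configuration(prefix: str, days: int = RETENTION_DAYS) -> dict:
    """Lifecycle rule that expires current objects under `prefix` after `days` days."""
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(lifecycle_configuration("inventory/"), indent=2))
```

On a versioning-enabled bucket, `Expiration` only adds delete markers; a `NoncurrentVersionExpiration` action would also be needed to actually remove old versions.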
In our event-driven architecture, an Amazon EventBridge event is invoked when objects are put into or removed from an S3 bucket, and S3 Event Notifications publish event messages to the SQS queue. Based on these messages, an AWS Glue crawler is invoked to either add new partitions to or remove old partitions from the Data Catalog, which handles the partition cleanup.
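The message-handling side can be sketched as follows: parse the S3 event notification in an SQS message body and group the affected partitions by whether objects were created or removed. This is a minimal illustration, assuming the queue receives S3 notifications directly (not wrapped in SNS) and that an object’s folder is its partition. A partition with a removal should be dropped only once it has no remaining objects, which this sketch does not check.

```python
import json
from urllib.parse import unquote_plus


def partition_of(key: str) -> str:
    """Treat the object's folder (everything before the file name) as its partition."""
    return key.rsplit("/", 1)[0] if "/" in key else ""


def partition_changes(sqs_body: str):
    """Return (partitions_with_new_objects, partitions_with_removed_objects)."""
    added, removed = set(), set()
    message = json.loads(sqs_body)
    # S3 sends an s3:TestEvent without "Records" when notifications are set up.
    for record in message.get("Records", []):
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        event_name = record["eventName"]
        if event_name.startswith("ObjectCreated:"):
            added.add(partition_of(key))
        elif event_name.startswith("ObjectRemoved:"):
            removed.add(partition_of(key))
    return added, removed


if __name__ == "__main__":
    body = json.dumps({"Records": [
        {"eventName": "ObjectCreated:Put",
         "s3": {"bucket": {"name": "b"}, "object": {"key": "inv/year%3D2024/month%3D01/p.json"}}},
        {"eventName": "ObjectRemoved:Delete",
         "s3": {"bucket": {"name": "b"}, "object": {"key": "inv/year%3D2023/month%3D10/p.json"}}},
    ]})
    print(partition_changes(body))
```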
Amazon Redshift and concurrency scaling
MuleSoft uses Amazon Redshift, with Redshift Spectrum, to query the data in Amazon S3 in place, because it provides large-scale compute while minimizing data redundancy. MuleSoft also used Amazon Redshift concurrency scaling to run concurrent queries with consistently fast performance. Amazon Redshift automatically added query processing capacity within seconds to handle a high number of concurrent queries without long queue wait times.
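For a provisioned cluster, concurrency scaling is turned on per WLM queue through the cluster parameter group. The Python sketch below builds parameter entries in the shape accepted by boto3’s Redshift `modify_cluster_parameter_group(ParameterGroupName=..., Parameters=...)`. The queue definition follows the documented WLM JSON format as we understand it, and the limit of 4 scaling clusters is a placeholder, not MuleSoft’s setting; verify both against your own parameter group before applying anything.

```python
import json


def concurrency_scaling_parameters(max_clusters: int = 4) -> list:
    """Parameter-group entries that enable concurrency scaling for one WLM queue."""
    wlm_queues = [
        {
            "query_group": [],
            "user_group": [],
            "auto_wlm": True,
            # "auto" lets Redshift route eligible queued queries to scaling clusters.
            "concurrency_scaling": "auto",
        }
    ]
    return [
        {"ParameterName": "wlm_json_configuration",
         "ParameterValue": json.dumps(wlm_queues)},
        # Upper bound on how many transient scaling clusters can run at once.
        {"ParameterName": "max_concurrency_scaling_clusters",
         "ParameterValue": str(max_clusters)},
    ]


if __name__ == "__main__":
    for param in concurrency_scaling_parameters():
        print(param["ParameterName"], "=", param["ParameterValue"])
```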
Materialized views
Another technique we used is Amazon Redshift materialized views. Materialized views store precomputed query results that later, similar queries can reuse, so many computation steps can be skipped and the relevant data can be accessed efficiently. Additionally, materialized views can be refreshed automatically and incrementally. As a result, we can provide a single pane of glass for our cloud infrastructure with the most up-to-date projections, trends, and actionable insights for our organization, along with improved query performance.
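The idea behind incremental refresh can be shown with a toy model in Python. The class below is not how Amazon Redshift implements materialized views; it is a minimal sketch, assuming an append-only base table and a `SUM(cost) GROUP BY owner` view, that applies only rows added since the last refresh and checks that the result matches a full recomputation.

```python
from collections import defaultdict


class CostByOwnerView:
    """Toy materialized view: SUM(cost) GROUP BY owner, maintained incrementally."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.rows_applied = 0  # how much of the (append-only) base table is reflected

    def refresh(self, base_rows):
        # Incremental refresh: fold in only rows appended since the last refresh.
        for row in base_rows[self.rows_applied:]:
            self.totals[row["owner"]] += row["cost"]
        self.rows_applied = len(base_rows)


def full_recompute(base_rows):
    """What a full refresh would produce: rescan every base row."""
    totals = defaultdict(float)
    for row in base_rows:
        totals[row["owner"]] += row["cost"]
    return dict(totals)


if __name__ == "__main__":
    rows = [{"owner": "a", "cost": 1.0}, {"owner": "b", "cost": 2.0}]
    view = CostByOwnerView()
    view.refresh(rows)
    rows.append({"owner": "a", "cost": 0.5})
    view.refresh(rows)  # touches one new row instead of all three
    print(dict(view.totals), dict(view.totals) == full_recompute(rows))
```

The saving grows with the table: each refresh costs work proportional to the new rows, not to the whole history.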
Amazon Redshift materialized views (MVs) are used extensively for reporting in MuleSoft’s Cloud Central portal, and when users need to drill down into a more granular view, they can query external tables.
MuleSoft currently refreshes the materialized views explicitly, with the refreshes triggered through the event-driven architecture, but is evaluating a switch to automatic refresh.
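The difference between the two approaches comes down to which SQL gets issued. The Python sketch below generates `REFRESH MATERIALIZED VIEW` statements for the views that depend on an updated table (the event-triggered approach) and the `ALTER MATERIALIZED VIEW ... AUTO REFRESH YES` statement that switches a view to automatic refresh. The table-to-view mapping and all names are hypothetical; the statements could be submitted, for example, through the Amazon Redshift Data API’s `execute_statement`.

```python
import re

# Accept only plain (optionally schema-qualified) identifiers, so names
# cannot smuggle extra SQL into the generated statements.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*)?$")

# Hypothetical mapping from base table to dependent materialized views.
DEPENDENT_VIEWS = {
    "infra.ec2_instances": ["reporting.mv_untagged_assets", "reporting.mv_cost_by_owner"],
}


def _checked(name: str) -> str:
    if not _IDENT.match(name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return name


def refresh_statements(updated_table: str) -> list:
    """Explicit refreshes to run when an event reports new data in `updated_table`."""
    return [
        f"REFRESH MATERIALIZED VIEW {_checked(view)};"
        for view in DEPENDENT_VIEWS.get(updated_table, [])
    ]


def enable_auto_refresh(view: str) -> str:
    """Statement that hands refresh scheduling over to Amazon Redshift."""
    return f"ALTER MATERIALIZED VIEW {_checked(view)} AUTO REFRESH YES;"


if __name__ == "__main__":
    for stmt in refresh_statements("infra.ec2_instances"):
        print(stmt)
    print(enable_auto_refresh("reporting.mv_cost_by_owner"))
```

Explicit refreshes give precise control over timing; automatic refresh removes the orchestration code but lets Redshift decide when to run the refresh.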
Action phase
Using materialized views in Amazon Redshift, we developed a self-service Cloud Central portal in Tableau that provides a dashboard for every team, engineer, and manager, offering guidance and recommendations to help them operate in a way that aligns with the organization’s requirements, standards, and budget. Managers are empowered with monitoring and decision-making information for their teams. Engineers can identify and tag assets that are missing mandatory tagging information, as well as remediate non-compliant resources. A key feature of the portal is personalization: the portal populates visualizations and analysis based on the data associated with a manager’s or engineer’s login information.
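Personalization and missing-tag detection can be sketched together in Python. This is not the Tableau implementation; it is a minimal illustration, assuming tags are already pivoted into hypothetical `tag_<key>` columns, a hypothetical mandatory-tag policy, and a simple manager-to-reports mapping keyed by login.

```python
MANDATORY_TAGS = ("owner", "team", "environment")  # hypothetical tagging policy


def missing_tags(row: dict, mandatory=MANDATORY_TAGS) -> list:
    """List mandatory tags that are absent or empty on an asset row."""
    return [tag for tag in mandatory if not row.get(f"tag_{tag}")]


def portal_rows(rows, login: str, reports: dict) -> list:
    """Rows visible to `login`: their own assets plus their direct reports'.

    Assets without an owner tag match nobody here; a real portal would
    surface them in a separate "unattributed" view.
    """
    visible = {login, *reports.get(login, [])}
    return [
        {**row, "missing_tags": missing_tags(row)}
        for row in rows
        if row.get("tag_owner") in visible
    ]


if __name__ == "__main__":
    rows = [
        {"resource_id": "i-1", "tag_owner": "ana", "tag_team": "core"},
        {"resource_id": "i-2", "tag_owner": "bo", "tag_team": "core", "tag_environment": "prod"},
        {"resource_id": "i-3", "tag_owner": "cy"},
    ]
    reports = {"mgr": ["ana", "bo"]}
    for row in portal_rows(rows, "mgr", reports):
        print(row["resource_id"], row["missing_tags"])
```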
Cloud Central also helps engineering teams improve their cloud maturity across the six Well-Architected pillars: operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability. The team explored the “art of the possible” by building a proof of concept with Amazon Q to answer 100- and 200-level questions and how-tos about the Well-Architected pillars. The following screenshot illustrates MuleSoft’s implementation of the portal, Cloud Central. Other companies will design portals tailored to their own use cases and requirements.

Conclusion
The technical and business impact of MuleSoft’s COE Framework enables an optimization strategy and a cloud usage showback approach, which help MuleSoft continue to grow with a scalable and sustainable cloud infrastructure. The framework also drives continuous maturity and benefits in cloud infrastructure centered on the six Well-Architected pillars shown in the following figure.

The framework helps organizations with expanding public cloud infrastructure achieve their business goals, guided by the benefits of the Well-Architected Framework and powered by an event-driven architecture.
The event-driven Amazon Redshift lakehouse architecture offers near real-time, actionable insights for decision-making, management, and accountability. The event-driven architecture can be broken down into modules that can be added or removed depending on your technical and business goals.
The team is exploring new ways to lower the total cost of ownership. They’re evaluating Amazon Redshift Serverless for transient database workloads, as well as exploring Amazon DataZone to aggregate and correlate data sources into a data catalog that can be shared among teams, applications, and lines of business in a democratized way. A well-thought-out lakehouse solution can increase visibility, productivity, and scalability.
We invite organizations and enterprises to take a holistic approach to understanding their cloud resources, infrastructure, and applications. You can enable and educate your teams through a single pane of glass, built on a modernized data lakehouse that applies Well-Architected concepts, best practices, and cloud-centric principles. This solution can ultimately enable near real-time streaming, carrying a COE Framework well into the future.
About the Authors
Sean Zou is a Cloud Operations leader with MuleSoft at Salesforce. Sean has been involved in many aspects of MuleSoft’s Cloud Operations and helped drive MuleSoft’s cloud infrastructure to scale more than tenfold in 7 years. He built the Oversight Engineering function at MuleSoft from scratch.
Terry Quan focuses on FinOps issues. He works in MuleSoft Engineering on cloud computing budgets and forecasting, cost reduction efforts, and cost-to-serve, and coordinates with Salesforce Finance. Terry is FinOps Practitioner and FinOps Professional certified.
Audrey Yuan is a Software Engineer with MuleSoft at Salesforce. Audrey works on data lakehouse solutions to help drive cloud maturity across the six pillars of the Well-Architected Framework.
Rueben Jimenez is a Senior Solutions Architect at AWS, designing and implementing complex data analytics, AI/ML, and cloud infrastructure solutions.
Avijit Goswami is a Principal Solutions Architect at AWS specializing in data and analytics. He supports AWS strategic customers in building high-performing, secure, and scalable data lake solutions on AWS using AWS managed services and open source solutions. Outside of work, Avijit likes to travel, hike, watch sports, and listen to music.
