Best practices for upgrading from Amazon Redshift DC2 to RA3 and Amazon Redshift Serverless


Amazon Redshift is a fast, petabyte-scale cloud data warehouse that makes it simple and cost-effective to analyze your data using standard SQL and your existing business intelligence (BI) tools. Tens of thousands of customers rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries, delivering the best price-performance.

With a fully managed, AI-powered, massively parallel processing (MPP) architecture, Amazon Redshift drives business decision-making quickly and cost-effectively. Previously, Amazon Redshift offered DC2 (Dense Compute) node types optimized for compute-intensive workloads. However, they lacked the flexibility to scale compute and storage independently and didn’t support many of the modern features now available. As analytical demands grow, many customers are upgrading from DC2 to RA3 or Amazon Redshift Serverless, which offer independent compute and storage scaling, along with advanced capabilities such as data sharing, zero-ETL integration, and built-in artificial intelligence and machine learning (AI/ML) support with Amazon Redshift ML.

This post provides a practical guide to planning your target architecture and migration strategy, covering upgrade options, key considerations, and best practices to facilitate a successful and seamless transition.

Upgrade process from DC2 nodes to RA3 and Redshift Serverless

The first step towards an upgrade is to understand how the new architecture should be sized; for this, AWS provides a recommendation table for provisioned clusters. When determining the configuration for Redshift Serverless endpoints, you can assess compute capacity details by analyzing the relationship between RPUs and memory. Each RPU allocates 16 GiB of RAM. To estimate the base RPU requirement, divide your DC2 cluster’s total RAM (in GiB) by 16. These recommendations provide guidance in sizing the initial target architecture but depend on the computing requirements of your workload. To better estimate your requirements, consider conducting a proof of concept that uses Redshift Test Drive to run potential configurations. To learn more, see Find the best Amazon Redshift configuration for your workload using Redshift Test Drive and Successfully conduct a proof of concept in Amazon Redshift. After you decide on the target configuration and architecture, you can build the strategy for upgrading.
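The RPU arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function name `estimate_base_rpu` is hypothetical, the node count and per-node RAM in the example are placeholder figures (look up the documented memory for your actual DC2 node type), and rounding up to a whole RPU is an assumption on our part rather than something the sizing guidance prescribes.

```python
import math

# From the text: each RPU allocates 16 GiB of RAM.
RPU_RAM_GIB = 16


def estimate_base_rpu(node_count: int, ram_per_node_gib: float) -> int:
    """Estimate a starting base RPU value from a DC2 cluster's total RAM.

    Rounds up to a whole RPU (an assumption); the result is only a starting
    point to validate with a proof of concept.
    """
    if node_count <= 0 or ram_per_node_gib <= 0:
        raise ValueError("node_count and ram_per_node_gib must be positive")
    total_ram_gib = node_count * ram_per_node_gib
    return math.ceil(total_ram_gib / RPU_RAM_GIB)


if __name__ == "__main__":
    # Placeholder figures: 4 nodes with 244 GiB of RAM each.
    print(estimate_base_rpu(node_count=4, ram_per_node_gib=244))
```

Redshift Serverless accepts base capacity only in certain increments, so you would likely need to round the estimate to the nearest supported value before testing it.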

Architecture patterns

The first step is to define the target architecture for your solution. You can choose the main architecture pattern that best aligns with your use case from the options presented in Architecture patterns to optimize Amazon Redshift performance at scale. There are two main scenarios, as illustrated in the following diagram.

At the time of writing, Redshift Serverless doesn’t have manual workload management; everything runs with automatic workload management. Consider isolating your workload into multiple endpoints based on use case to enable independent scaling and better performance. For more information, refer to Architecture patterns to optimize Amazon Redshift performance at scale.

Upgrade strategies

You can choose from two possible upgrade options when upgrading from DC2 nodes to RA3 nodes or Redshift Serverless:

  • Full re-architecture – The first step is to evaluate and assess the workloads to determine whether you might benefit from a modern data architecture, then re-architect the existing platform during the upgrade process from DC2 nodes.
  • Phased approach – This is a two-stage strategy. The first stage involves a straightforward migration to the target RA3 or Serverless configuration. In the second stage, you can modernize the target architecture by taking advantage of cutting-edge Redshift features.

We usually recommend a phased approach, which allows for a smoother transition while enabling future optimization. The first stage of a phased approach consists of the following steps:

  • Evaluate an equivalent RA3 or Redshift Serverless configuration for your existing DC2 cluster, using the sizing guidelines for provisioned clusters or the compute capacity options for serverless endpoints.
  • Thoroughly validate the chosen target configuration in a non-production environment using Redshift Test Drive. This automated tool simplifies the process of simulating your production workloads on various potential target configurations, enabling a comprehensive what-if analysis. This step is strongly recommended.
  • Proceed to the upgrade process when you are satisfied with the price-performance ratio of a particular target configuration, using one of the methods detailed in the following section.

Redshift RA3 instances and Redshift Serverless provide access to powerful new capabilities, including zero-ETL, Amazon Redshift Streaming Ingestion, data sharing writes, and independent compute and storage scaling. To maximize these benefits, we recommend conducting a comprehensive review of your current architecture (the second stage of a phased approach) to identify opportunities for modernization using Amazon Redshift’s latest features.

Upgrade options

You can choose from three ways to resize or upgrade a Redshift cluster from DC2 to RA3 or Redshift Serverless: snapshot restore, classic resize, and elastic resize.

Snapshot restore

The snapshot restore method follows a sequential process that begins with capturing a snapshot of your existing (source) cluster. This snapshot is then used to create a new target cluster with your desired specifications. After creation, it’s essential to verify data integrity by confirming that data has been correctly transferred to the target cluster. An important consideration is that any data written to the source cluster after the initial snapshot must be manually transferred to maintain synchronization.
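One simple way to approach the integrity check is to compare per-table row counts between the source and target clusters. The sketch below assumes you have already collected those counts into dictionaries (for example, by running `SELECT COUNT(*)` per table on each side); the function name `diff_row_counts` and the table names are hypothetical. Matching counts do not prove the data is identical, so treat this as a first-pass check only.

```python
from typing import Dict, Optional, Tuple

# Maps a qualified table name to its row count.
RowCounts = Dict[str, int]


def diff_row_counts(
    source: RowCounts, target: RowCounts
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return tables whose row counts differ between source and target.

    A count of None means the table is missing on that side.
    """
    mismatches = {}
    for table in sorted(set(source) | set(target)):
        source_count = source.get(table)
        target_count = target.get(table)
        if source_count != target_count:
            mismatches[table] = (source_count, target_count)
    return mismatches


if __name__ == "__main__":
    # Hypothetical counts gathered from each cluster.
    source_counts = {"public.sales": 1000, "public.customers": 250}
    target_counts = {"public.sales": 990, "public.customers": 250}
    for name, (src, tgt) in diff_row_counts(source_counts, target_counts).items():
        print(f"{name}: source={src} target={tgt}")
```

Tables that received writes after the snapshot was taken are the ones most likely to show up here, which also makes the report a useful list of what still needs manual synchronization.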

This method offers the following advantages:

  • Allows for the validation of the new RA3 or Serverless setup without affecting the existing DC2 cluster
  • Provides the flexibility to restore to different AWS Regions or Availability Zones
  • Minimizes cluster downtime for write operations during the transition

Keep in mind the following considerations:

  • Setup and data restore might take longer than elastic resize.
  • You might encounter data synchronization challenges. Any new data written to the source cluster after snapshot creation requires manual copying to the target. This process might need multiple iterations to achieve full synchronization and require downtime before cutover.
  • A new Redshift endpoint is generated, necessitating connection updates. Consider renaming both clusters so that you can keep the original endpoint (make sure that the new target cluster adopts the original source cluster’s name).

Classic resize

Amazon Redshift creates a target cluster and migrates your data and metadata to it from the source cluster using a backup and restore operation. All of your data, including database schemas and user configurations, is accurately transferred to the new cluster. The source cluster restarts initially and is unavailable for a few minutes, causing minimal downtime. It quickly resumes, allowing both read and write operations as the resize continues in the background.

Classic resize is a two-stage process:

  • Stage 1 (critical path) – During this stage, metadata migration occurs between the source and target configurations, temporarily placing the source cluster in read-only mode. This initial phase is typically brief. When this phase is complete, the cluster is made available for read and write queries. Although tables originally configured with KEY distribution style are temporarily stored using EVEN distribution, they will be redistributed to their original KEY distribution during Stage 2 of the process.
  • Stage 2 (background operations) – This stage focuses on restoring data to its original distribution patterns. This operation runs in the background with low priority without interfering with the primary migration process. The duration of this stage varies based on multiple factors, including the volume of data being redistributed, ongoing cluster workload, and the target configuration being used.

The overall resize duration is primarily determined by the data volume being processed. You can monitor progress on the Amazon Redshift console or by using the SYS_RESTORE_STATE system view, which displays the percentage completed for the table being converted (accessing this view requires superuser privileges).

The classic resize approach offers the following advantages:

  • All possible target node configurations are supported
  • A comprehensive reconfiguration of the source cluster rebalances the data slices to default per node, leading to even data distribution across the nodes

However, keep in mind the following:

  • Stage 2 redistributes the data for optimal performance. However, Stage 2 runs at a lower priority, and in busy clusters, it can take a long time to complete. To speed up the process, you can manually run the ALTER TABLE DISTSTYLE command on your tables having KEY DISTSTYLE. By executing this command, you can prioritize the data redistribution to happen faster, mitigating any potential performance degradation due to the ongoing Stage 2 process.
  • Because of the Stage 2 background redistribution process, queries can take longer to complete during the resize operation. Consider enabling concurrency scaling as a mitigation strategy.
  • Drop unnecessary and unused tables before initiating a resize to speed up data distribution.
  • The snapshot used for the resize operation becomes dedicated to this operation only. Therefore, it can’t be used for a table restore or other purpose.
  • The cluster must operate within a virtual private cloud (VPC).
  • This approach requires a new or a recent manual snapshot taken before initiating a classic resize.
  • We recommend scheduling the operation during off-peak hours or maintenance windows for minimal business impact.
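For the manual redistribution mentioned in the list above, a small helper can generate the statements to run against your KEY-distributed tables. This is a sketch under stated assumptions: it assumes the statement takes the form `ALTER TABLE ... ALTER DISTSTYLE KEY DISTKEY column` from the Redshift ALTER TABLE reference, that you already know each table's original distribution key, and that plain unquoted identifiers are sufficient. The function name and the table list are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Accept only simple unquoted identifiers so generated SQL is safe to run.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_redistribute_statements(
    tables: Iterable[Tuple[str, str, str]]
) -> List[str]:
    """Build one ALTER TABLE statement per (schema, table, distkey) tuple."""
    statements = []
    for schema, table, distkey in tables:
        for name in (schema, table, distkey):
            if not _IDENTIFIER.match(name):
                raise ValueError(f"unsupported identifier: {name!r}")
        statements.append(
            f"ALTER TABLE {schema}.{table} ALTER DISTSTYLE KEY DISTKEY {distkey};"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical tables and their original distribution keys.
    key_tables = [
        ("public", "sales", "customer_id"),
        ("public", "orders", "order_id"),
    ]
    for statement in build_redistribute_statements(key_tables):
        print(statement)
```

Running the statements for your largest or most frequently joined tables first is one way to recover query performance sooner while Stage 2 works through the rest.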

Elastic resize

When using elastic resize to change the node type, Amazon Redshift follows a sequential process. It begins by creating a snapshot of your existing cluster, then provisions a new target cluster using the latest data from that snapshot. While data transfers to the new cluster in the background, the system remains in read-only mode. As the resize operation approaches completion, Amazon Redshift automatically redirects the endpoint to the new cluster and stops all connections to the original one. If any issues arise during this process, the system typically performs an automatic rollback without requiring manual intervention, though such failures are rare.

Elastic resize offers several advantages:

  • It’s a quick process that takes 10–15 minutes on average
  • Users maintain read access to their data during the process, experiencing only minimal interruption
  • The cluster endpoint remains unchanged throughout and after the operation

When considering this approach, keep in mind the following:

  • Elastic resize operations can only be performed on clusters using the EC2-VPC platform. Therefore, it’s not available for Redshift Serverless.
  • The target node configuration must provide sufficient storage capacity for existing data.
  • Not all target cluster configurations support elastic resize. In such cases, consider using classic resize or snapshot restore.
  • After the process is started, elastic resize can’t be stopped.
  • Data slices remain unchanged; this can potentially cause some data or CPU skew.

Upgrade recommendations

The following flowchart visually guides the decision-making process for choosing the appropriate Amazon Redshift upgrade method.

When upgrading Amazon Redshift, the method depends on the target configuration and operational constraints. For Redshift Serverless, always use the snapshot restore method. If upgrading to an RA3 provisioned cluster, you can choose from two options: use snapshot restore if a full maintenance window with downtime is acceptable, or choose classic resize for minimal downtime, because it rebalances the data slices to default per node, leading to even data distribution across the nodes. Although you can use elastic resize for certain node type changes (for example, DC2 to RA3) within specific ranges, it’s not recommended because elastic resize doesn’t change the number of slices, potentially leading to data or CPU skew, which can later impact the performance of the Redshift cluster. However, elastic resize remains the primary recommendation when you need to add or remove nodes in an existing cluster.
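The decision logic in the preceding paragraph can be written down as a small function, which may be handy in a runbook or pre-flight script. This is our reading of the text, not an official AWS rule set; the function name, parameter names, and the string labels are all hypothetical.

```python
def choose_upgrade_method(
    target: str,
    changing_node_type: bool = True,
    downtime_window_ok: bool = False,
) -> str:
    """Pick an upgrade method following the recommendations in the text.

    target: "serverless" or "ra3" (hypothetical labels).
    changing_node_type: False when only adding or removing nodes.
    downtime_window_ok: True if a full maintenance window is acceptable.
    """
    if target == "serverless":
        # Serverless targets always use snapshot restore.
        return "snapshot restore"
    if target != "ra3":
        raise ValueError(f"unknown target: {target!r}")
    if not changing_node_type:
        # Adding or removing nodes in an existing cluster.
        return "elastic resize"
    # DC2 to RA3: classic resize unless a full downtime window is acceptable.
    return "snapshot restore" if downtime_window_ok else "classic resize"


if __name__ == "__main__":
    print(choose_upgrade_method("serverless"))
    print(choose_upgrade_method("ra3", downtime_window_ok=True))
    print(choose_upgrade_method("ra3"))
    print(choose_upgrade_method("ra3", changing_node_type=False))
```

The function deliberately never returns elastic resize for a node type change, mirroring the advice that it is possible within certain ranges but not recommended because of potential slice skew.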

Best practices for migration

When planning your migration, consider the following best practices:

  • Conduct a pre-migration assessment using Amazon Redshift Advisor or Amazon CloudWatch.
  • Choose the right target architecture based on your use cases and workloads. You can use Redshift Test Drive to determine the right target architecture.
  • Back up using manual snapshots, and enable automated rollback.
  • Communicate timelines, downtime, and changes to stakeholders.
  • Update runbooks with new architecture details and endpoints.
  • Validate workloads using benchmarks and data checksums.
  • Use maintenance windows for final syncs and cutovers.
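As one illustration of the data checksum practice in the list above, the following sketch computes an order-independent checksum over rows fetched from a table, so the same data yields the same value on the source and target regardless of the order in which rows are returned. The approach (summing per-row SHA-256 digests) is our own choice for illustration, not a Redshift feature, and `table_checksum` is a hypothetical name. It assumes both sides render column values to the same strings.

```python
import hashlib
from typing import Iterable, Sequence

_MODULUS = 1 << 256  # keep the running sum within 256 bits


def table_checksum(rows: Iterable[Sequence[object]]) -> str:
    """Return "<row count>:<hex sum of per-row SHA-256 digests>".

    Summing digests makes the result independent of row order while still
    being sensitive to duplicated or missing rows.
    """
    total = 0
    count = 0
    for row in rows:
        # Join columns with a separator unlikely to appear in data;
        # render NULLs (None) with a distinct marker.
        encoded = "\x1f".join(
            "\\N" if value is None else str(value) for value in row
        ).encode("utf-8")
        digest = hashlib.sha256(encoded).digest()
        total = (total + int.from_bytes(digest, "big")) % _MODULUS
        count += 1
    return f"{count}:{total:064x}"


if __name__ == "__main__":
    # Hypothetical rows as they might be fetched from each cluster.
    source_rows = [(1, "alice", None), (2, "bob", 10.5)]
    target_rows = [(2, "bob", 10.5), (1, "alice", None)]
    print(table_checksum(source_rows) == table_checksum(target_rows))
```

For very large tables, pulling every row to the client may be impractical; in that case a similar comparison could be computed per partition or on a sample, or pushed down into SQL aggregates on each cluster.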

By following these practices, you can achieve a controlled, low-risk migration that balances performance, cost, and operational continuity.

Conclusion

Migrating from Redshift DC2 nodes to RA3 nodes or Redshift Serverless requires a structured approach to support performance, cost-efficiency, and minimal disruption. By selecting the right architecture for your workload, and validating data and workloads post-migration, organizations can seamlessly modernize their data platforms. This upgrade facilitates long-term success, helping teams fully harness RA3’s scalable storage or Redshift Serverless auto scaling capabilities while optimizing costs and performance.


About the authors

Ziad Wali

Ziad is an Analytics Specialist Solutions Architect at AWS. He has over 10 years of experience in databases and data warehousing, where he enjoys building reliable, scalable, and efficient solutions. Outside of work, he enjoys sports and spending time in nature.

Omama Khurshid

Omama is an Analytics Solutions Architect at Amazon Web Services. She focuses on helping customers across various industries build reliable, scalable, and efficient solutions. Outside of work, she enjoys spending time with her family, watching movies, listening to music, and learning new technologies.

Srikant Das

Srikant is an Analytics Specialist Solutions Architect at Amazon Web Services, designing scalable, robust cloud solutions in Analytics & AI. Beyond his technical expertise, he shares travel adventures and data insights through engaging blogs, blending analytical rigor with storytelling on social media.
