Improve Your Lakehouse: Your How-To Guide for Converting to Unity Catalog Managed Tables


The new SET MANAGED command offers a seamless way to convert UC external tables to UC managed tables while minimizing downtime, handling concurrent writes, maintaining table configurations, and, where possible, preserving table history. This article shares best practices and provides a step-by-step guide for using this generally available (GA) command.

Why Convert to UC Managed Tables?

With Unity Catalog as the source of truth, managed tables unlock unique capabilities that improve performance, governance, and ease of use, without vendor lock-in.

Key benefits include:

  • Automatic optimizations that can improve query performance by 20x and cut storage costs by 50% or more
  • Streamlined data management with automatic cleanup of dropped data to save on costs, as well as undrop support
  • Enhanced governance with data lineage, fine-grained access controls, and more secure table access, with Unity Catalog supervising all reads and writes
  • A foundation for future capabilities such as automatic row deletion (Auto-TTL) and row-level ingestion (Zerobus Ingest, in Private Preview)

Converted tables support reads from any third-party client.

How Can the SET MANAGED Conversion Command Help?

The SET MANAGED command makes conversion from external to managed tables easier. Its features and their benefits:

  • Minimize Downtime: Keep the table online and available for reads using Databricks Runtime 16.1 or above, and limit downtime to just a few minutes for writes (and for reads on Databricks Runtime 15.4 or below).
  • Preserve Identity: The table’s name, permissions, tags, and settings are retained for all tables, and table history is retained for Delta tables.
  • Handle Concurrency: The SET MANAGED command safely handles concurrent writes that may occur during the conversion.
  • Roll Back: Another command, UNSET MANAGED, lets you roll a converted table back to UC external within 14 days, as a safety net.

How Do I Convert from External to Managed Tables?

A Practitioner’s Step-By-Step Guide for Conversion

The SET MANAGED command makes table conversion simple. In this step-by-step guide, we’ve outlined key recommendations to ensure a smooth transition from external to managed tables.

Step 1: Select External Tables to Convert

Begin by selecting a couple of Unity Catalog external tables to convert to UC managed first, to familiarize your team with the process, prerequisites, and post-conversion steps.

For example, you can try out this command first on a couple of tables that are exclusively read and written to by Databricks clients (see “Planning a Staged Journey”).

Step 2: Pre-Flight Checklist

Check that your ecosystem of table readers and writers is ready for the change. For each selected UC external table and its associated workloads, you’ll want to:

  1. Update to Use Name-Based Access: Check your jobs, notebooks, and queries to ensure they access the table using its three-part name (catalog.schema.table) rather than path-based access (e.g., SELECT * FROM delta.`s3://path/to/table`). Databricks Labs has developed UCX tooling that can help you find path-based references: run the Databricks Labs UCX lint-local-code command from an IDE terminal to analyze the code (.py or .sql files) in a directory on your local machine.
  2. Cancel All Maintenance Jobs: To prevent conflicts, ensure no OPTIMIZE, ZORDER, or CLUSTER BY jobs are running or scheduled to run on the table during the conversion process (you can check for them using DESCRIBE HISTORY). After the conversion, Predictive Optimization will automatically handle optimization jobs.
  3. [Optional] Upgrade Databricks Runtime Versions: All Databricks clusters reading from or writing to the table should ideally be on Databricks Runtime 15.4 LTS or higher to retain full table history for Delta tables. Databricks Runtime 16.1 or higher can eliminate reader downtime entirely.
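As a rough illustration of what the first checklist item is looking for, the Python sketch below scans SQL or Python source text for delta.`...` path references. It is a simplified stand-in and not the UCX linter; the function name find_path_references and the regular expression are assumptions made for this example, and the pattern only covers the delta.`path` form shown above.

```python
import re

# Matches path-based Delta references such as delta.`s3://path/to/table`.
# Assumption: only the delta.`...` form is covered; other formats are ignored.
PATH_REF = re.compile(r"\bdelta\.`([^`]+)`", re.IGNORECASE)


def find_path_references(source: str) -> list[tuple[int, str]]:
    """Return (line_number, path) pairs for each path-based reference found."""
    hits = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in PATH_REF.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits


# Illustrative input: one name-based query and one path-based query.
sample = """SELECT * FROM main.sales.orders;
SELECT * FROM delta.`s3://path/to/table`;
"""
print(find_path_references(sample))  # prints [(2, 's3://path/to/table')]
```

Each hit marks a query to rewrite against the table’s three-part name before conversion.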

Step 3: Run the Conversion Command

Execute the conversion by running the SET MANAGED command on the table, in the form ALTER TABLE catalog.schema.table SET MANAGED.

Note: For tables with UniForm enabled, use SET MANAGED TRUNCATE UNIFORM HISTORY.
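As a minimal sketch, the statement for either variant can be assembled in Python before it is submitted to a SQL session. The helper name set_managed_statement and the table name main.sales.orders are hypothetical, and the name check assumes simple identifiers that need no backtick quoting.

```python
import re

# A three-part name: catalog.schema.table, each part a simple identifier.
# Assumption: names that need backtick quoting are out of scope for this sketch.
THREE_PART_NAME = re.compile(r"[A-Za-z_]\w*\.[A-Za-z_]\w*\.[A-Za-z_]\w*")


def set_managed_statement(table: str, uniform_enabled: bool = False) -> str:
    """Build the conversion statement for one UC external table."""
    if not THREE_PART_NAME.fullmatch(table):
        raise ValueError(f"expected catalog.schema.table, got {table!r}")
    statement = f"ALTER TABLE {table} SET MANAGED"
    if uniform_enabled:
        # Variant the article names for tables with UniForm enabled.
        statement += " TRUNCATE UNIFORM HISTORY"
    return statement


print(set_managed_statement("main.sales.orders"))
print(set_managed_statement("main.sales.orders", uniform_enabled=True))
```

Rejecting anything other than a three-part name keeps the statement aligned with the name-based access recommended in Step 2.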

Step 4: Verify the Result

After the command completes, confirm that the conversion was successful by checking the table’s metadata, for example with DESCRIBE EXTENDED.

In the output of this command, the “Type” property should now display as “MANAGED”. You can also see this same information in the “About this table” section of Catalog Explorer.
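The check can also be automated. The sketch below assumes the metadata output has been collected as (col_name, value) pairs and that table-level properties follow a "# Detailed Table Information" marker row, as in Spark-style DESCRIBE EXTENDED output; the function names and the sample rows are illustrative only.

```python
DETAIL_MARKER = "# Detailed Table Information"


def table_type(describe_rows):
    """Return the table-level 'Type' value, or None if it is not present.

    Rows before the marker describe columns, so a column that happens to be
    named 'Type' is ignored.
    """
    in_details = False
    for col_name, value in describe_rows:
        if col_name.strip() == DETAIL_MARKER:
            in_details = True
        elif in_details and col_name.strip() == "Type":
            return value.strip().upper()
    return None


def is_managed(describe_rows) -> bool:
    return table_type(describe_rows) == "MANAGED"


# Illustrative rows, not real output.
rows = [
    ("id", "bigint"),
    ("Type", "string"),  # a data column named Type, skipped by the check
    ("", ""),
    (DETAIL_MARKER, ""),
    ("Name", "main.sales.orders"),
    ("Type", "MANAGED"),
]
print(is_managed(rows))  # prints True
```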

Step 5: Post-Conversion Housekeeping

After a successful conversion, complete these final steps to ensure a smooth transition:

  • Restart streaming read or write jobs that use the table if any have paused
  • Perform functional testing by running key queries to ensure all readers and writers are working as expected on the newly managed table
  • Confirm that Predictive Optimization is now enabled for the table to begin benefiting from automated maintenance (you can also enable CLUSTER BY AUTO for automatic liquid clustering, or check whether it has already been enabled)

Planning a Staged Journey

A successful conversion of all tables to UC managed is a journey. Adopting a phased approach and planning ahead can help ensure a smooth transition:

  1. Convert Databricks-Only Tables: Prioritize converting tables that are exclusively read from and written to by Databricks clients. An experimental tool, Access Insights, can help identify tables with only “Databricks readers and writers” vs. “Non-Databricks readers” or “Non-Databricks writers”.
  2. Convert Tables with Supported External Tools: Determine which tables are accessed by third-party tools that also natively support reads from UC managed tables, and convert these next. Third-party access will continue working after conversion.
  3. Address Complex Cases Last: For tables accessed with unsupported legacy tools, plan to use features like Compatibility Mode for reads. Where third-party writes are required, re-create these tables and enable writes to these UC managed tables in Preview.
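One way to turn the three phases into a work plan is to tag each table in an inventory with its phase. The sketch below is a hypothetical illustration: the TableAccess fields are assumptions about what an access inventory might record, not the output format of Access Insights.

```python
from dataclasses import dataclass


@dataclass
class TableAccess:
    """Hypothetical inventory record for one external table."""
    name: str
    non_databricks_readers: bool = False
    non_databricks_writers: bool = False
    # True when every third-party reader natively supports UC managed tables.
    readers_support_uc_managed: bool = False


def conversion_phase(table: TableAccess) -> int:
    """Map a table to phase 1, 2, or 3 of the staged plan."""
    if not table.non_databricks_readers and not table.non_databricks_writers:
        return 1  # Databricks-only readers and writers
    if not table.non_databricks_writers and table.readers_support_uc_managed:
        return 2  # third-party readers with native support
    return 3  # legacy readers or third-party writers


inventory = [
    TableAccess("main.sales.orders"),
    TableAccess("main.sales.returns", non_databricks_readers=True,
                readers_support_uc_managed=True),
    TableAccess("main.ops.events", non_databricks_writers=True),
]
for table in sorted(inventory, key=conversion_phase):
    print(conversion_phase(table), table.name)
```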

Additional Considerations

The following details about the conversion command may be useful to know in advance:

  • Rollback Time Limit: To use the rollback safety net, UNSET MANAGED must be run on the UC managed table within 14 days of conversion. After that, the original external data will be permanently deleted to save on storage costs.
  • Time Travel Nuances: Upgrading clients to Databricks Runtime 15.4 LTS or higher can be helpful. For clusters running on Databricks Runtime 14.3 LTS or below, or if you use the UNSET MANAGED command to roll back, you can only time travel to historical commits by version number after conversion, not by timestamp.
  • Minimized Downtime for Writers: The command is designed to minimize downtime. Writers may experience a brief outage (estimated between 1 and 5 minutes) during the final phase, when the table’s location is switched to the new managed location.
  • Temporary Delta Sharing Interruption: Delta Sharing will be briefly interrupted during conversion, but it will function properly again once the process is complete.
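Because the rollback window is fixed at 14 days, it can help to record when each table was converted and compute the deadline. The sketch below is illustrative: the helper names are hypothetical, and treating the deadline instant itself as still inside the window is an assumption.

```python
from datetime import datetime, timedelta, timezone

ROLLBACK_WINDOW = timedelta(days=14)  # per the rollback time limit above


def rollback_deadline(converted_at: datetime) -> datetime:
    """Latest moment at which UNSET MANAGED is assumed to be available."""
    return converted_at + ROLLBACK_WINDOW


def can_roll_back(converted_at: datetime, now: datetime) -> bool:
    # Assumption: the boundary instant counts as inside the window.
    return now <= rollback_deadline(converted_at)


def unset_managed_statement(table: str) -> str:
    """Build the rollback statement for a converted table."""
    return f"ALTER TABLE {table} UNSET MANAGED"


# Hypothetical conversion time for an example table.
converted = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(rollback_deadline(converted).date())  # prints 2025-01-15
print(unset_managed_statement("main.sales.orders"))
```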

Pro-Tip: Scaling Up with Bulk Conversion

To convert hundreds or thousands of Unity Catalog external tables in bulk within a given schema, you can use a simple SQL script that runs SET MANAGED on each external table in the schema.

Note: Such a script performs live changes. It is highly recommended to test it thoroughly in a development environment before running it in production.
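The same idea can be sketched in Python. This is a hedged illustration rather than the SQL script itself: it assumes you already have a list of (table name, table type) pairs for the schema, and it only builds the statements so they can be reviewed as a dry run before anything is executed. The function name and the sample tables are hypothetical.

```python
def bulk_set_managed_statements(tables):
    """Build one SET MANAGED statement per external table.

    `tables` is an iterable of (three_part_name, table_type) pairs.
    Tables whose type is not EXTERNAL are skipped.
    """
    return [
        f"ALTER TABLE {name} SET MANAGED;"
        for name, table_type in tables
        if table_type.upper() == "EXTERNAL"
    ]


# Hypothetical listing of one schema's tables.
schema_tables = [
    ("main.sales.orders", "EXTERNAL"),
    ("main.sales.customers", "MANAGED"),
    ("main.sales.returns", "EXTERNAL"),
]

# Dry run: print the statements for review instead of executing them.
for statement in bulk_set_managed_statements(schema_tables):
    print(statement)
```

Keeping statement generation separate from execution makes it easy to test the list in a development environment first, in line with the note above.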

 

Controlling Your Data’s Physical Location

Unity Catalog (UC) managed tables reside in customer-managed storage and are accessible through open catalog APIs. If you want more control over how your data is physically stored, you can define a managed storage location at the catalog or schema level. Any new managed tables created in that catalog or schema will be automatically organized in that specified location.

For pre-existing external tables, you can set a managed storage location, then use the SET MANAGED command to convert them to UC managed tables. During conversion, the system respects the managed location you’ve defined, giving you control over the physical layout of your data in cloud storage. Please contact your account team to access this feature in Private Preview today.

Converting from External to Managed Tables Today

In just a few short months since Public Preview, hundreds of customers have successfully converted thousands of tables with SET MANAGED.

Everything described here is now GA. Try it out today and unlock the performance, governance, and ease of use of Unity Catalog managed tables.
