How Deutsche Börse built a generative AI tool to tackle the large-scale migration of Zeppelin notebooks to Databricks


At Deutsche Börse Group, our StatistiX platform supplies roughly 95% of all Clearing and Trading data across the group, powering self-service analytics for hundreds of business users. Keeping that data accessible and actionable is central to everything we do.

For years, that meant Zeppelin notebooks running on Cloudera, with access to HDFS and Oracle data systems. The platform served us well, but the landscape shifted. Cloudera is fully decommissioning Zeppelin in 2027, our analytics workloads are moving to the cloud, and Databricks has been chosen as our new unified analytics platform. That combination created a migration challenge that most organizations underestimate: 2,000+ users and a high volume of notebooks, many of them deeply embedded in day-to-day business workflows, all needing to move.

Rewriting everything manually would take years. So we decided to build a better path on Databricks.

The notebook migration problem

Infrastructure migrations get a lot of attention. Notebook migrations tend not to, which is a big reason why they slow teams down.

Our Zeppelin notebooks weren’t simple scripts. They contained complex SQL and Python logic, custom interpreters, Oracle and HDFS references, visualizations, widgets and scheduling logic built up over years. Each one reflected institutional knowledge from the business teams who relied on it. The diversity across the entire notebook landscape made a rule-based rewriting engine impractical, since the logic was simply too heterogeneous and too business-specific for automated rules to handle reliably.

That constraint led us to a cleaner design insight: separate structure from logic, and apply the right tool to each. Structural conversion (mapping Zeppelin’s paragraph format to Databricks cells, translating interpreter syntax, reformatting metadata) is deterministic and automatable, whereas logic reconstruction is not. Fortunately, logic reconstruction is the part LLMs are good at.

Building the converter on Databricks Apps

With that design principle in hand, we built the Zeppelin to Databricks Notebook Converter, a Databricks App designed specifically for our migration workflow.

The app handles the structural side of the conversion: Zeppelin paragraphs become Databricks cells, interpreter mappings are applied (%python, %sql, %pyspark and others are translated to their Databricks equivalents), and notebook metadata is reformatted into valid .ipynb JSON. Original content is preserved exactly. We’re not rewriting logic at this stage, just preparing it for the next step.
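
As an illustration only, a structural pass of this kind could be sketched in Python as follows. The mapping table, the function names and the sample note are assumptions made for the example, not the converter's actual code; the sketch assumes a Zeppelin export with a top-level "paragraphs" list whose "text" fields start with an interpreter directive, and it targets Jupyter's nbformat 4 layout with Databricks language magics.

```python
import json

# Hypothetical mapping: Zeppelin directive -> (ipynb cell type, Databricks magic).
# An empty magic means the cell runs in the notebook's default language (Python).
INTERPRETER_MAP = {
    "%python": ("code", ""),
    "%pyspark": ("code", ""),
    "%spark.pyspark": ("code", ""),
    "%sql": ("code", "%sql"),
    "%spark.sql": ("code", "%sql"),
    "%sh": ("code", "%sh"),
    "%md": ("markdown", ""),
}


def split_directive(text):
    """Split a paragraph into its leading %interpreter directive and its body."""
    stripped = text.lstrip()
    if not stripped.startswith("%"):
        return "", text
    first_line, _, body = stripped.partition("\n")
    parts = first_line.split(None, 1)
    directive = parts[0]
    # Zeppelin allows code on the directive line itself, e.g. "%sql select 1".
    inline = parts[1] if len(parts) > 1 else ""
    if inline:
        body = inline + "\n" + body if body else inline
    return directive, body


def convert_paragraph(paragraph):
    """Turn one Zeppelin paragraph into one nbformat 4 cell."""
    text = paragraph.get("text") or ""
    directive, body = split_directive(text)
    if directive in INTERPRETER_MAP:
        cell_type, magic = INTERPRETER_MAP[directive]
        source = magic + "\n" + body if magic else body
    else:
        # Unknown or custom interpreter: keep the text verbatim for later review.
        cell_type, source = "code", text
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def convert_note(note):
    """Build a minimal nbformat 4 notebook from a Zeppelin note dictionary."""
    return {
        "cells": [convert_paragraph(p) for p in note.get("paragraphs", [])],
        "metadata": {"language_info": {"name": "python"}},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# Made-up sample note; "%oracle" stands in for a custom interpreter.
sample_note = {
    "name": "daily_report",
    "paragraphs": [
        {"text": "%md\n# Daily report"},
        {"text": "%pyspark\ndf = spark.table('trades')"},
        {"text": "%sql\nselect count(*) from trades"},
        {"text": "%oracle\nselect 1 from dual"},
    ],
}

notebook = convert_note(sample_note)
ipynb_text = json.dumps(notebook, indent=1)
```

In this sketch the code inside each paragraph is never modified; only the directive line is translated. Paragraphs with an unrecognized directive are passed through unchanged, so nothing is silently dropped.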

That next step is Genie. For every uploaded notebook, the app automatically generates a context-aware prompt that includes specific details about our Zeppelin environment: our custom interpreters, data sources and configuration patterns. The prompt gives Genie the context it needs to reconstruct logic accurately in a Databricks-native way.
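
To make the idea of a context-aware prompt concrete, here is a small Python sketch. The environment descriptions, the prompt wording and the function names are invented for illustration, since the real prompt content is not published; the only assumption about the input is a Zeppelin export with a "paragraphs" list.

```python
from collections import Counter

# Hypothetical environment notes keyed by interpreter directive (all assumed).
ENVIRONMENT_NOTES = {
    "%oracle": "Custom interpreter that runs SQL against an Oracle database.",
    "%pyspark": "PySpark code that often reads files from HDFS paths.",
    "%sql": "Spark SQL against Hive tables.",
}


def directives_used(note):
    """Count how often each %interpreter directive opens a paragraph."""
    counts = Counter()
    for paragraph in note.get("paragraphs", []):
        text = (paragraph.get("text") or "").lstrip()
        if text.startswith("%"):
            counts[text.split(None, 1)[0]] += 1
    return counts


def build_prompt(note):
    """Assemble a prompt that names the interpreters this notebook relies on."""
    lines = [
        f"This notebook ('{note.get('name', 'untitled')}') was converted from Zeppelin.",
        "Cell contents are unchanged. Rebuild the logic for Databricks and",
        "ask me before changing any business logic.",
        "",
        "Interpreters found in the original notebook:",
    ]
    for directive, count in sorted(directives_used(note).items()):
        description = ENVIRONMENT_NOTES.get(
            directive, "No description on file; ask the user."
        )
        lines.append(f"- {directive} ({count} paragraph(s)): {description}")
    return "\n".join(lines)


# Made-up sample note for demonstration.
sample_note = {
    "name": "daily_report",
    "paragraphs": [
        {"text": "%md\n# Daily report"},
        {"text": "%pyspark\ndf = spark.table('trades')"},
        {"text": "%pyspark\ndf.show()"},
        {"text": "%oracle\nselect 1 from dual"},
    ],
}

prompt = build_prompt(sample_note)
```

Because the prompt is derived from the uploaded notebook, it mentions only the interpreters that actually appear in it, which keeps the context specific instead of generic.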

The workflow for a business user is straightforward:

  1. Export a Zeppelin notebook as JSON
  2. Upload it into the Databricks App
  3. Click Convert
  4. Download the converted .ipynb
  5. Open Databricks, upload the notebook, launch Genie and paste the generated prompt
  6. Genie asks clarifying questions and rebuilds the notebook

The app itself was built with a shadcn UI frontend. Initially, we built a Streamlit prototype, but we felt that shadcn gave us a more professional and scalable interface. The Databricks Apps development experience made it simple to ship quickly without standing up separate infrastructure.

What we chose not to automate

One of the most important design decisions was identifying what the tool should deliberately leave alone.

The converter doesn’t rewrite SQL logic, Python logic, visualizations, widgets, Oracle and HDFS references, scheduling logic or business-specific custom code. All of that content is preserved in the converted notebook, untouched, because rewriting it automatically would introduce errors and undermine trust in the output. These are exactly the elements that vary most across notebooks and that carry the most business-critical logic. They belong to Genie, which can interpret context, ask clarifying questions and make judgment calls that rules cannot.
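
One way to back up the "untouched" guarantee is a simple check that compares each original paragraph with its converted cell. The Python sketch below is a hypothetical example of such a check, not part of the tool as described; it assumes one cell per paragraph, in the same order, and interpreter directives that sit on their own first line.

```python
def body_without_directive(text):
    """Drop a leading %interpreter line so only the user's content remains."""
    stripped = text.lstrip()
    if stripped.startswith("%"):
        _, _, rest = stripped.partition("\n")
        return rest
    return text


def cell_source(cell):
    """Return a cell's source as one string (nbformat allows a list of lines)."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def find_modified_cells(zeppelin_note, ipynb):
    """List the indexes of cells whose content differs from the original paragraph."""
    paragraphs = zeppelin_note.get("paragraphs", [])
    cells = ipynb.get("cells", [])
    if len(paragraphs) != len(cells):
        raise ValueError("paragraph and cell counts differ")
    modified = []
    for index, (paragraph, cell) in enumerate(zip(paragraphs, cells)):
        original = body_without_directive(paragraph.get("text") or "")
        converted = body_without_directive(cell_source(cell))
        if original != converted:
            modified.append(index)
    return modified


# Made-up example: the second cell was altered (keyword casing changed).
original_note = {
    "paragraphs": [
        {"text": "%pyspark\ndf = spark.table('trades')"},
        {"text": "%sql\nselect count(*) from trades"},
    ]
}
converted_notebook = {
    "cells": [
        {"cell_type": "code", "source": "df = spark.table('trades')"},
        {"cell_type": "code", "source": "%sql\nSELECT COUNT(*) FROM trades"},
    ]
}

changed = find_modified_cells(original_note, converted_notebook)
```

An empty result means every paragraph body survived the structural pass exactly; any listed index points at a cell worth inspecting before it is handed to Genie.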

This hybrid approach of automating the deterministic part and delegating the variable part allows us to avoid the brittleness of rule-based systems and use AI where it actually performs well.

The result: hours to minutes

By combining structural conversion with AI-assisted logic reconstruction, we've reduced notebook redevelopment from hours of manual effort to 15–20 minutes per notebook, depending on complexity. For a large-scale migration of this nature, spanning multiple business domains, this approach transforms what would have been a resource-intensive, time-consuming undertaking into a scalable, repeatable workflow that takes far less time.

The speed gain also changes the nature of the work. Business users don't need deep Databricks expertise to migrate their own notebooks. They follow a short sequence of steps, get a prompt, and let Genie do the reconstruction. The tool is accessible enough that migration doesn't require a dedicated engineering team.

What we learned

A few principles emerged from this project that we would carry into any similar effort.

  • Avoid overengineering. Our first attempt used a more complex agentic architecture that added overhead without solving the core problem. A simple UI and a clean backend turned out to be exactly enough.
  • Rule-based rewriting doesn't scale for heterogeneous content. The diversity of logic across our notebooks made rules impractical. LLMs are essential for handling that variability, and the key is designing the handoff between automation and AI thoughtfully.
  • Context is the difference between a good prompt and a great one. Generic Genie prompts produce generic results. Investing in a prompt that encodes knowledge of our specific environment (interpreters, data sources, configuration patterns) is what made the output actually usable.
  • Engage your platform team early. Our collaboration with the Databricks team throughout the build helped us stay aligned and avoid rework.

What’s next

While the initial development of our converter tool is complete, we are now proceeding with large-scale, real-world testing. Our immediate priorities include finalising prompt definitions to improve accuracy, validating the tool with notebooks across multiple business entities and IT, and preparing to onboard users.

The broader implication is what excites us most. This project demonstrated that AI-assisted migration is not a future capability; it is available now! By combining Databricks Apps with generative AI, we've built a repeatable workflow that turns one of cloud transformation’s hardest problems into a fast, scalable process.
