IBM to Purchase DataStax for Database, GenAI Capabilities


IBM today announced its intent to acquire DataStax, the longtime backer of the Apache Cassandra database that has recently broadened its reach into streaming data and generative AI. IBM cited DataStax's ability to manage unstructured data as well as its vector database, which is used for building RAG solutions.

Apache Cassandra was originally developed at Facebook in 2008 to serve the fledgling social network's need for a highly scalable, fault-tolerant database to store the big data generated by users on its website. Facebook was a big user and creator in the nascent big data ecosystem, building its social media empire atop non-relational technology like Apache Hadoop and HBase, another NoSQL data store, as well as Apache Hive, which it created to make Hadoop look like a relational database. (Facebook would eventually move back to using relational databases, specifically Postgres, but that's another story.)

Cassandra, which technically is a wide-column store that favors data availability and reliability (at the expense of data consistency), became a top-level project at the Apache Software Foundation in 2010. That's the same year that Jonathan Ellis and Matt Pfeil co-founded a company in Austin, Texas called Riptano, which they soon renamed DataStax.

At first, DataStax followed the typical commercial open-source business model, offering an enterprise version of Apache Cassandra called DataStax Enterprise (DSE). The company, which had moved to Santa Clara, California by 2014, attracted customers from the Fortune 500, such as FedEx, Capital One, and Verizon. It raised $106 million in venture capital at an $830 million valuation and was on pace for an IPO in the 2015 or 2016 timeframe.

That IPO never happened, as MongoDB dominated the NoSQL space and went public in 2017. In May 2020, DataStax launched Astra DB, a fully managed version of Cassandra running in the cloud atop Kubernetes, giving customers the scalability and availability benefits of the NoSQL database but without the management responsibilities (like many distributed systems, Cassandra can be difficult to manage). Later that year, it launched K8ssandra, an open source distribution of the database running atop the same container orchestrator.

Soon, the company began branching out beyond NoSQL databases. In 2021, it launched Astra Streaming, an event streaming platform based on Apache Pulsar, a publish-and-subscribe (pub-sub) data platform that competes with Apache Kafka. In 2023, DataStax bought Kaskada, an AI startup that helped automate tedious feature engineering tasks, and made the software open source under the Luna ML brand.

DataStax bolstered its generative AI capabilities in 2023 with the launch of a vector store in Astra DB. Vector stores emerged as essential tools for building retrieval-augmented generation (RAG) pipelines, which improve the accuracy of large language model (LLM) output in generative AI applications. Then in 2024, DataStax fleshed out its RAG story when it nabbed Langflow, which developed an open source framework for building RAG pipelines.
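To make the retrieval step concrete, here is a minimal sketch of what a vector store does inside a RAG pipeline. It assumes toy, hand-made three-dimensional embeddings in place of a real embedding model and a plain Python list in place of a database such as Astra DB; the names `DOCS`, `retrieve`, and `build_prompt` are hypothetical and do not reflect any DataStax API.

```python
import math

# Hypothetical toy "vector store": each passage is paired with a hand-made
# 3-dimensional embedding. A real system would obtain these vectors from an
# embedding model and keep them in a vector database.
DOCS = [
    ("Cassandra is a wide-column NoSQL database.", [0.9, 0.1, 0.0]),
    ("Apache Pulsar is a pub-sub messaging platform.", [0.1, 0.9, 0.0]),
    ("Langflow builds RAG pipelines visually.", [0.0, 0.2, 0.9]),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, k=2):
    """Return the k passages whose embeddings are most similar to the query."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, query_vec, k=2):
    """Prepend the retrieved passages to the question before it goes to an LLM."""
    context = "\n".join(retrieve(query_vec, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Hypothetical query embedding pointing mostly along the "database" axis.
    print(build_prompt("What kind of database is Cassandra?", [0.8, 0.2, 0.1], k=1))
```

The linear scan here is only for illustration; production vector stores typically use an approximate nearest-neighbor index so lookups stay fast across millions of vectors.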


All of the capabilities that DataStax built and bought clearly caught the eye of IBM. Big Blue, which has been rallying its business to some degree on the back of its watsonx AI offerings, cited open source projects like Apache Cassandra, Apache Pulsar, Langflow, and OpenSearch (a fork of Elasticsearch and Kibana) in its press release announcing the acquisition.

IBM is particularly enamored of how DataStax has built its unstructured data management capabilities into a single product. While it didn't mention DataStax's Hyper-Converged Data Platform (HCDP) by name, it seems clear that IBM is banking on harnessing the tech to help customers turn unstructured data into profitable AI applications.

“Unstructured data represents a treasure trove of untapped business intelligence, representing 93% of all enterprise data in 2024, according to IDC,” Ritika Gunnar, IBM's general manager of data and AI, says in a blog post. “Harnessing the power of this data within generative AI applications is critical. But to do that, enterprises must first make order out of data chaos.”

According to Gunnar, IBM wants to bring DataStax's open source offerings together with its watsonx portfolio of products, specifically Apache Iceberg, Apache Spark, Velox, and Presto, to help customers leverage large amounts of unstructured data.

“The data infrastructure required for AI is much more than just vector,” Gunnar writes. “Many modalities of data (JSON, time-series, key/value, tabular, graph) need to come together to make the data ingest and search accurate and relevant. By having them built into a simplified and scalable solution (thanks to generative AI), users don't have to stitch together a multitude of data representations to gain value from their enterprise data.”

In his own blog post, DataStax CEO Chet Kapoor discussed how DataStax and IBM have worked together on open source software (OSS) since 2020, including deploying DataStax products atop the IBM OpenShift platform.

“We respect the leadership and stewardship that IBM has demonstrated with OSS and the great OSS companies that have found a home at IBM, like Red Hat and others, and we're excited to become part of a company that understands the power of openness,” Kapoor writes. “With our technologies and IBM's watsonx.data, their hybrid, open data lakehouse, we can bring vector and AI search to all of the data estate and make IBM's capabilities accessible to every developer.”

Terms of the deal, which is expected to close in the second quarter, were not disclosed. DataStax was valued at $1.6 billion during its most recent funding round, in June 2022. The company has raised $342.6 million over several rounds. It has hundreds of paying customers, according to IBM.
