The next step is also being secure. The problem is, if the LLM can run any possible query against the database, then how do you make sure you don’t exfiltrate and leak data? We’ve built a technology that we call parameterized secure views in the database itself, which lets you define the appropriate security barriers and encode the security policies that you need, so that the LLM can generate any query it wants, but with respect to the logged-in user we will not let them see any information that they aren’t supposed to see. We will also, on an information-theoretical basis, not leak information that they should not have access to.
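As a rough illustration of the idea described above (any generated query may run, but only against data scoped to the logged-in user), here is a minimal Python sketch using the standard `sqlite3` module. It is not how parameterized secure views are implemented in the database discussed here; the `orders` table, the sample rows, and the helper names `scoped_connection` and `run_generated_query` are all hypothetical.

```python
import sqlite3

# Hypothetical sample data: (id, owner, item).
ROWS = [
    (1, "alice", "laptop"),
    (2, "bob", "phone"),
    (3, "alice", "monitor"),
]


def scoped_connection(user: str) -> sqlite3.Connection:
    """Return an in-memory database holding only the rows `user` may see."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, owner TEXT, item TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [row for row in ROWS if row[1] == user],
    )
    return conn


def run_generated_query(user: str, sql: str) -> list:
    """Run arbitrary (for example, LLM-generated) SQL against the scoped data."""
    conn = scoped_connection(user)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# The query text is unrestricted; the visible data is not.
alice_rows = run_generated_query("alice", "SELECT item FROM orders ORDER BY id")
bob_count = run_generated_query("bob", "SELECT COUNT(*) FROM orders")
```

Because the policy is applied to the data rather than to the query text, even an aggregate such as `COUNT(*)` reveals nothing about other users' rows in this toy setup.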
Heller: I know you’ve spent a lot of time thinking about the future of databases and generative AI. Where are we headed?
Krishnamurthy: Part of my thinking here has evolved over the last couple of years, but for 50 years the world of databases, or at least SQL databases, was all about producing exact results. I like to say databases had one job: store the data, don’t lose the data, and then when you ask a question, give the exact result. OK, maybe two jobs. It was all about exact results because we were dealing with structured data. I think the biggest change that’s happening right now is that we’re no longer just dealing with structured data. We’re also dealing with unstructured data. When you combine structured and unstructured data, the next step is that it’s not just about exact results but about the most relevant results. In this sense databases start to have some of the capabilities of search engines, which are about relevance and ranking, and what becomes important is something like precision versus recall for information retrieval systems. But how do you make all of this happen? One key piece is vector indexing. In other words, you have structured data, which is in the database, but now we have other kinds of information: unstructured data, semi-structured data.
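To make the shift from exact results to most relevant results concrete, the following Python sketch combines a structured filter with similarity ranking over embeddings, then scores the result with precision and recall. It is an illustration under stated assumptions only: the `PRODUCTS` list, the two-dimensional embeddings, and the `search` and `precision_recall` helpers are invented for the example, and the ranking is a brute-force scan, which is the work a vector index would be used to speed up.

```python
import math

# Hypothetical catalog: a structured column (in_stock) plus a toy embedding.
PRODUCTS = [
    {"name": "trail shoe", "in_stock": True, "embedding": [1.0, 0.0]},
    {"name": "road shoe", "in_stock": True, "embedding": [0.8, 0.6]},
    {"name": "hiking boot", "in_stock": False, "embedding": [1.0, 0.1]},
    {"name": "sandal", "in_stock": True, "embedding": [0.0, 1.0]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, k):
    """Exact structured filter first, then rank the survivors by relevance."""
    candidates = [p for p in PRODUCTS if p["in_stock"]]
    ranked = sorted(
        candidates,
        key=lambda p: cosine(query_vec, p["embedding"]),
        reverse=True,
    )
    return [p["name"] for p in ranked[:k]]


def precision_recall(returned, relevant):
    """Precision: share of returned items that are relevant.
    Recall: share of relevant items that were returned."""
    hits = len(set(returned) & set(relevant))
    return hits / len(returned), hits / len(relevant)


top = search([1.0, 0.0], k=2)
precision, recall = precision_recall(top, {"trail shoe"})
```

The filter on `in_stock` behaves like a traditional exact predicate, while the ranking step returns the closest matches rather than a single correct answer, which is why precision and recall become the measures of quality.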
