Here's a situation that plays out consistently in enterprise software teams. A product manager asks the company's AI assistant: "Who are our top customers this quarter?" The system returns a clean, ranked list. It looks right. Everyone moves on.
Except the product team defines "top" by engagement. Finance defines it by net revenue. Sales defines it by deal size. The AI picked one interpretation, presented it with full confidence, and nobody noticed until a strategy decision got made based on numbers that meant something different to every person in the room.
This isn't hallucination in the way people usually talk about it. The system didn't make anything up. It just made a choice about meaning that was never its choice to make.
The Real Problem Isn't the Model
There's a common assumption in enterprise AI adoption that if you pick the right model, tune it carefully, and feed it good data, you'll get reliable outputs. That assumption misses the actual failure mode.
LLMs are extraordinarily good at language. They are not good at organizational meaning. Ask your AI what your churn rate is, and watch what happens. The model doesn't know whether you measure churn at the subscription level or the customer level. It doesn't know whether you count downgrades or ignore them. It doesn't know if enterprise accounts with multiple seats are handled differently. These are not answers buried in a document somewhere. They're organizational decisions that live in tribal knowledge, team agreements, and data model comments written two years ago by someone who has since left the company.
The model will infer. And inference, presented with confidence, is a liability.
Embeddings Don't Fix This
The standard response to this problem is better retrieval. Embed your documentation, pull the most relevant chunks, give the model more context. It's a reasonable instinct and a partial improvement. But it doesn't solve the underlying issue.
Embeddings measure how close two pieces of text are in vector space; they say nothing about whether a given interpretation is actually correct for your organization. "Revenue" and "profit" are neighbors in embedding space because they appear together constantly in financial writing. In your financial reporting system, conflating them is a serious error. No amount of retrieval resolves that, because the correct answer isn't in any document. It's in a decision your finance team made about how to define things, probably years ago, probably never written down in a form a machine can use.
The same structural problem shows up everywhere. "Active user" means something different to your engineering team (an API call) than to your product team (a completed transaction). "Conversion" means a successful HTTP request to one team and a signup-to-paid progression to another. "Engagement" is event frequency in one dashboard and session depth in another. Retrieval doesn't resolve definitional ambiguity. It just retrieves more text that contains the ambiguity.

Figure 1: Without a semantic layer, LLM outputs are plausible but inconsistent. With one, they're grounded and correct.
What Actually Needs to Happen
The answer is a semantic layer: a structured, machine-readable representation of what your organization's terms actually mean. Not a glossary. Not better documentation. A formal encoding of entities, relationships, metrics, and disambiguation rules that sits between your data and your AI system, so that when somebody asks about churn or active accounts or top customers, the system isn't guessing.
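To make that concrete, here is a minimal sketch of what one such entry might look like, using the churn example from earlier. The structure, field names, and rules are illustrative assumptions, not a standard; the point is that the definition is data a pipeline can read, not prose a model has to interpret.

# Hypothetical semantic-layer entry for a single metric.
# Field names and rules are illustrative, not a standard schema.
churn_rate = {
    "name": "churn_rate",
    "owner": "finance",
    "grain": "customer",  # measured per customer, not per subscription
    "definition": "customers_lost_in_period / customers_at_period_start",
    "rules": [
        "Downgrades do not count as churn.",
        "Multi-seat enterprise accounts churn only when all seats are cancelled.",
    ],
    "synonyms": ["attrition", "customer churn"],
}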
This isn't a new idea in the data world. Tools like dbt and Looker have applied it to business intelligence for years. What's new is the pressure to extend it into AI pipelines, and the tooling is catching up: the dbt Semantic Layer now supports direct AI pipeline integration, and platforms like Cube are building native LLM connections for exactly this purpose.
The practical starting point for most teams is a schema-based approach: YAML or JSON configuration files, version-controlled in git, injected at inference time, as sketched below. Less rigorous than formal ontologies, but dramatically more maintainable, and usually sufficient. If you already have a BI semantic layer, your definitional work is largely done. The challenge is making it queryable when the AI needs it.
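A minimal sketch of the injection step, assuming the definitions live in a JSON file named semantic_layer.json with the structure shown above; the file name, the structure, and the prompt wording are all assumptions, and the rendered text would be passed to whatever LLM client the team already uses.

import json

def load_definitions(path="semantic_layer.json"):
    # Load the version-controlled semantic layer (plain JSON in this sketch).
    with open(path) as f:
        return json.load(f)

def build_system_prompt(definitions):
    # Render each term's definition and rules into context the model must use.
    lines = ["Use ONLY these definitions when answering business questions:"]
    for term, spec in definitions.items():
        lines.append(f"- {term}: {spec['definition']}")
        for rule in spec.get("rules", []):
            lines.append(f"  rule: {rule}")
    return "\n".join(lines)

if __name__ == "__main__":
    defs = load_definitions()
    print(build_system_prompt(defs))  # prepend to the model's system prompt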
The Harder Problem Is Organizational
Here's what most architecture posts miss: the technical implementation is the easy part. Getting three departments to agree on what "active" means is not. Building and maintaining a semantic layer forces conversations that organizations routinely avoid, and it surfaces disagreements that have been quietly producing inconsistent results for years. That's uncomfortable. It's also the point.
There's a simple test I use: if a new hire would need to read internal documentation to understand what a key business term means, that term belongs in a semantic layer, not in a prompt.
The next phase of enterprise AI isn't about which model you use. It's about how well your organization has systematized its own knowledge for machine consumption. From prompt engineering to context engineering. From data pipelines to meaning pipelines. The teams that get this right will produce AI outputs that aren't just fluent; they'll be correct. In enterprise systems, being fluent isn't enough. If your AI isn't definitionally correct, it's operationally unreliable.
Instead of asking "Who are our top customers?", define it:
TopCustomer = revenue_last_90_days > $50K AND active_subscription = true
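Encoded that way, the question stops being a judgment call. A minimal sketch of what applying the definition looks like, assuming hypothetical customer records with revenue_last_90_days and active_subscription fields:

def is_top_customer(customer):
    # Apply the agreed definition instead of letting the model guess one.
    return customer["revenue_last_90_days"] > 50_000 and customer["active_subscription"]

customers = [
    {"name": "Acme", "revenue_last_90_days": 72_000, "active_subscription": True},
    {"name": "Globex", "revenue_last_90_days": 18_000, "active_subscription": True},
]
print([c["name"] for c in customers if is_top_customer(c)])  # ['Acme']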
