Beyond Hallucinations: 7 Steps to Getting Accurate, Consistent, and Relevant Responses from AI

Once a foundation for more trustworthy, reliable AI is built using a universal semantic layer, errors will occur less often, and hallucinations may disappear from the AI lexicon.

Organizations are increasingly embracing large language models (LLMs) and generative AI (GenAI) because of the promise of intuitive systems for users of all technical abilities, with responses much faster than those from human colleagues. Agentic AI takes it one step further by operating independently, gathering and analyzing real-time data from applications, making decisions, and enabling faster responses to emerging opportunities or challenges.

But what happens when AI models produce incorrect or misleading results? The consequences can be disastrous. In critical applications like healthcare, transportation, and security, AI hallucinations can lead to inaccurate diagnoses, misidentification, or erroneous operational commands, endangering lives and property. Or, in a less disastrous but unnerving instance, Microsoft’s chat AI, Sydney, admitted to falling in love with users and spying on Bing employees.

When AI models make these mistakes, we typically say they are “hallucinating,” but the term is inaccurate. Calling AI errors hallucinations can be misleading and potentially harmful, as it implies a level of consciousness or mental state in the AI that doesn’t exist. It minimizes the seriousness of factual inaccuracies and can contribute to the spread of misinformation by downplaying the issue of AI generating false information. The most likely casualty of these performance issues is trust in the technology itself, and with it the loss of its potential benefits.

Some policymakers and technologists think the answer is to pay more attention to “aligning” AI with human values, but there is a better way forward: ensure that the data used for retrieval is accurate, performance-optimized, and reliable. Getting the data house in order before undertaking any AI initiative is the only way to implement AI securely, accurately, and compliantly. 

Here’s a seven-step approach to laying the data foundation for successful, reliable, and scalable AI projects using business data.

See also: In Defense of Keeping a Human in the AI Loop

1) Implementing a universal semantic layer

It’s vital to treat AI models as just another data consumer, like humans, business intelligence tools, or embedded analytics. The quality and relevance of the data used by an AI model are crucial to its effectiveness. The AI model will likely produce poor or incorrect outputs if the data available to it is incomplete, inaccurate, or unclear.

Implementing a universal semantic layer is the first step in ensuring that data is AI-ready. The semantic layer defines domain-relevant objects, concepts, and their relationships. It acts as abstraction middleware between data storage and analytics tools like LLMs and BI tools, translating metadata into natural language for easier user and AI interaction.

The universal semantic layer is the foundation for trustworthy access to data using LLMs. The semantic layer provides context and constrains what the LLM can answer. LLMs have been trained on language itself: nouns, adjectives, and so on. A semantic layer gives LLMs objects that map onto these language elements: entities for nouns, dimensions and attributes for adjectives, and measures for quantitative description.

Instead of drawing on the entire body of information it was trained on, the LLM can respond to queries using only the objects and attributes provided in the semantic layer. When an LLM is asked to generate raw SQL against a database schema, it is far too easy for it to make errors. The semantic layer solves this by acting as an intermediary between LLMs and data warehouses.
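As a minimal sketch of this intermediary pattern, the example below has the LLM emit a structured query referencing only semantic-layer objects, which is validated and then deterministically compiled to SQL. All names here (`validate_query`, `compile_to_sql`, the measure and dimension names, the `orders_model` table) are illustrative assumptions, not any specific product's API.

```python
# Illustrative sketch: the LLM proposes a structured query against named
# semantic objects; the semantic layer validates it and generates the SQL.
# Every identifier below is hypothetical, for demonstration only.

ALLOWED_MEASURES = {"total_revenue", "order_count"}
ALLOWED_DIMENSIONS = {"region", "order_month"}

def validate_query(query: dict) -> dict:
    """Reject any query that references objects outside the semantic layer."""
    unknown = (set(query.get("measures", [])) - ALLOWED_MEASURES) | \
              (set(query.get("dimensions", [])) - ALLOWED_DIMENSIONS)
    if unknown:
        raise ValueError(f"Unknown semantic objects: {sorted(unknown)}")
    return query

def compile_to_sql(query: dict) -> str:
    """Deterministically compile a validated semantic query to SQL."""
    cols = ", ".join(query["dimensions"] + query["measures"])
    group_by = ", ".join(query["dimensions"])
    return f"SELECT {cols} FROM orders_model GROUP BY {group_by}"

# The LLM's output is a constrained, structured request, never raw SQL:
llm_output = {"measures": ["total_revenue"], "dimensions": ["region"]}
sql = compile_to_sql(validate_query(llm_output))
```

Because the SQL is generated by the semantic layer rather than the model, a hallucinated column or table name is caught at validation time instead of failing (or silently succeeding) in the warehouse.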

2) Enriching data with business context

The semantic layer is the place where data engineers add metadata and domain-specific knowledge to data, making it more human-understandable and machine-interpretable. This improves data quality, analysis, and AI model performance.

By incorporating business-specific data and knowledge, AI models can adapt to an organization’s specific context, acronyms, terminology, and methods. Enriched metadata allows AI to uncover hidden patterns and relationships within your business data, leading to data-driven insights that can inform strategic decisions and improve business performance.
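One way to picture this enrichment: attach descriptions and organization-specific synonyms to semantic objects, then render that metadata as natural language for the model's context. The field names, example objects, and prompt format below are assumptions for illustration, not a prescribed schema.

```python
# Illustrative sketch of enriching semantic objects with business context
# (descriptions, org-specific synonyms) that an LLM can consume as text.
from dataclasses import dataclass, field

@dataclass
class Measure:
    name: str
    description: str   # business meaning, readable by humans and machines
    sql: str

@dataclass
class Dimension:
    name: str
    description: str
    synonyms: list = field(default_factory=list)  # org-specific terminology

# Hypothetical example objects:
arr = Measure(
    name="arr",
    description="Annual recurring revenue, summed over active subscriptions.",
    sql="SUM(subscription_value) * 12",
)
region = Dimension(
    name="region",
    description="Sales region as defined by the go-to-market team.",
    synonyms=["territory", "geo"],
)

def to_prompt_context(objects) -> str:
    """Render enriched metadata as natural language for an LLM prompt."""
    return "\n".join(f"- {o.name}: {o.description}" for o in objects)
```

With this context in the prompt, a question like “What was ARR by territory?” can resolve to the governed `arr` measure and `region` dimension rather than a guess at raw column names.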

3) Generating responses aligned with business vocabulary and user intent

Aligning responses with business vocabulary and user intent is crucial in AI for effective communication, increased user satisfaction, and improved business outcomes. It ensures that AI systems understand and respond appropriately to user needs.

By understanding a business’s specific language and context, AI can generate more accurate and relevant responses to users’ needs. When AI uses appropriate terminology and understands user intent, it leads to a more natural and satisfying user experience. AI can also reduce ambiguity in communication by using the correct terminology and contextually relevant responses.

4) Ensuring outputs are traceable back to governed, auditable data

Data governance is also crucial for AI to be safe, secure, and responsible, ensuring effective and ethical use. By implementing key governance principles, organizations can mitigate risks, enhance data quality, and ensure compliance with regulations. 

A universal semantic layer provides a centralized place for businesses to enforce the roles, procedures, and rules that specify who can access, change, and share data in a well-governed data environment. It improves governance by centralizing data policies, security procedures, and access controls so AI is transparent and explainable, with public disclosures and traceable, auditable outputs.
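A minimal sketch of what centralized enforcement can look like: every consumer, human, BI tool, or AI agent, passes through one policy check, and every decision is logged for auditability. The roles, object names, and policy structure are hypothetical.

```python
# Illustrative sketch of role-based access control enforced at a single
# semantic-layer chokepoint, with an audit trail for traceable outputs.
# Roles and object names are hypothetical examples.

ACCESS_POLICY = {
    "finance_analyst": {"total_revenue", "margin", "region"},
    "support_agent":   {"ticket_count", "region"},
}

AUDIT_LOG = []  # every access decision is recorded, granted or not

def authorize(role: str, requested: set) -> bool:
    """Grant access only if every requested object is allowed for the role."""
    allowed = ACCESS_POLICY.get(role, set())
    granted = requested <= allowed
    AUDIT_LOG.append({
        "role": role,
        "requested": sorted(requested),
        "granted": granted,
    })
    return granted
```

Because an AI agent's query goes through the same `authorize` call as a human analyst's dashboard, governance does not depend on each consumer implementing its own controls, and the audit log makes any output traceable back to a recorded decision.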

5) Unifying and governing your data with AI-ready semantic models

Data unification involves bringing together structured and unstructured data from various sources (databases, data lakes, SaaS applications, IoT devices), normalizing it into a common format, and defining relationships between data entities. It also involves defining hierarchical structures and relationships in the data, and adding meaning by linking it with external knowledge sources such as industry standards and regulatory frameworks.
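The normalization step can be sketched as mapping each source's field names onto one shared schema before modeling. The source systems, field names, and mapping below are illustrative assumptions.

```python
# Illustrative sketch: normalizing customer records from two hypothetical
# sources (a CRM and an ERP) into one common schema. All field names
# are assumptions for demonstration.

crm_record = {"CustomerID": "C-17", "Region": "EMEA"}
erp_record = {"cust_id": "C-17", "sales_region": "emea"}

FIELD_MAP = {
    "crm": {"CustomerID": "customer_id", "Region": "region"},
    "erp": {"cust_id": "customer_id", "sales_region": "region"},
}

def normalize(record: dict, source: str) -> dict:
    """Rename source-specific fields to the shared schema and clean values."""
    mapping = FIELD_MAP[source]
    return {
        mapping[k]: (str(v).upper() if mapping[k] == "region" else v)
        for k, v in record.items()
    }
```

Once both records normalize to the same shape, entity relationships and hierarchies can be defined once, on the shared schema, rather than per source.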

A universal semantic layer also provides robust data governance, ensuring that organizations can strike the correct balance between control and flexibility by centralizing data policies, security procedures, and access controls. 

6) Delivering consistent, reliable, and trustworthy autonomous actions

With the rise of agentic AI, or autonomous AI, artificial intelligence can run independently to design, execute, and optimize workflows. Realizing the promise of AI models to operate independently, make decisions, and take actions without constant human oversight depends on one critical factor: trust.

Trust is earned when AI systems prove their competence, consistency, and transparency through explainability and measurable outcomes. AI systems, especially agentic AI systems, can only be trusted with a solid, reliable data foundation that includes business context.

7) Scaling seamlessly under any workload with optimized data

To leverage business data with AI, an AI system must be given context and constraints, and held to them. This requires a minimum level of data pipeline quality to work at scale. AI systems need to be told what the data means and what is available, and given a simple interface for requesting it without having to be creative about how to access it.

Poorly structured or inconsistent data also increases processing time and reduces accuracy. Optimized data pipelines that efficiently clean, preprocess, and structure data are crucial for seamless AI scalability. A universal semantic layer is pivotal to optimizing data pipelines and ensuring that AI can swiftly consume the data and produce accurate outputs.

See also: Reports of the AI-Assisted Death of Prose are Greatly Exaggerated

A final thought

The alarming truth is that 70 to 80 percent of AI projects fail, with data quality among the top reasons why. It’s time to stop the “garbage in/garbage out” phenomenon and build a foundation for trustworthy, reliable, and accurate AI.

It’s also time to stop using the term hallucination. Let’s start using terms like “errors,” “mistakes,” or “incorrect outputs,” which are more accurate and encourage a responsible understanding of AI’s limitations. Once a foundation for more trustworthy, reliable AI is built using a universal semantic layer, errors will occur less often, and hallucinations may disappear from the AI lexicon.

About David Jayatillake

David Jayatillake is VP of AI with Cube, the AI-powered Universal Semantic Layer. Prior to Cube, David co-founded Delphi Labs. He's worked at Metaplane, Lightdash, Gravity, and more. He holds a B.S. in Mathematics with Management and Finance from King's College London and an ACMA in Accounting and Business/Management from The Chartered Institute of Management Accountants.
