by Lucas Mearian

Senior Reporter

How RAG makes generative AI tools even better

feature

Feb 20, 2024

4 mins

Artificial Intelligence, Augmented Reality, Generative AI

Retrieval augmented generation, or 'RAG' for short, creates a more customized and accurate generative AI model that can greatly reduce anomalies such as hallucinations.

Credit: Shutterstock/a-image

As more organizations turn to generative artificial intelligence (genAI) tools to transform massive amounts of unstructured data and other assets into usable information, being able to find the most relevant content during the AI generation process is critical.

Retrieval augmented generation, or “RAG” for short, is a technology that can do just that by creating a more customized genAI model that enables more accurate and specific responses to queries.

Large language models (LLMs), a type of deep-learning model, are the basis of genAI technology; they’re pre-trained on vast amounts of unlabeled or unstructured data that, by the time a model is available for use, can be outdated and not specific to a task.

LLMs can consist of a neural network with billions or even a trillion or more parameters. RAG optimizes the output of an LLM by referencing (accessing) a knowledge base outside the data on which the model was trained. In other words, RAG enables genAI to find and use relevant external information, often from an organization’s proprietary data sources or other content to which it’s directed.

It not only amplifies an LLM’s knowledge base “but also significantly improves the accuracy and contextuality of its outputs,” Microsoft explained in a blog.

RAG is essentially a design pattern that uses search functionality to retrieve pertinent data and add it to the prompt of a genAI model to better ground the generative output with factual and new information.
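The pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's implementation: the `DOCUMENTS` list, the keyword-overlap `retrieve` function, and the prompt wording are all hypothetical stand-ins, and a production system would typically use a real search index or vector database and then send the resulting prompt to an LLM.

```python
import re

# Hypothetical in-memory "knowledge base"; a real system would query a
# search index or database holding the organization's own content.
DOCUMENTS = [
    "Employees accrue 1.5 vacation days per month of service.",
    "Expense reports must be filed within 30 days of purchase.",
    "The VPN client must be updated every quarter.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=2):
    """Rank documents by how many tokens they share with the query."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only the best matches that share at least one token.
    return [doc for score, doc in scored[:top_k] if score > 0]

def build_prompt(query, documents, top_k=2):
    """Add the retrieved passages to the prompt so the model can ground its answer."""
    passages = retrieve(query, documents, top_k)
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

prompt = build_prompt("How many vacation days do employees accrue?", DOCUMENTS)
print(prompt)
```

The retrieval step here is deliberately crude; the point is only that the model receives the relevant passages alongside the question, rather than relying on what it memorized during training.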

“RAG can be used for both retrieving public internet data as well as for retrieving data from private knowledge bases,” according to Gartner Research.

Patrick Lewis, a natural language processing research scientist with start-up Cohere, originally coined the term RAG in a paper published in 2020. Lewis pointed out that LLMs cannot easily expand or revise their memory, and they can’t straightforwardly provide insight into their predictions, leading to “hallucinations.”

Just last week, Slack unveiled AI-based tools for businesses and cited RAG as one way the company hopes to reduce hallucinations in genAI results.

In addition to Cohere, more than a half dozen vendors provide native or stand-alone solutions for developers to build RAG-based applications for an LLM. They include Vectara, OpenAI, Microsoft Azure Search, Google Vertex AI, LangChain, LlamaIndex and Databricks.

“More and more the solutions around RAG — and enabling people to use that more effectively — are going to focus on tying into the right data that has business value as opposed to just the raw productivity improvements,” said Rick Villars, IDC group vice president of worldwide research.

With RAG, organizations can maximize the chances of producing accurate results based on factual inputs, said Avivah Litan, distinguished vice president analyst at Gartner. It also minimizes the chances of hallucinations, since outputs are grounded with retrieved data.

RAG also allows workers to find, summarize, and use the information they’re looking for faster by applying the power of third-party LLMs to an organization’s own data. And it helps protect the organization from liability incurred when copyrighted or other IP-protected materials get incorporated into LLM responses.

“This possibility is greatly reduced, because the prompt responses can be grounded in enterprise data,” Litan said.

One way to get better access to business information using RAG is with a vector database and graph technologies that can tap into proprietary data and allow an organization to truly dig into the business value, Villars said.

A vector database stores, indexes, and manages massive quantities of high-dimensional vector data efficiently; as a result, companies are spending money to develop them or to add vector search capabilities to their existing SQL or NoSQL databases for genAI use cases and applications.

By 2026, more than 30% of enterprises are expected to adopt vector databases to ground their foundation models with relevant business data, according to Gartner Research. Gartner lists vector databases as a “critical enabler” enterprise technology for 2024.

Popular uses for vector databases include product recommendations, similarity search, fraud detection, and genAI-powered question-and-answer applications, according to Gartner.

Vector databases can and often do serve as the backbone of RAG systems. The databases store and manage data typically derived from text, images, or sounds, which are converted into mathematical vectors.
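To make the idea concrete, here is a minimal Python sketch of what a vector store does at its core: keep vectors alongside labels, then return the stored items closest to a query vector. Everything in it is an assumption for illustration: `TinyVectorStore` is a hypothetical class, the three-number vectors are hand-made stand-ins for the embeddings a model would produce, and cosine similarity is just one common way to compare vectors. Real vector databases add indexing so that searches stay fast across millions of entries.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class TinyVectorStore:
    """Hypothetical in-memory store that scans every vector on each search."""

    def __init__(self):
        self._items = []  # list of (label, vector) pairs

    def add(self, label, vector):
        self._items.append((label, vector))

    def search(self, query_vector, top_k=1):
        """Return the labels of the top_k vectors most similar to the query."""
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(query_vector, item[1]),
            reverse=True,
        )
        return [label for label, _ in ranked[:top_k]]

store = TinyVectorStore()
# Toy vectors; in practice an embedding model would generate these
# from text, images, or audio.
store.add("refund policy", [0.9, 0.1, 0.0])
store.add("vpn setup guide", [0.0, 0.2, 0.9])
store.add("returns and exchanges", [0.8, 0.3, 0.1])

results = store.search([1.0, 0.2, 0.0], top_k=2)
print(results)
```

In a RAG system, the labels returned by a search like this would point to the passages that get added to the model's prompt.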

“The other part of that is back to app modernization,” Villars said. “One of the biggest legacy install bases companies have today are old client-server apps and even early mobile and cloud apps built on Java. We have to modernize those to make them part of this AI story.”

Senior Reporter Lucas Mearian covers AI in the enterprise, Future of Work issues, healthcare IT and FinTech.
