Retrieval-Augmented Generation

What is retrieval-augmented generation?

Retrieval-augmented generation (RAG) is a technique for enhancing LLM responses by retrieving relevant information from external data sources and adding it to the user's prompt before the LLM generates its response. Instead of relying solely on the knowledge embedded in the LLM's training data, RAG allows the model to access and incorporate specific, up-to-date information from databases, documentation, or other knowledge sources.

How does RAG work?

The process follows a clear pattern:

  1. User enters a query — Someone asks a question in a conversational interface
  2. System searches for relevant knowledge — The system looks at the query and searches a vector database or other data source for related information
  3. Context is added to the prompt — Relevant pieces of information are retrieved and added to the user's original query
  4. LLM generates response — The LLM receives both the user's question and the related knowledge, enabling a more informed, accurate response

This approach relies on semantic search over vectorized (embedded) data: the system retrieves the passages most relevant to the query and assembles them into the prompt the LLM uses when crafting its response.
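The four steps above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: the documents are made up, and a toy bag-of-words embedding with cosine similarity stands in for the learned embedding model and vector database a real system would use; the final prompt would be sent to an LLM (step 4).

```python
# Minimal RAG sketch. The document store, the embed() function, and the
# prompt format are illustrative assumptions, not a real system's API.
import math
import re
from collections import Counter

# A tiny stand-in for a knowledge base (step 2 searches over this).
DOCUMENTS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The API rate limit is 100 requests per minute per key.",
    "Support is available via email from 9am to 5pm on weekdays.",
]

def embed(text):
    """Toy embedding: word counts. A real system uses a learned vector model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Step 2: rank stored documents by similarity to the user's query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs):
    """Step 3: prepend the retrieved context to the user's original question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Step 1: the user's query; the resulting prompt would go to the LLM (step 4).
prompt = build_prompt("What is the refund policy?", DOCUMENTS)
```

In a production system the bag-of-words `embed` would be replaced by an embedding model and the linear scan in `retrieve` by an approximate-nearest-neighbor index, but the retrieve-then-augment shape of the pipeline is the same.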

Why use RAG instead of other approaches?

RAG enables LLMs to work with proprietary or domain-specific data that wasn't included in their original training. For example, a company can use RAG to give an LLM access to internal documentation, customer support histories, or specialized knowledge bases.

RAG also provides attribution and transparency—teams can show users which specific sources informed the LLM's response, making answers more trustworthy and verifiable. This is more scalable and cost-effective than trying to cram all relevant information into every prompt, which quickly becomes impractical with large knowledge bases.
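One common way to get that attribution is to label each retrieved snippet in the prompt so the model can cite it by number. A minimal sketch, with hypothetical file names and snippet text:

```python
# Hypothetical retrieved snippets: (source name, text) pairs.
sources = [
    ("policy.md", "Returns are accepted within 30 days of purchase."),
    ("faq.md", "Refunds are issued to the original payment method."),
]

def prompt_with_citations(question, sources):
    """Number each snippet so the LLM's answer can cite it as [1], [2], ..."""
    lines = [f"[{i}] ({name}) {text}"
             for i, (name, text) in enumerate(sources, start=1)]
    return ("Answer using only the numbered sources and cite them like [1].\n"
            + "\n".join(lines)
            + f"\n\nQuestion: {question}")

cited_prompt = prompt_with_citations(
    "How long do I have to return an item?", sources)
```

Because each source is numbered in the prompt, the citations in the model's answer can be mapped back to the original documents and shown to the user.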
