Retrieval-Augmented Generation

What is retrieval-augmented generation?

Retrieval-augmented generation (RAG) is a technique for enhancing LLM responses by retrieving relevant information from external data sources and adding it to the user's prompt before the LLM generates its response. Instead of relying solely on the knowledge embedded in the LLM's training data, RAG allows the model to access and incorporate specific, up-to-date information from databases, documentation, or other knowledge sources.

How does RAG work?

The process follows a clear pattern:

  1. User enters a query — Someone asks a question in a conversational interface
  2. System searches for relevant knowledge — The system looks at the query and searches a vector database or other data source for related information
  3. Context is added to the prompt — Relevant pieces of information are retrieved and added to the user's original query
  4. LLM generates response — The LLM receives both the user's question and the related knowledge, enabling a more informed, accurate response

This approach relies on semantic search over vectorized (embedded) data: the system retrieves the passages most relevant to the query and assembles them into the prompt the LLM uses when crafting its response.
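The four steps above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: the documents are made up, and a toy bag-of-words embedding with cosine similarity stands in for the learned embedding model and vector database a real system would use; the final prompt would be sent to an LLM (step 4).

```python
# Minimal RAG sketch. The document store, the embed() function, and the
# prompt format are illustrative assumptions, not a real system's API.
import math
import re
from collections import Counter

# A tiny stand-in for a knowledge base (step 2 searches over this).
DOCUMENTS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The API rate limit is 100 requests per minute per key.",
    "Support is available via email from 9am to 5pm on weekdays.",
]

def embed(text):
    """Toy embedding: word counts. A real system uses a learned vector model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Step 2: rank stored documents by similarity to the user's query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs):
    """Step 3: prepend the retrieved context to the user's original question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Step 1: the user's query; the resulting prompt would go to the LLM (step 4).
prompt = build_prompt("What is the refund policy?", DOCUMENTS)
```

In a production system the bag-of-words `embed` would be replaced by an embedding model and the linear scan in `retrieve` by an approximate-nearest-neighbor index, but the retrieve-then-augment shape of the pipeline is the same.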

Why use RAG instead of other approaches?

RAG enables LLMs to work with proprietary or domain-specific data that wasn't included in their original training. For example, a company can use RAG to give an LLM access to internal documentation, customer support histories, or specialized knowledge bases.

RAG also provides attribution and transparency—teams can show users which specific sources informed the LLM's response, making answers more trustworthy and verifiable. This is more scalable and cost-effective than trying to cram all relevant information into every prompt, which quickly becomes impractical with large knowledge bases.
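One common way to get that attribution is to label each retrieved snippet in the prompt so the model can cite it by number. A minimal sketch, with hypothetical file names and snippet text:

```python
# Hypothetical retrieved snippets: (source name, text) pairs.
sources = [
    ("policy.md", "Returns are accepted within 30 days of purchase."),
    ("faq.md", "Refunds are issued to the original payment method."),
]

def prompt_with_citations(question, sources):
    """Number each snippet so the LLM's answer can cite it as [1], [2], ..."""
    lines = [f"[{i}] ({name}) {text}"
             for i, (name, text) in enumerate(sources, start=1)]
    return ("Answer using only the numbered sources and cite them like [1].\n"
            + "\n".join(lines)
            + f"\n\nQuestion: {question}")

cited_prompt = prompt_with_citations(
    "How long do I have to return an item?", sources)
```

Because each source is numbered in the prompt, the citations in the model's answer can be mapped back to the original documents and shown to the user.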
