Retrieval-Augmented Generation (RAG)
Last reviewed April 2026
A language model that answers questions about your organisation's policies is only useful if its answers are grounded in your actual policies, not its training data. Retrieval-augmented generation (RAG) connects a large language model to your own documents, so the model's responses reflect what your organisation actually says, not what the internet generally believes.
What is retrieval-augmented generation?
RAG is an architecture that combines a retrieval system with a generative model. When a user asks a question, the system first searches a knowledge base (typically using semantic search over a vector database) to find relevant documents. Those documents are then passed to the language model as context, along with the original question. The model generates its answer based on the retrieved context rather than relying solely on its training data.
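In code, the query-time loop is small. The sketch below is illustrative rather than prescriptive: the embedding model is one common open-source choice, the two chunks are toy data standing in for a real knowledge base, and generate() is a placeholder for whichever language-model API you use.

```python
# A minimal sketch of the query-time flow: embed the question, rank chunks
# by similarity, and pass the best matches to the model as context.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Toy knowledge base; in production these come from the ingestion pipeline.
chunks = [
    "Expense claims over £500 require line-manager approval.",
    "Remote working requests are reviewed quarterly by HR.",
]
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)

def answer(question: str, top_k: int = 2) -> str:
    # 1. Retrieve: embed the query and rank chunks by cosine similarity
    #    (dot product, since the vectors are normalised).
    query_vector = embedder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(chunk_vectors @ query_vector)[::-1][:top_k]
    # 2. Generate: hand the retrieved context to the language model.
    context = "\n".join(chunks[i] for i in top)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)  # placeholder for your LLM call of choice
```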
The pattern addresses the two biggest problems with using language models in enterprise settings. First, hallucination: a model without access to your documents will generate plausible but fabricated answers about your policies. RAG grounds the model in information that actually exists in your knowledge base, which sharply reduces (though does not eliminate) fabrication. Second, freshness: a model's training data has a cutoff date, but your documents change daily. RAG lets the model work from current information, provided the knowledge base itself is kept up to date.
The architecture has three components: an ingestion pipeline (chunking and embedding documents), a retrieval layer (searching for relevant chunks at query time), and a generation layer (the language model that synthesises the answer). Each component can fail independently, and each affects the quality of the final response.
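The ingestion side can be sketched just as briefly, under the same assumptions as the sketch above: chunking here is naive (split on blank lines) and the "vector database" is a plain Python list, where a production system would use a proper store.

```python
from dataclasses import dataclass

import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

@dataclass
class IndexedChunk:
    text: str
    vector: np.ndarray
    source: str  # originating document, kept for citations later

def ingest(documents: dict[str, str]) -> list[IndexedChunk]:
    """Chunk and embed documents into an in-memory index."""
    index = []
    for name, body in documents.items():
        # Naive chunking on blank lines; structure-aware splitting is
        # discussed further down.
        for para in (p.strip() for p in body.split("\n\n")):
            if para:
                vector = embedder.encode(para, normalize_embeddings=True)
                index.append(IndexedChunk(para, vector, name))
    return index
```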
The landscape
RAG has become the default pattern for enterprise AI applications that need access to organisational knowledge. Every major cloud provider and AI platform offers RAG tooling. Microsoft's Copilot, Amazon's Q, and Google's Vertex AI Search all use RAG architectures internally. The pattern is not proprietary. It is an architectural approach that can be built with open-source components on private infrastructure. The EU AI Act's transparency requirements make RAG attractive because the retrieved sources provide a natural audit trail.
For financial services, the PRA's SS1/23 on model risk management applies to RAG systems that inform material decisions. A RAG-based system that helps an underwriter assess risk or a compliance officer interpret regulation is a model within the PRA's definition. Validation, monitoring, and governance are not optional extras.
The maturity gap between a demo and a production system is significant. A RAG prototype that answers questions about a PDF can be built in an afternoon. A production RAG system that reliably answers questions across 100,000 documents, handles ambiguous queries, respects access controls, and provides citations takes months. Most organisations underestimate this gap.
How AI changes this
Internal knowledge assistants are the primary deployment. Compliance teams querying the regulatory library, underwriters checking policy wordings, claims handlers searching for precedent, and relationship managers preparing for client meetings all benefit from a system that retrieves relevant information and presents it as a synthesised answer rather than a list of documents.
The key differentiator from traditional search is synthesis. A keyword search for "FCA guidance on crypto-asset promotions" returns 47 documents. A RAG system reads those documents and answers: "The FCA requires that crypto promotions include a specific risk warning, are fair and not misleading, and are only communicated by or approved by an authorised person. Here are the three most relevant guidance documents." The user gets an answer, not a reading list.
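One common way to get this behaviour is to number the retrieved chunks in the prompt and instruct the model to cite them. The template below is a sketch: the exact wording, and the title and page fields on the retrieved chunks, are assumptions rather than a standard.

```python
def build_prompt(question: str, retrieved: list[dict]) -> str:
    # Number each retrieved chunk so the model can cite it as [1], [2], ...
    sources = "\n".join(
        f"[{i + 1}] {r['title']} (p. {r['page']}): {r['text']}"
        for i, r in enumerate(retrieved)
    )
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources by number, e.g. [2]. If the sources do not contain "
        "the answer, say so rather than guessing.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```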
RAG systems can surface contradictions in organisational knowledge. When different policy documents make conflicting statements (a more common problem than most institutions admit), a RAG system that retrieves both will either flag the contradiction or produce an inconsistent answer. Either outcome is useful: it identifies a governance gap that keyword search would never reveal. For firms managing regulatory reporting obligations, this consistency check is particularly valuable.
What to know before you start
Retrieval quality is the bottleneck, not generation quality. If the retrieval layer returns the wrong documents, the language model will produce a confident, well-written, wrong answer. Invest 70 per cent of your effort in the retrieval pipeline: chunking strategy, embedding model selection, metadata enrichment, and hybrid search configuration. The generation layer is the easy part.
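Hybrid search typically runs a keyword ranking (such as BM25) and a vector ranking in parallel and fuses the two result lists. Reciprocal rank fusion is one widely used fusion method; the sketch below assumes you already have the two ranked lists of chunk IDs.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each list contributes 1 / (k + rank) per item.
    k = 60 is the constant from the original RRF paper, a common default."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. fuse a BM25 ranking with a vector-search ranking
fused = reciprocal_rank_fusion([["c3", "c1", "c7"], ["c1", "c9", "c3"]])
```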
Citations are not optional. In a regulated environment, every answer must be traceable to a source document. Build your RAG system to return the specific chunks that informed each answer, with page numbers, document titles, and publication dates. A compliance officer who cannot verify the source of an AI-generated answer will not trust or use the system.
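In practice this means the answer object carries its sources, not just its text. A sketch of one possible shape (the field names are illustrative, not a standard):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Citation:
    document_title: str
    page: int
    published: date
    chunk_text: str  # the exact retrieved text, so a reviewer can verify it

@dataclass
class Answer:
    text: str
    citations: list[Citation]  # every chunk that informed the answer
```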
Chunking is where most RAG projects fail quietly. Splitting documents by fixed character counts ignores document structure. A chunk that spans two unrelated sections produces a blurred embedding that is weakly relevant to both topics. Split by logical boundaries: sections, clauses, paragraphs. Preserve headings as metadata. Test retrieval quality with real queries before optimising anything else.
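A minimal structure-aware splitter might look like the following. It assumes markdown-style "#" headings; PDFs and Word documents need format-specific parsing to recover the same structure.

```python
import re

def chunk_by_section(text: str) -> list[dict]:
    """Split on headings, keeping each section's heading as metadata."""
    chunks, heading, lines = [], "Untitled", []
    for line in text.splitlines():
        if re.match(r"^#{1,6}\s", line):
            if lines:  # flush the previous section
                chunks.append({"heading": heading, "text": "\n".join(lines).strip()})
            heading, lines = line.lstrip("# ").strip(), []
        else:
            lines.append(line)
    if lines:
        chunks.append({"heading": heading, "text": "\n".join(lines).strip()})
    return chunks
```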
Start with a single, well-curated document collection: your data governance policies, your claims handling procedures, or your product documentation. Build the full pipeline, test with actual users, and iterate on retrieval quality. Expanding to additional document collections is straightforward once the pipeline works. Getting the pipeline right is the hard part.
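Retrieval quality can be measured directly, without the generation layer in the loop. A small recall@k harness over hand-labelled queries, sketched below, is often enough to start iterating; the labelled examples and the search() function are assumptions about your own pipeline.

```python
# Hand-labelled (query, expected chunk ID) pairs drawn from real user questions.
labelled = [
    ("approval threshold for expense claims", "chunk-03"),
    ("how often are remote working requests reviewed", "chunk-07"),
]

def recall_at_k(search, k: int = 5) -> float:
    """Fraction of queries whose expected chunk appears in the top k results.
    Assumes search(query, k) returns a list of (chunk_id, score) pairs."""
    hits = sum(
        1 for query, expected in labelled
        if expected in [chunk_id for chunk_id, _ in search(query, k)]
    )
    return hits / len(labelled)
```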