Fine-Tuning vs RAG
Last reviewed April 2026
You want a language model that understands your firm's products, policies, and terminology. Do you retrain the model on your data, or do you feed it your documents at query time? Fine-tuning and RAG answer the same question differently, and choosing wrong costs months and money. Most financial services teams should start with RAG. Some should not.
What is fine-tuning vs RAG?
Fine-tuning modifies a foundation model's weights by training it further on your organisation's data. The model learns your vocabulary, your document structures, your domain conventions. The knowledge becomes part of the model itself. After fine-tuning, the model "knows" your domain without needing external documents at inference time.
RAG (retrieval-augmented generation) leaves the model's weights unchanged. Instead, it retrieves relevant documents from your knowledge base at query time and passes them to the model as context. The model generates its response based on the retrieved documents. The knowledge lives in your document store, not in the model.
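The retrieve-then-generate flow can be sketched in a few lines. This is a minimal illustration, not a production pipeline: it uses a toy bag-of-words similarity in place of a real embedding model and vector database, and the document store, ids, and prompt shape are all assumptions for the example.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real system would call a trained
    # embedding model and store vectors in a vector database.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=1):
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d["text"])), reverse=True)
    return ranked[:k]

def build_prompt(query, docs):
    # Retrieved documents are passed as context; the model's weights
    # are never changed, and the answer can cite document ids.
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in docs)
    return (f"Answer using only the sources below. Cite sources by id.\n\n"
            f"{context}\n\nQuestion: {query}")

corpus = [
    {"id": "policy-travel-v3", "text": "Travel expenses require approval above 500 pounds."},
    {"id": "policy-remote-v2", "text": "Remote working requires line manager sign-off."},
]
prompt = build_prompt("What is the approval threshold for travel expenses?",
                      retrieve("approval threshold travel expenses", corpus))
```

Updating the system's knowledge is just updating `corpus`: re-ingest the new document version and the next query retrieves it, with no retraining.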
The trade-offs are concrete. Fine-tuning requires a training dataset (typically thousands of examples), ML expertise, compute resources, and weeks of iteration. It produces a model that is faster at inference and more fluent in your domain. RAG requires a document pipeline, a vector database, and embedding infrastructure. It produces a system that always has access to current information and can cite its sources.
The landscape
The industry consensus has shifted toward RAG as the default starting point. The practical reason: most financial services organisations need their AI system to reflect documents that change regularly (policies, regulatory guidance, product terms), and fine-tuning cannot keep up with that rate of change. A fine-tuned model that learned your policies six months ago does not know about the guidance published last week. A RAG system retrieves last week's guidance automatically.
The PRA's SS1/23 adds a governance dimension. A fine-tuned model is a new model that requires validation under model risk management. A RAG system using a third-party foundation model may fall under a different (lighter) governance framework, depending on how the firm classifies it. The model risk implications of each approach should inform the decision alongside the technical trade-offs.
The EU AI Act treats fine-tuned models and RAG-augmented systems differently. A fine-tuned model that changes the base model's behaviour may trigger additional compliance obligations. A RAG system that uses the base model unchanged, with external retrieval only, has a simpler regulatory profile.
Cost has shifted the calculus further. Fine-tuning a large model costs tens of thousands of pounds per training run, and you may need multiple runs to get the quality right. RAG infrastructure costs are more predictable: embedding compute, vector storage, and API calls. For most use cases, RAG is cheaper to build, cheaper to maintain, and faster to deploy. Fine-tuning is justified when RAG's retrieval overhead is unacceptable or when the task requires a specialised output format that prompting alone cannot achieve.
How AI changes this
RAG is the right choice when your AI system needs to answer questions about a corpus that changes. Policy queries, regulatory guidance, customer documentation, and internal knowledge bases all fit this pattern. The system always reflects the latest version of every document. Answers include citations. New documents are searchable within minutes of ingestion.
Fine-tuning is the right choice when you need the model to behave differently, not just know more. Training a model to generate outputs in a specific regulatory format (SAR narratives, IFRS 17 disclosures, reinsurance slip summaries) is a fine-tuning problem. The model needs to learn a style and structure that is unique to your domain. Prompt engineering can approximate this, but fine-tuning produces more consistent results for highly structured outputs.
The combination is increasingly common. Fine-tune the model for your domain's writing style and terminology. Use RAG to provide it with current documents at query time. The fine-tuned model is more fluent with your terminology; the RAG pipeline ensures it works with current information. This hybrid approach is what the most mature financial services deployments use, but it is also the most expensive to build and maintain.
For document extraction tasks, fine-tuning smaller, specialised models often outperforms RAG with a large general model. Extracting structured data from Lloyd's of London bordereaux or ACORD forms is a pattern-matching task where a fine-tuned model can achieve 95 per cent accuracy at a fraction of the inference cost.
What to know before you start
Start with RAG unless you have a specific reason not to. RAG is faster to deploy, easier to maintain, and produces citeable answers. If RAG's performance is insufficient for a specific use case (usually because the task requires a specialised output format or domain-specific reasoning that the base model cannot perform with retrieved context alone), then evaluate fine-tuning for that specific use case.
Fine-tuning requires data quality investment. The model learns from your training data, including its errors, biases, and inconsistencies. If your training data contains outdated policy language, the model will reproduce it. Curate your training set with the same rigour you would apply to any data governance exercise. Quality in, quality out.
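A curation pass like this can be automated. The sketch below lints a fine-tuning dataset in chat-style JSONL (one record of `messages` per line, the format several fine-tuning APIs accept); the record schema and the list of superseded terms are illustrative assumptions, not a prescribed standard.

```python
import json

# Terminology the current policy language has replaced; examples that
# still use it would teach the model outdated vocabulary. Illustrative.
SUPERSEDED = {"policy v2", "FSA"}

def lint_example(record):
    """Return a list of data-quality issues for one training example."""
    issues = []
    msgs = record.get("messages", [])
    if not any(m.get("role") == "assistant" for m in msgs):
        issues.append("no assistant turn to learn from")
    text = " ".join(m.get("content", "") for m in msgs)
    for term in SUPERSEDED:
        if term in text:
            issues.append(f"contains superseded term: {term}")
    return issues

def curate(jsonl_lines):
    """Split a JSONL dataset into clean examples and rejects."""
    kept, rejected = [], []
    for line in jsonl_lines:
        record = json.loads(line)
        (rejected if lint_example(record) else kept).append(record)
    return kept, rejected
```

Rejected examples go back to a human curator rather than into the training run, which is the "quality in, quality out" discipline applied mechanically.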
Fine-tuned models need ongoing retraining. Your domain evolves. Regulations change. Products change. A fine-tuned model from January does not know about the regulatory changes from March. Budget for quarterly or semi-annual retraining cycles, including the validation effort that each retraining requires under model risk management.
Measure before deciding. Build a RAG prototype first. Test it against your actual use cases. Measure where it succeeds and where it fails. If the failures are retrieval-related (wrong documents returned), fix the retrieval pipeline. If the failures are generation-related (right documents but wrong output format or reasoning), that is your signal to evaluate fine-tuning. Let evidence guide the decision, not architectural preference.
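The retrieval-versus-generation diagnosis described above can be made mechanical in your evaluation harness. A minimal sketch, assuming each test case records which document should have been retrieved and whether the final answer was judged correct (both shapes are assumptions for illustration):

```python
def diagnose(expected_doc, retrieved_ids, answer_correct):
    """Classify one evaluation case."""
    if answer_correct:
        return "pass"
    if expected_doc not in retrieved_ids:
        return "retrieval_failure"   # wrong documents returned: fix the pipeline
    return "generation_failure"      # right documents, wrong output: evaluate fine-tuning

def tally(results):
    """Count outcomes across the whole evaluation set."""
    counts = {}
    for result in results:
        counts[result] = counts.get(result, 0) + 1
    return counts
```

A tally dominated by `retrieval_failure` points at chunking, embeddings, or ranking; a tally dominated by `generation_failure` is the evidence that justifies a fine-tuning evaluation.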