Large Language Model (LLM)
Last reviewed April 2026
Financial institutions generate millions of pages of unstructured text each year: contracts, compliance memos, customer correspondence, regulatory filings. A large language model (LLM) can read, summarise, and draft that text in seconds. But deploying one inside a regulated environment raises questions about accuracy, confidentiality, and data governance that the technology alone does not answer.
What is a large language model?
A large language model is a neural network trained on vast quantities of text to predict what comes next in a sequence. GPT-4, Claude, Llama, and Gemini are among the most widely known. Training runs consume hundreds of billions to trillions of words drawn from books, websites, code, and public documents. The result is a system that can generate fluent prose, answer questions, translate between languages, and follow complex instructions.
The scale is part of the definition. Modern LLMs contain tens to hundreds of billions of parameters (the adjustable weights learned during training). This scale is what gives them their generality. A smaller, task-specific model might extract names from a contract. A single LLM can extract names, summarise the contract, identify unusual clauses, and draft a response to the counterparty.
The trade-off is control. An LLM produces probabilistic outputs, not deterministic ones. Ask it the same question twice and you may get two different answers. For a chatbot this is acceptable. For a regulatory filing it is not. Understanding where LLMs are reliable and where they need constraints is the core challenge for financial services adoption.
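To make the randomness concrete, here is a minimal sketch using the OpenAI Python SDK (it assumes an API key in the environment; the model name is illustrative). Lowering the sampling temperature towards zero makes output near-deterministic, which narrows, but does not close, the gap between a chatbot and a filing-grade system.

```python
# Minimal sketch: asking the same question twice. temperature=0 makes
# sampling greedy, which reduces (but does not fully eliminate)
# run-to-run variation in the answer.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(question: str, temperature: float = 0.0) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",            # illustrative choice of chat model
        temperature=temperature,   # 0.0 = near-deterministic, higher = more varied
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

# Two calls at temperature 1.0 may disagree; at 0.0 they usually match.
print(ask("What is a large language model, in one sentence?", temperature=1.0))
print(ask("What is a large language model, in one sentence?", temperature=0.0))
```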
The landscape
The EU AI Act introduces tiered obligations for general-purpose AI models, including LLMs. Providers of models above a computational threshold must conduct model evaluations, assess systemic risks, and report serious incidents. Deployers who use LLMs in high-risk applications (credit scoring, insurance underwriting) face additional transparency and oversight requirements. These rules apply from August 2025 for the most powerful models.
The UK is taking a different path. Rather than legislating model-level obligations, the FCA and PRA expect firms to apply existing frameworks (model risk management, operational resilience, outsourcing) to LLM deployments. The PRA's SS1/23 on model risk management applies to any model that informs a material business decision, including an LLM used for credit assessment or claims triage.
Open-source models have changed the economics. Llama, Mistral, and similar models can run on private infrastructure, avoiding the data residency concerns of sending customer data to a third-party API. For a bank processing sensitive customer information, the choice between a hosted API and a self-hosted model is a governance decision as much as a technical one. Model hosting costs have dropped roughly 90 per cent since 2023, making private deployment viable for mid-sized institutions.
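As a rough sketch of what self-hosting looks like, the following uses the Hugging Face transformers library to run an open-weight model entirely on your own hardware, so customer text never leaves your infrastructure. The model name is illustrative; any licensed open-weight model slots in the same way.

```python
# Sketch of self-hosted inference with Hugging Face transformers.
# Nothing here calls out to a third-party API: the weights are
# downloaded once and inference runs on local hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative open-weight model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Summarise the key obligations in the clause below.\n\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```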
How AI changes this
Document processing is the most mature LLM application in financial services. Summarising lengthy compliance reports, extracting key terms from ISDA agreements, and drafting first responses to customer complaints are all production-ready. These tasks share a common profile: the cost of a human doing the work is high, the tolerance for imperfection is moderate, and a human reviews the output before it leaves the organisation.
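A hypothetical sketch of the extraction pattern: ask for structured JSON with a fixed set of keys, then route the result to a human reviewer before it goes anywhere. The field names and model choice are illustrative, not a prescribed schema.

```python
# Sketch: extracting key terms from an agreement into structured JSON
# so a human reviewer can check them before anything leaves the firm.
import json
from openai import OpenAI

client = OpenAI()

def extract_terms(agreement_text: str) -> dict:
    prompt = (
        "Extract the counterparty name, governing law, and termination "
        "notice period from this agreement. Reply with JSON only, using "
        'the keys "counterparty", "governing_law", "notice_period". '
        "Use null for anything not stated.\n\n" + agreement_text
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        response_format={"type": "json_object"},  # constrain output to valid JSON
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.choices[0].message.content)
```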
Code generation is the second wave. Compliance teams maintaining regulatory calculation engines, actuaries building pricing models, and data engineers writing ETL pipelines all report 30 to 50 per cent productivity gains when using LLM-assisted coding tools. The value is not that the model writes perfect code. It writes a reasonable first draft that a skilled developer refines.
Customer-facing applications remain the hardest to deploy safely. An LLM answering customer queries about their mortgage or insurance policy must be accurate, must not hallucinate terms that do not exist in the contract, and must comply with the FCA's Consumer Duty requirements. Retrieval-augmented generation and guardrails are the two techniques that make this viable, but neither eliminates the risk entirely.
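To make the retrieval-augmented pattern concrete, here is a minimal sketch. The search_policy_clauses function is a hypothetical stand-in for your document index (vector store, full-text search, or similar); the essential move is instructing the model to answer only from the retrieved text, which limits, without eliminating, hallucination.

```python
# Minimal retrieval-augmented generation sketch: retrieve the relevant
# policy clauses first, then ground the answer in them.
from openai import OpenAI

client = OpenAI()

def search_policy_clauses(query: str, k: int = 3) -> list[str]:
    """Hypothetical retrieval step: return the k most relevant clauses."""
    raise NotImplementedError("wire this to your document index")

def answer_customer_query(question: str) -> str:
    clauses = search_policy_clauses(question)
    context = "\n\n".join(clauses)
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[{
            "role": "user",
            "content": (
                "Answer using ONLY the policy extracts below. If the "
                "answer is not in the extracts, say you cannot answer.\n\n"
                f"Extracts:\n{context}\n\nQuestion: {question}"
            ),
        }],
    )
    return response.choices[0].message.content
```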
What to know before you start
Start with internal use cases, not customer-facing ones. Summarising meeting notes, drafting internal reports, and assisting with code review all deliver value with lower regulatory risk. These deployments build organisational confidence and surface integration challenges before the stakes are high.
Data classification is the prerequisite. Before any employee uses an LLM, your organisation needs a clear policy on what data can be sent to which model. Customer PII sent to a third-party API is a data breach waiting to happen. Classify your data, then match the classification to the deployment model: public data to hosted APIs, sensitive data to private infrastructure only.
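One way to make that policy enforceable rather than advisory is to encode it at the point of dispatch. The sketch below is illustrative: the classification levels and endpoint names are assumptions, but the shape, a routing table checked before any text leaves the process, carries over.

```python
# Sketch of enforcing a data-classification policy in code. The
# classifications and routing table are illustrative; the point is
# that the check runs before any text reaches a model endpoint.
from enum import Enum

class Classification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    SENSITIVE = "sensitive"   # e.g. customer PII

# Which deployment each classification may be sent to.
ALLOWED_ENDPOINTS = {
    Classification.PUBLIC: {"hosted_api", "private_llm"},
    Classification.INTERNAL: {"private_llm"},
    Classification.SENSITIVE: {"private_llm"},
}

def route(text: str, classification: Classification, endpoint: str) -> str:
    if endpoint not in ALLOWED_ENDPOINTS[classification]:
        raise PermissionError(
            f"{classification.value} data may not be sent to {endpoint}"
        )
    return endpoint  # caller proceeds with the approved endpoint

route("Q3 market commentary draft", Classification.PUBLIC, "hosted_api")     # allowed
route("Mortgage application notes", Classification.SENSITIVE, "hosted_api")  # raises
```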
Prompt design is not a trivial skill. The quality of an LLM's output depends heavily on how the question is asked. Invest in prompt engineering as a discipline, not a novelty. Document your prompts, version them, and test them the way you would test any other piece of business logic.
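A minimal sketch of what "document and version your prompts" can look like in practice. The structure is an assumption; the same discipline works equally well as plain files in version control plus a test suite.

```python
# Sketch: prompts stored as versioned, documented artefacts rather
# than inline strings scattered through application code.
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptTemplate:
    name: str
    version: str
    template: str       # the prompt text, with named placeholders
    changelog: str      # why this version exists

SUMMARISE_COMPLAINT_V2 = PromptTemplate(
    name="summarise_complaint",
    version="2.1.0",
    template=(
        "Summarise the customer complaint below in three bullet points, "
        "quoting the policy number verbatim if present.\n\n{complaint}"
    ),
    changelog="2.1.0: require verbatim policy number after eval failures",
)

def render(prompt: PromptTemplate, **fields: str) -> str:
    return prompt.template.format(**fields)
```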
Model evaluation is ongoing. LLM providers update their models regularly, and an update can change the behaviour of a system you have already validated. Build automated evaluation suites that test your specific use cases against each model version. Treat model updates with the same change management rigour you apply to core system upgrades.
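Here is a sketch of such a suite, assuming a hypothetical call_model wrapper around whichever deployment you use and a case list built from real, reviewed examples of your use case. Gating promotion of a new model version on the evaluation score turns "the provider updated the model" into an ordinary change-management event.

```python
# Sketch of an automated evaluation suite run against each new model
# version before it is promoted to production.
EVAL_CASES = [
    {
        "prompt": "Extract the notice period from: 'Either party may "
                  "terminate on 90 days written notice.'",
        "must_contain": "90 days",
    },
    # ... one entry per behaviour you depend on
]

def call_model(prompt: str, model_version: str) -> str:
    raise NotImplementedError("wire this to your deployment")

def run_evals(model_version: str) -> float:
    passed = 0
    for case in EVAL_CASES:
        output = call_model(case["prompt"], model_version)
        if case["must_contain"] in output:
            passed += 1
    print(f"{model_version}: {passed}/{len(EVAL_CASES)} passed")
    return passed / len(EVAL_CASES)

# Gate the upgrade: only promote if the new version scores at least
# as well as the one currently in production (version labels illustrative).
assert run_evals("model-new") >= run_evals("model-current")
```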