Guardrails

Last reviewed April 2026

An AI system that drafts customer communications should not disclose confidential pricing, invent policy terms that do not exist, or produce advice that conflicts with regulatory requirements. Guardrails are the programmatic constraints that prevent AI systems from producing harmful, non-compliant, or simply wrong outputs. In financial services, they are the difference between a useful tool and a regulatory incident.

What are guardrails?

Guardrails are the technical controls placed around an AI system to constrain its behaviour within acceptable boundaries. They operate at multiple layers: input guardrails filter what goes into the model (blocking prompt injection, personally identifiable information, or out-of-scope queries), output guardrails check what comes out (validating factual claims, enforcing format compliance, blocking prohibited content), and process guardrails govern how the system operates (rate limiting, logging, escalation triggers).

The concept is not unique to AI. Financial services already applies guardrails to every system that processes customer data or makes business decisions: transaction limits, approval workflows, four-eyes checks, segregation of duties. AI guardrails are the equivalent for systems that generate text, make recommendations, or take actions based on language model outputs.

The implementation ranges from simple rule-based checks (reject any output containing a social security number pattern) to sophisticated classifier models (detect whether the output constitutes financial advice that requires regulatory authorisation). Most production systems combine both approaches. Rules catch known patterns. Models catch novel ones.

The landscape

The EU AI Act mandates risk mitigation measures for high-risk AI systems, including "appropriate human-machine interface tools" that allow users to understand and override the system. Guardrails are the primary implementation of this requirement. An AI system used for credit decisions, insurance underwriting, or fraud detection without guardrails cannot meet the Act's requirements.

The FCA's third-party AI guidance expects firms to demonstrate that they understand the risks of AI outputs and have controls in place to manage them. This is not a future requirement. It applies now to any firm deploying AI in customer-affecting processes. The guardrail architecture is part of the evidence that the firm has effective controls.

Open-source guardrail frameworks (Guardrails AI, NeMo Guardrails, LLM Guard) have emerged alongside commercial offerings from AI platform vendors. These provide pre-built checks for common risks: PII detection, toxicity filtering, topic restriction, and factual grounding validation. The frameworks accelerate deployment but require configuration for financial services specifics. A generic toxicity filter does not catch a model that provides unauthorised investment advice in polite language.

How AI changes this

Input guardrails prevent prompt injection and topic drift. A customer-facing chatbot for insurance queries should not answer questions about investment products, no matter how the question is phrased. Input classifiers detect off-topic queries and return a standard redirect rather than allowing the model to generate an uncontrolled response. This is production-ready and essential for any customer-facing deployment.

Output guardrails enforce compliance at the point of generation. A model drafting a mortgage offer letter can be checked against a template to ensure all required disclosures are present, the interest rate matches the approved rate, and the cooling-off period is correctly stated. These checks run in milliseconds and catch errors that a human reviewer might miss after reading dozens of similar letters.

Guardrails enable graduated autonomy. An AI agent can be given permission to handle straightforward cases automatically while escalating complex or ambiguous ones to a human. The guardrail is the decision boundary: cases below a confidence threshold, cases involving amounts above a limit, or cases matching patterns associated with past errors are routed to human review. This allows organisations to capture the efficiency of automation while maintaining the safety of human oversight.

Real-time monitoring through guardrails produces the audit trail that regulators expect. Every blocked output, every escalation, every override is logged. This data is invaluable for model monitoring: an increase in blocked outputs may signal model degradation, a new type of prompt injection, or a shift in customer behaviour that the system is not handling well.

What to know before you start

Define your risk taxonomy before writing guardrail rules. What outputs are unacceptable? Financial advice without authorisation. Customer PII in logs. Fabricated policy terms. Discriminatory language. Each risk needs a specific guardrail. A generic "don't say bad things" instruction in the system prompt is not a guardrail. It is a hope.

Layer your defences. Prompt-level instructions ("do not provide investment advice") are the first line but can be bypassed. Input classifiers that detect investment-related queries are the second line. Output classifiers that detect investment advice in the response are the third. Each layer catches what the previous layer missed. Relying on any single layer is insufficient for a regulated environment.

Test adversarially. Your guardrails will face users who accidentally or deliberately try to circumvent them. Red-team your AI system: attempt prompt injection, topic manipulation, and boundary-pushing queries. If your guardrails fail under adversarial testing, they will fail in production. Conduct these tests quarterly, not just at deployment. New attack techniques emerge regularly.

Start by mapping your existing operational controls (approval limits, escalation triggers, four-eyes checks) and implementing their AI equivalents. A claims processing system that requires manager approval for settlements above 10,000 pounds should have a guardrail that prevents the AI from recommending automatic settlement above that threshold. Translate your existing governance into guardrail rules, then add AI-specific controls (hallucination detection, PII filtering, prompt injection defence) on top.

Last updated May 2026

Exploring AI for your organisation? There are fifteen minutes on the calendar.

Let’s build AI together

← Back to AI Glossary