Human-in-the-Loop

Last reviewed April 2026

The model recommends declining the claim. The handler reviews the recommendation, agrees, and presses "confirm." Total review time: four seconds. Is this human oversight, or is it automation with an extra click? The distinction between genuine human-in-the-loop review and rubber-stamping is the difference that regulators, auditors, and courts will scrutinise when an AI decision goes wrong.

What is human-in-the-loop?

Human-in-the-loop (HITL) is a design pattern where a human reviews, modifies, or approves an AI system's output before it takes effect. In financial services, this means a person intervening in the decision pipeline: a credit analyst reviewing a model's recommendation before issuing a decline, a claims handler assessing a triage decision before routing a case, or a compliance officer reviewing an AML alert before filing a suspicious activity report. The human supplies what the model cannot: context, empathy, ethical reasoning, and accountability.

The pattern exists on a spectrum. At one end, the human makes the decision and the AI provides information (human-in-command). At the other, the AI makes the decision and the human monitors for exceptions (human-on-the-loop). Between these extremes, the AI recommends and the human approves (human-in-the-loop proper). The appropriate position on this spectrum depends on the decision's consequences, the model's reliability, and the regulatory context.

The operational challenge is automation bias: the tendency for humans to defer to automated recommendations. Research consistently shows that humans agree with AI recommendations at rates exceeding 90 per cent, even when the recommendations are deliberately incorrect. A HITL process that does not actively counteract automation bias provides the appearance of oversight without the substance. Designing effective HITL requires understanding this tendency and building countermeasures into the workflow.

The landscape

The EU AI Act requires human oversight for high-risk AI systems. Article 14 specifies that systems must be designed to allow effective oversight by natural persons, including the ability to understand the system's capabilities and limitations, to correctly interpret its outputs, and to decide not to use the system or to override its output. This is a design requirement, not just an operational one: the system must be built to support meaningful human intervention.

The FCA's Consumer Duty reinforces the expectation that firms maintain appropriate human involvement in decisions that affect customer outcomes. The FCA has not mandated HITL for all AI decisions, but it expects firms to demonstrate that the level of human oversight is proportionate to the decision's impact and that the oversight is effective, not merely procedural.

GDPR Article 22 gives individuals the right not to be subject to decisions based solely on automated processing that produce legal or similarly significant effects. Financial services decisions (credit, insurance, account management) fall squarely within this scope. HITL is the most common mechanism for satisfying Article 22, but it must be "meaningful" human involvement, a standard that rubber-stamping does not meet.

How AI changes this

Intelligent case routing directs human attention where it matters most. Rather than reviewing every AI decision, the HITL process can focus human review on cases where the model is least confident, where the decision is most consequential, or where the model's reasoning conflicts with business rules. This tiered approach concentrates scarce human expertise on the cases that need it most and allows straightforward cases to proceed with lighter oversight.
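
A minimal routing sketch, assuming a hypothetical Case record that carries a model confidence score, an impact rating, and a rule-conflict flag; the tier names and thresholds are illustrative and would be calibrated per decision type:

```python
from dataclasses import dataclass

@dataclass
class Case:
    case_id: str
    model_confidence: float  # 0.0-1.0, produced by the model
    decision_impact: str     # "low", "medium", or "high"
    rule_conflict: bool      # model output disagrees with a business rule

def route_for_review(case: Case) -> str:
    """Assign a review tier; thresholds here are illustrative."""
    if case.rule_conflict or case.decision_impact == "high":
        return "full_human_review"       # senior reviewer, full context
    if case.model_confidence < 0.80:     # hypothetical confidence floor
        return "standard_human_review"
    return "light_touch"                 # sampled audit, not per-case review
```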

Explainability tools support effective human review by presenting the model's reasoning alongside its recommendation. A credit analyst who can see that the decline was driven by high utilisation and recent applications can evaluate whether that reasoning is appropriate for this specific applicant. Without this context, the analyst is asked to agree or disagree with a number, which is not meaningful oversight.
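
One way to make that context available is a review card that lists the top drivers next to the recommendation. A small sketch, assuming reason codes arrive as (driver, contribution) pairs from whatever attribution method the firm uses; the function and the example values are hypothetical:

```python
def format_review_card(recommendation: str,
                       reason_codes: list[tuple[str, float]]) -> str:
    """Render the model's top drivers beside its recommendation so the
    reviewer can evaluate the reasoning, not just agree with a number."""
    lines = [f"Recommendation: {recommendation}", "Top drivers:"]
    for driver, weight in sorted(reason_codes, key=lambda rc: -abs(rc[1]))[:3]:
        lines.append(f"  {driver}: {weight:+.2f}")
    return "\n".join(lines)

print(format_review_card(
    "DECLINE",
    [("Credit utilisation above 85%", 0.42),
     ("Four applications in the last 90 days", 0.31),
     ("Account age under two years", 0.12)],
))
```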

Disagreement tracking monitors the rate at which human reviewers override AI recommendations. A sustained override rate below 2 per cent suggests either that the model is nearly perfect (unlikely for complex decisions) or that the human is not exercising independent judgement. A healthy override rate, typically 5 to 15 per cent depending on the use case, indicates that the human is engaging meaningfully with the model's output. Tracking this metric is essential for demonstrating to regulators that HITL is substantive.
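
A sketch of that monitoring, assuming each logged decision records the model's outcome and the human's final outcome; the 2 per cent alert and the 5 to 15 per cent band mirror the rule of thumb above and would be tuned per use case:

```python
def override_rate(decisions: list[dict]) -> float:
    """Share of reviewed cases where the human's outcome differed
    from the model's recommendation."""
    reviewed = [d for d in decisions if d.get("human_outcome") is not None]
    if not reviewed:
        return 0.0
    overrides = sum(d["model_outcome"] != d["human_outcome"] for d in reviewed)
    return overrides / len(reviewed)

def assess(rate: float, band: tuple[float, float] = (0.05, 0.15)) -> str:
    """Flag rates outside a healthy band (illustrative thresholds)."""
    if rate < 0.02:
        return "ALERT: possible rubber-stamping"
    if rate < band[0]:
        return "REVIEW: override rate below expected band"
    if rate > band[1]:
        return "REVIEW: model may be underperforming"
    return "OK"
```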

Calibration exercises periodically test human reviewers with cases where the model is deliberately incorrect, measuring whether the reviewer detects the error. These exercises maintain reviewer vigilance and provide data on the effectiveness of the HITL process. Institutions that run regular calibration exercises can demonstrate to regulators that their human oversight is genuine.
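
A sketch of how seeding and scoring might work, assuming calibration cases carry a known_error marker that is hidden from the reviewer's interface; the names and the 5 per cent seeding rate are illustrative:

```python
import random

def seed_calibration_cases(queue: list[dict], seeded: list[dict],
                           rate: float = 0.05,
                           rng: random.Random | None = None) -> list[dict]:
    """Mix deliberately incorrect, known-answer cases into a review queue."""
    rng = rng or random.Random()
    n = max(1, int(len(queue) * rate))
    mixed = queue + rng.sample(seeded, min(n, len(seeded)))
    rng.shuffle(mixed)
    return mixed

def detection_rate(results: list[dict]) -> float:
    """Share of seeded errors the reviewer actually overrode."""
    seeded = [r for r in results if r.get("known_error")]
    if not seeded:
        return 0.0
    caught = sum(r["human_outcome"] != r["model_outcome"] for r in seeded)
    return caught / len(seeded)
```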

What to know before you start

Design for disagreement, not for confirmation. The HITL interface should present the AI's recommendation as one input among several, not as the answer. Show the relevant data, the model's reasoning, and any flags or anomalies. Make the "override" action as easy as the "confirm" action. If overriding requires three additional steps and a justification form while confirming requires one click, the process design discourages the behaviour you are trying to encourage.

Review time budgets must be realistic. If a reviewer has two minutes per case and the case requires ten minutes of context review for meaningful assessment, the HITL process is designed to fail. Match the review time allocation to the decision's complexity. For simple, low-risk decisions, brief review is appropriate. For consequential decisions, allocate the time needed for genuine evaluation.
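
The arithmetic is worth making explicit. A back-of-envelope sketch with hypothetical volumes:

```python
def required_reviewer_hours(daily_cases: int, minutes_per_case: float) -> float:
    """Reviewer-hours per day implied by a given time budget."""
    return daily_cases * minutes_per_case / 60

# 300 consequential cases a day at a realistic ten minutes each
print(required_reviewer_hours(300, 10))  # 50.0 reviewer-hours
# The same caseload on a two-minute budget funds a fifth of that
print(required_reviewer_hours(300, 2))   # 10.0 reviewer-hours
```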

Measure and report override rates to senior management. If the board believes that human oversight governs AI decisions but the override rate is 0.5 per cent, the board has an inaccurate picture of how the firm operates. Transparency about the effectiveness of HITL processes is essential to good governance.

Start by mapping your existing AI-assisted decision processes and assessing whether the human involvement is meaningful. Where it is not, redesign the interface, the workflow, or the time allocation. Where automation bias is entrenched, consider rotating reviewers, introducing calibration exercises, or requiring written justification for a sample of confirmations. Effective HITL is a design discipline, not a staffing decision.
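
Sampling confirmations for written justification can be as simple as the sketch below; the 10 per cent rate is illustrative, and a deterministic hash of the case identifier could replace the RNG where reproducibility matters:

```python
import random

def needs_justification(sample_rate: float = 0.10,
                        rng: random.Random | None = None) -> bool:
    """Decide whether this confirmation must include a written justification."""
    rng = rng or random.Random()
    return rng.random() < sample_rate
```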
