Model Validation
Last reviewed April 2026
A model that has never been challenged by someone who did not build it is a model waiting to surprise the organisation. Model validation is the independent assessment that stands between a model's development and its deployment, and in financial services, it is the control that regulators examine most closely when a model failure causes losses.
What is model validation?
Model validation is the independent review of a model's design, assumptions, data, performance, and limitations, conducted by individuals who were not involved in the model's development. Its purpose is to confirm that the model is fit for its intended purpose, that its outputs are reliable within defined boundaries, and that its limitations are documented and understood by its users. In financial services, validation is a regulatory requirement for models used in credit decisions, capital calculations, risk measurement, and pricing.
The scope of validation is broader than testing accuracy. Validators assess conceptual soundness (is the modelling approach appropriate for the problem?), data quality (is the training data representative and free from material errors?), implementation (does the code correctly implement the intended methodology?), and outcomes analysis (does the model perform as expected across different segments, time periods, and conditions?). Each dimension can reveal issues that accuracy metrics alone would miss.
Validation is not a one-time event. Models require periodic revalidation, typically annually, and triggered revalidation when material changes occur: new data sources, methodology changes, expansion to new populations, or significant performance degradation. The ongoing validation cycle ensures that models remain fit for purpose as the environment they operate in changes.
The landscape
The PRA's SS1/23 codifies validation expectations that were previously distributed across multiple supervisory statements. Validation must be genuinely independent: conducted by individuals with no reporting line to the model development function. The depth of validation must be proportionate to the model's materiality and complexity. And validation findings must be tracked to resolution, with senior management oversight of outstanding issues.
The FCA's focus on consumer outcomes under the Consumer Duty adds a conduct dimension to validation. A model that is statistically sound but produces systematically poor outcomes for a specific customer segment has a conduct risk that validation should identify. This is particularly relevant for models used in pricing, claims handling, and credit decisions, where unfair outcomes can emerge from models that optimise for commercial metrics without fairness constraints.
The talent constraint is real. Effective model validators need deep statistical knowledge, domain expertise in financial services, and increasingly, proficiency in machine learning techniques. This combination is scarce. Many institutions struggle to staff their validation functions adequately, leading to validation backlogs that create regulatory exposure. The backlog problem is structural, not cyclical: the volume of models requiring validation grows faster than the validation capacity.
How AI changes this
Machine learning models present specific validation challenges. Traditional models have explicit assumptions that can be individually tested. ML models learn their own assumptions from data, making it harder to identify what the model "believes" and whether those beliefs are appropriate. Explainability tools partially address this, but interpreting a complex ML model remains harder than interpreting a logistic regression.
Automated testing frameworks run comprehensive validation suites across ML models: performance by segment, stability over time, sensitivity to input perturbations, and behaviour at decision boundaries. These frameworks standardise the validation process and ensure that common failure modes are tested consistently. A test suite that checks a fraud detection model's performance across customer demographics, transaction types, and time periods is more thorough than manual spot-checking and more repeatable.
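A segment-level check of the kind described above can be sketched in a few lines. This is an illustrative example, not a reference to any particular validation framework: the record layout, the recall metric, and the 0.05 tolerance are all assumptions made for the sketch.

```python
# Hypothetical segment-level validation check. The record fields
# ("segment", "y_true", "y_pred") and the tolerance are illustrative.
from collections import defaultdict

def recall(pairs):
    """Recall over (y_true, y_pred) pairs, with positive class 1."""
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else None

def segment_performance(records, segment_key):
    """Group scored records by a segment field and compute recall per group."""
    groups = defaultdict(list)
    for r in records:
        groups[r[segment_key]].append((r["y_true"], r["y_pred"]))
    return {seg: recall(pairs) for seg, pairs in groups.items()}

def flag_underperforming(per_segment, overall, tolerance=0.05):
    """Flag segments whose recall trails the portfolio-level figure."""
    return [seg for seg, m in per_segment.items()
            if m is not None and m < overall - tolerance]
```

Running the same functions over demographics, transaction types, and time periods is what makes the suite repeatable: each new model version is tested against an identical set of slices.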
Continuous validation replaces the annual validation cycle for ML models that update frequently. Instead of a comprehensive review once a year, automated monitoring continuously evaluates model performance against validation benchmarks, triggering a full revalidation only when metrics breach defined thresholds. This approach is more resource-efficient and catches degradation earlier than periodic reviews.
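The trigger logic behind continuous validation is simple to express. The sketch below is a minimal illustration: the metric names (AUC, PSI), the benchmark values, and the two-breach patience rule are assumptions chosen for the example, not regulatory thresholds.

```python
# Illustrative threshold-triggered revalidation. Metric names and
# benchmark values are assumptions, not prescribed figures.
THRESHOLDS = {
    "auc": 0.70,   # minimum acceptable discriminatory power
    "psi": 0.25,   # maximum acceptable population stability index
}

def breaches(metrics):
    """Return the names of metrics breaching their validation benchmark."""
    out = []
    if metrics.get("auc", 1.0) < THRESHOLDS["auc"]:
        out.append("auc")
    if metrics.get("psi", 0.0) > THRESHOLDS["psi"]:
        out.append("psi")
    return out

def needs_revalidation(metric_history, patience=2):
    """Trigger a full revalidation only after `patience` consecutive
    breached observations, so one noisy data point does not fire an
    expensive review."""
    streak = 0
    for metrics in metric_history:  # oldest to newest
        streak = streak + 1 if breaches(metrics) else 0
    return streak >= patience
```

The patience parameter is the design choice worth debating: set it too low and monitoring generates revalidation noise, too high and genuine degradation runs for months before review.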
Synthetic data generation enables validators to test models in scenarios where real data is sparse. Testing how a credit model behaves during a severe recession is difficult when the training data comes from a benign economic period. Synthetic data that simulates stress conditions allows validators to assess model behaviour in environments the model has not experienced, identifying vulnerabilities before they materialise.
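One simple way to build a stressed test set is to bootstrap historical cases and apply macro adjustments. The sketch below is a toy illustration only: the variable names and the shift sizes are assumptions, not a calibrated stress scenario, and real synthetic data generation would preserve the joint distribution of the drivers rather than shifting them independently.

```python
# Hedged sketch: stressed synthetic applicants built by resampling
# historical rows and shifting two drivers. Field names and shift
# magnitudes are illustrative assumptions.
import random

def stress_sample(base_rows, n, seed=0,
                  unemployment_shift=2.0, income_multiplier=0.85):
    """Bootstrap base cases, then apply stressed macro adjustments:
    higher regional unemployment, lower applicant income."""
    rng = random.Random(seed)
    stressed = []
    for _ in range(n):
        row = dict(rng.choice(base_rows))   # copy so the base set is untouched
        row["unemployment_rate"] += unemployment_shift
        row["income"] *= income_multiplier
        stressed.append(row)
    return stressed
```

Scoring the stressed sample and comparing default-rate estimates against the unstressed baseline shows whether the model's response to deteriorating conditions is at least directionally sensible.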
What to know before you start
Independence must be genuine, not structural. A validation function that reports to the same executive as the model development function, or that lacks the authority to block deployment when findings are material, provides cosmetic assurance. The model risk management framework must give validators genuine authority, including the ability to recommend model retirement when remediation is not feasible.
Validation of vendor models requires a different approach. You will not have access to the model's source code or full training data. Validation must rely on outcomes testing, sensitivity analysis, and assessment of the vendor's own validation documentation. Negotiate data and documentation access in vendor contracts before procurement. Adding validation requirements after deployment gives you limited leverage.
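Sensitivity analysis against a black-box vendor model can be done through its scoring interface alone. The sketch below is a minimal illustration: `score_fn`, the feature names, and the 10% perturbation step are assumptions for the example, not any vendor's API.

```python
# Illustrative black-box sensitivity check for a vendor model exposed
# only as a scoring function. Perturbation size is an assumption.
def sensitivity(score_fn, case, feature, rel_step=0.10):
    """Relative change in score when one input moves by rel_step."""
    base = score_fn(case)
    bumped = dict(case)
    bumped[feature] *= (1 + rel_step)
    return (score_fn(bumped) - base) / base if base else 0.0

def sensitivity_report(score_fn, case, features, rel_step=0.10):
    """Rank features by the magnitude of their effect on the score."""
    effects = {f: sensitivity(score_fn, case, f, rel_step)
               for f in features}
    return sorted(effects.items(), key=lambda kv: -abs(kv[1]))
```

A feature whose perturbation barely moves the score, when domain knowledge says it should dominate, is exactly the kind of finding outcomes testing can surface without ever seeing the vendor's code.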
Document the model's limitations explicitly. Every model has a boundary beyond which its outputs are unreliable: a population it was not trained on, a market condition it has not experienced, a data quality threshold below which it degrades. Validators should identify these boundaries and ensure model users understand them. A model with known, documented limitations is safer than a model assumed to work everywhere.
Start with your highest-risk models. Prioritise validation resource on models that directly affect customer outcomes, regulatory capital, or financial reporting. A risk-tiered validation schedule, with annual validation for critical models and biennial for lower-risk ones, allocates scarce validation capacity where it matters most. Ensure the governance framework defines these tiers clearly so that prioritisation decisions are systematic, not ad hoc.
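A risk-tiered schedule of this kind is straightforward to encode. In the sketch below the tier names, the model-inventory record layout, and the approximation of a month as 30 days are illustrative assumptions; the cycle lengths follow the text (annual for critical models, biennial for lower-risk ones).

```python
# Sketch of a risk-tiered revalidation schedule. Tier names and the
# inventory record shape are assumptions made for the example.
from datetime import date, timedelta

CYCLE_MONTHS = {"critical": 12, "standard": 24}

def next_validation_due(last_validated, tier):
    """Next scheduled validation date for a model's tier
    (month approximated as 30 days for the sketch)."""
    return last_validated + timedelta(days=CYCLE_MONTHS[tier] * 30)

def overdue_models(inventory, today):
    """Model IDs whose scheduled validation date has passed."""
    return [m["id"] for m in inventory
            if next_validation_due(m["last_validated"], m["tier"]) < today]
```

Driving the schedule from the model inventory, rather than from analysts' calendars, is what makes prioritisation systematic: the overdue list is computed, reviewed by governance, and tracked to resolution.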