Model Drift
Last reviewed April 2026
The credit scoring model that achieved 94 per cent accuracy in validation now approves applicants who default at twice the expected rate. The model did not change. The world did. Model drift is what happens when the data a model encounters in production diverges from the data it was trained on, and in financial services, it is not a matter of if but when.
What is model drift?
Model drift is the degradation of a machine learning model's performance over time as the statistical relationship between inputs and outputs changes. Two types dominate. Data drift (covariate shift) occurs when the distribution of input features changes: customers' income patterns shift, transaction volumes fluctuate seasonally, or a new product attracts a different demographic. Concept drift occurs when the relationship between inputs and outputs changes: what constituted a high-risk borrower in 2019 is different from what constitutes one today, even if the input features look similar.
In financial services, drift is constant. Economic cycles change borrower behaviour. Regulatory changes alter reporting patterns. Fraud techniques evolve. Customer demographics shift. Pandemic-era behavioural changes persist in some segments and reverse in others. A model trained on pre-pandemic data and deployed without monitoring will drift. The question is how quickly and how severely.
The danger is that drift is invisible without measurement. The model continues to produce outputs. Those outputs look plausible. The system appears to be working. The degradation only becomes visible when outcomes are measured against predictions, and in financial services, the feedback loop can be months or years long. A credit model's predictions are not fully testable until loans mature. A fraud model's misses are not discovered until losses are reported.
The landscape
The PRA's SS1/23 explicitly requires institutions to monitor for model drift and to have processes for responding when drift is detected. The expectation is that monitoring is continuous, not periodic, and that thresholds for acceptable drift are defined and enforced. Institutions that rely on annual model reviews without interim monitoring are falling below the expected standard.
The EU AI Act requires that high-risk AI systems maintain their performance over time. This effectively mandates drift monitoring for any AI system used in credit decisions, fraud detection, or other high-risk applications. The obligation is not just to detect drift but to respond to it: retrain, recalibrate, or retire the model before degraded performance harms consumers or market integrity.
The COVID-19 pandemic was a stress test for drift monitoring. Models trained on years of stable economic data suddenly encountered a distribution shift without precedent. Institutions with automated drift detection identified the problem in days. Institutions without it discovered the problem months later, when default rates and fraud patterns deviated sharply from expectations. The pandemic accelerated investment in drift monitoring but also revealed how few institutions had it in place.
How AI changes this
Statistical drift detection methods compare the distribution of incoming data to the training distribution on a continuous basis. Tests like the Population Stability Index (PSI), Kolmogorov-Smirnov test, and Jensen-Shannon divergence quantify how much the data has shifted. When the shift exceeds a defined threshold, the system alerts the model owner. These methods are computationally efficient, well understood, and deployable on any MLOps platform.
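As a minimal sketch, here is PSI computed over quantile bins alongside a two-sample Kolmogorov-Smirnov test, using NumPy and SciPy. The bin count, the simulated income data, and the 0.25 alert threshold are illustrative assumptions, not fixed standards.

```python
import numpy as np
from scipy import stats

def psi(expected: np.ndarray, actual: np.ndarray, n_bins: int = 10) -> float:
    """Population Stability Index between a training (expected) sample
    and a production (actual) sample of a single feature."""
    # Bin edges come from the training distribution's quantiles.
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range production values
    exp_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    act_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    eps = 1e-6  # avoid log(0) in empty bins
    exp_frac = np.clip(exp_frac, eps, None)
    act_frac = np.clip(act_frac, eps, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))

rng = np.random.default_rng(42)
train_income = rng.lognormal(mean=10.5, sigma=0.5, size=50_000)  # training sample
prod_income = rng.lognormal(mean=10.8, sigma=0.5, size=5_000)    # shifted production sample

score = psi(train_income, prod_income)
ks = stats.ks_2samp(train_income, prod_income)
print(f"PSI={score:.3f}  KS statistic={ks.statistic:.3f}  p={ks.pvalue:.2e}")
if score > 0.25:  # illustrative materiality threshold
    print("ALERT: input distribution has shifted beyond threshold")
```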
Predictive drift detection identifies performance degradation before ground truth is available. By monitoring the model's prediction distribution (are approval rates changing, are fraud scores shifting, are risk categories rebalancing), the system can infer potential performance issues without waiting for outcomes. A credit scoring model that suddenly shifts its approval rate by ten percentage points warrants investigation, even before any defaults are observed.
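A hedged sketch of prediction drift monitoring on a credit model: compare the approval rate in a recent production window against the rate observed at validation, before any defaults are observed. The baseline rate, score cut-off, and ten-point alert band are assumptions for illustration.

```python
import numpy as np

BASELINE_APPROVAL = 0.62   # approval rate observed at validation (assumed)
SCORE_CUTOFF = 0.55        # production decision threshold (assumed)
ALERT_BAND = 0.10          # alert on a ten-point move in approval rate

def check_prediction_drift(window_scores: np.ndarray) -> None:
    current = float(np.mean(window_scores >= SCORE_CUTOFF))
    shift = current - BASELINE_APPROVAL
    print(f"approval rate {current:.1%} (shift {shift:+.1%})")
    if abs(shift) > ALERT_BAND:
        print("ALERT: prediction distribution has shifted; investigate now, "
              "before any defaults are observed")

rng = np.random.default_rng(7)
# One week of scores whose mean has crept upward relative to training.
check_prediction_drift(rng.normal(loc=0.70, scale=0.15, size=2_000))
```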
Automated retraining pipelines respond to detected drift by triggering model retraining on updated data. The retrained model is validated against the current production model and promoted only if it performs better. This creates a closed loop: detect drift, retrain, validate, deploy. For fraud detection, where new fraud patterns emerge weekly, automated retraining is essential for maintaining detection rates.
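One possible shape for the closed loop, sketched with scikit-learn: train a challenger on recent data and promote it only if it beats the champion on a recent holdout. The model class, AUC metric, promotion margin, and synthetic data are all assumptions, not a definitive pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def retrain_and_maybe_promote(champion, X_recent, y_recent,
                              X_holdout, y_holdout, margin=0.002):
    """Train a challenger on recent data; promote it only if it clearly
    beats the current production model on a recent holdout."""
    challenger = GradientBoostingClassifier().fit(X_recent, y_recent)
    champ_auc = roc_auc_score(y_holdout, champion.predict_proba(X_holdout)[:, 1])
    chall_auc = roc_auc_score(y_holdout, challenger.predict_proba(X_holdout)[:, 1])
    print(f"champion AUC={champ_auc:.4f}  challenger AUC={chall_auc:.4f}")
    return challenger if chall_auc > champ_auc + margin else champion

# Synthetic stand-in for historical versus recent data.
X, y = make_classification(n_samples=6_000, n_features=12, random_state=0)
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.5, random_state=0)
X_recent, X_holdout, y_recent, y_holdout = train_test_split(
    X_new, y_new, test_size=0.4, random_state=1)

champion = GradientBoostingClassifier().fit(X_old, y_old)
production_model = retrain_and_maybe_promote(
    champion, X_recent, y_recent, X_holdout, y_holdout)
```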
Drift attribution identifies which features are driving the shift. Rather than reporting "the model has drifted," the system reports "the income distribution of applicants has shifted upward by 15 per cent, and the proportion of self-employed applicants has increased by 8 per cent." This granular attribution helps model owners understand whether the drift reflects a genuine population change (which may require model updates) or a data quality issue (which requires upstream investigation).
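A sketch of per-feature attribution: compute PSI feature by feature and rank the drivers, so the report names the shifted inputs rather than just flagging "drift". The feature names and simulated shifts are illustrative.

```python
import numpy as np
import pandas as pd

def psi(expected, actual, n_bins=10):
    # Same PSI as the earlier sketch, compacted.
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e = np.clip(np.histogram(expected, bins=edges)[0] / len(expected), 1e-6, None)
    a = np.clip(np.histogram(actual, bins=edges)[0] / len(actual), 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
train = pd.DataFrame({
    "income": rng.lognormal(10.5, 0.5, 20_000),
    "age": rng.normal(45, 12, 20_000),
    "loan_amount": rng.lognormal(9.0, 0.7, 20_000),
})
prod = pd.DataFrame({
    "income": rng.lognormal(10.65, 0.5, 2_000),     # shifted upward
    "age": rng.normal(45, 12, 2_000),               # stable
    "loan_amount": rng.lognormal(9.0, 0.7, 2_000),  # stable
})

# Rank features by how much each has shifted.
for col, score in sorted(((c, psi(train[c], prod[c])) for c in train.columns),
                         key=lambda kv: kv[1], reverse=True):
    print(f"{col:<12} PSI={score:.3f}")
```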
What to know before you start
Not all drift requires action. Seasonal fluctuations, expected market movements, and gradual demographic shifts may cause detectable drift without meaningfully degrading model performance. Define materiality thresholds that distinguish signal from noise. A PSI between 0.1 and 0.25 is usually informational; above 0.25, it typically warrants investigation. Calibrate thresholds to your specific models and risk appetite.
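One way to encode those materiality tiers in code, assuming the 0.1 / 0.25 rule of thumb above; the tier labels and cut-offs are illustrative and should be recalibrated per model.

```python
def psi_tier(score: float) -> str:
    """Map a PSI value to an action tier (illustrative cut-offs)."""
    if score < 0.10:
        return "stable: no action"
    if score < 0.25:
        return "informational: log and watch"
    return "material: investigate"

for s in (0.04, 0.17, 0.31):
    print(f"PSI={s:.2f} -> {psi_tier(s)}")
```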
Ground truth latency is the fundamental constraint. You cannot measure performance drift for a credit model until loans mature. You cannot measure performance drift for an insurance model until claims develop. In the interim, data drift and prediction drift are proxies, not substitutes. Design your monitoring to track all three: data drift (available immediately), prediction drift (available immediately), and performance drift (available when ground truth arrives).
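A sketch of a monitoring record that carries all three signals with their different latencies; the field names are assumptions, and the performance field stays empty until ground truth matures.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class DriftSnapshot:
    model_id: str
    as_of: date
    data_psi: float                          # data drift: available immediately
    prediction_shift: float                  # prediction drift: available immediately
    performance_auc: Optional[float] = None  # performance drift: waits for ground truth

# Today's snapshot: performance_auc stays None until loans mature.
snap = DriftSnapshot("credit_scoring_v3", date(2026, 4, 6),
                     data_psi=0.18, prediction_shift=0.04)
print(snap)
```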
Retraining is not always the right response to drift. If the drift is caused by data quality degradation, retraining on bad data will make the model worse. If the drift reflects a fundamental regime change (a new economic environment, a new regulatory requirement), the model may need redesigning, not just retraining. Investigate the cause of drift before triggering the retraining pipeline.
Start by implementing data drift monitoring on your most critical model. PSI on the top ten features, tracked weekly, is a practical starting point that requires minimal infrastructure. Expand to prediction distribution monitoring and automated alerting once the baseline is established. Automated retraining is the final step, appropriate only when the monitoring and governance processes are mature enough to validate retrained models reliably.
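A minimal weekly job along those lines, assuming the psi() helper sketched earlier; the feature list, data-frame arguments, and notify() channel are hypothetical stand-ins for your data access and alerting layers.

```python
TOP_FEATURES = ["income", "age", "loan_amount", "tenure"]  # extend to your top ten
PSI_ALERT = 0.25  # illustrative; calibrate to your risk appetite

def weekly_drift_check(training_df, last_week_df, notify) -> None:
    """Compare last week's feature distributions to training; alert on
    breaches. psi() is the helper sketched earlier in this section."""
    for col in TOP_FEATURES:
        score = psi(training_df[col], last_week_df[col])
        if score > PSI_ALERT:
            notify(f"{col}: PSI {score:.3f} exceeds {PSI_ALERT}; investigate")
```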