Model Training

Last reviewed April 2026

A data science team spent six months training a credit risk model that outperformed the incumbent on every metric. It failed model validation in week one because the training data excluded the 2020 cohort. Model training is not a data science exercise. It is a governance exercise that happens to involve data science.

What is model training?

Model training is the process of feeding data into a machine learning algorithm so it learns the patterns needed to make predictions or decisions. In financial services, this means selecting relevant historical data, engineering features that capture meaningful signals, choosing an appropriate algorithm, and iterating until the model meets performance, fairness, and interpretability requirements.

The process typically follows a structured pipeline. Data is split into training, validation, and test sets. The model learns on the training set, is tuned using the validation set, and is evaluated on the test set, which it has never seen. This separation prevents overfitting, where the model memorises the training data rather than learning generalisable patterns. In financial services, the test set must also reflect realistic deployment conditions: forward-looking time periods, representative customer segments, and economic conditions that the model will encounter in production.
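
As an illustration, here is a minimal three-way split and overfitting check using scikit-learn on synthetic data; the 60/20/20 proportions and the gradient boosting model are illustrative choices, not a prescription. Financial data usually needs time-based rather than random splits, as discussed later in this article.

```python
# Minimal sketch: three-way split and an overfitting check with scikit-learn.
# Synthetic data stands in for a real portfolio; proportions are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, n_features=20, random_state=42)

# 60/20/20: train, validation (for tuning), test (touched once, at the end)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

model = GradientBoostingClassifier().fit(X_train, y_train)

# A large gap between train and validation AUC is the classic overfitting signal
print("train AUC:", roc_auc_score(y_train, model.predict_proba(X_train)[:, 1]))
print("val   AUC:", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))
```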

Training a model is iterative. Feature engineering, the process of transforming raw data into inputs that help the model learn, often matters more than algorithm selection. A well-engineered feature set with a simple algorithm frequently outperforms a complex algorithm with poor features. The craft of model training lies in understanding the domain well enough to know which features carry signal and which carry noise.
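
A minimal sketch of the idea, with entirely hypothetical column names and values: ratio and log-transformed features typically encode domain knowledge that raw columns do not.

```python
# Minimal sketch: engineered features encode domain knowledge that raw
# columns do not. All column names and values are hypothetical.
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "balance": [1_200.0, 4_800.0],
    "credit_limit": [5_000.0, 5_000.0],
    "annual_income": [42_000.0, 55_000.0],
})

features = pd.DataFrame({
    # the utilisation ratio is usually more informative than balance or limit alone
    "utilisation": raw["balance"] / raw["credit_limit"],
    # log-transform skewed monetary values so relative differences dominate
    "log_income": np.log1p(raw["annual_income"]),
})
```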

The landscape

The PRA's SS1/23 requires that model development follows a documented, repeatable process. Training decisions (data selection, feature engineering, algorithm choice, hyperparameter tuning) must be recorded and justified. An independent validation team must be able to reproduce the training process and verify the results. This rules out ad hoc model development and requires a disciplined, version-controlled approach.

The EU AI Act adds requirements specific to training data for high-risk systems. Article 10 mandates that training data be relevant, representative, and as free of errors and bias as possible. For credit scoring models, this means demonstrating that the training data does not systematically exclude or misrepresent any demographic group. For fraud detection, it means ensuring that the training labels (fraud/not fraud) are accurate and consistently applied.

Cloud-based ML platforms (AWS SageMaker, Azure ML, Google Vertex AI) have standardised much of the training infrastructure. Compute, experiment tracking, hyperparameter search, and model registry are available as managed services. The infrastructure is no longer the bottleneck. Data preparation, governance, and validation are.

How AI changes this

Automated machine learning (AutoML) compresses the model training cycle. AutoML platforms systematically evaluate hundreds of feature combinations, algorithms, and hyperparameter configurations, identifying the best-performing model faster than manual experimentation. For well-defined problems with clean data, AutoML can produce a competitive model in hours rather than weeks. The risk is that speed encourages deployment before governance is complete.
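
For a sense of what AutoML automates, here is a hedged sketch of a randomised hyperparameter sweep with scikit-learn's RandomizedSearchCV. The parameter ranges are arbitrary, and AutoML platforms run far larger searches that also cover feature combinations and algorithm choice.

```python
# Minimal sketch of the search that AutoML platforms automate: a randomised
# sweep over hyperparameters with cross-validation. Ranges are arbitrary.
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=1_000, random_state=42)

search = RandomizedSearchCV(
    GradientBoostingClassifier(),
    param_distributions={
        "n_estimators": randint(50, 500),
        "max_depth": randint(2, 8),
        "learning_rate": uniform(0.01, 0.3),
    },
    n_iter=50,            # AutoML tools run hundreds of trials like these
    scoring="roc_auc",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```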

Synthetic data generation addresses the problem of insufficient or imbalanced training data. If genuine fraud cases represent 0.1 per cent of your transaction data, the model struggles to learn fraud patterns. Synthetic data techniques generate realistic but artificial fraud examples that balance the training set. The quality of synthetic data matters enormously: poorly generated synthetic examples introduce artefacts that the model learns as real patterns.
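
One widely used technique is SMOTE from the imbalanced-learn library, which interpolates between real minority-class examples to synthesise new ones. The sketch below is illustrative; more sophisticated generative approaches follow the same rebalancing pattern.

```python
# Minimal sketch: SMOTE interpolates between real minority-class examples to
# synthesise new ones. The 0.1 per cent fraud rate here is simulated.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100_000, weights=[0.999], random_state=42)
print("before:", Counter(y))          # roughly 999:1 genuine to fraud

X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("after: ", Counter(y_res))      # classes are now balanced
```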

Transfer learning reduces the data and compute requirements for training new models. A model pre-trained on a large, general dataset can be fine-tuned on a smaller, domain-specific dataset to achieve strong performance. For NLP tasks in financial services, this means fine-tuning a foundation model on your specific document types rather than training from scratch, reducing the labelled data requirement from thousands of examples to hundreds.
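
A hedged sketch of the fine-tuning workflow using the Hugging Face transformers and datasets libraries; the checkpoint name, the toy corpus, and the hyperparameters are placeholders, not recommendations.

```python
# Hedged sketch: fine-tuning a generic pre-trained checkpoint on a tiny toy
# corpus. Checkpoint, data, and hyperparameters are placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Stand-in for a few hundred labelled documents of your specific type
train_ds = Dataset.from_dict({
    "text": ["Payment overdue: final notice.", "Quarterly statement enclosed."] * 100,
    "label": [1, 0] * 100,
}).map(lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
       batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    train_dataset=train_ds,
)
trainer.train()   # adapts the pre-trained weights to the new task
```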

Continuous training pipelines retrain models automatically when performance degrades or new data becomes available. Rather than a periodic manual retraining cycle, the pipeline monitors model performance, triggers retraining when drift is detected, validates the new model against the incumbent, and promotes it to production if it passes. This is the production reality of model training in mature MLOps environments.
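
In outline, the loop looks something like the sketch below. Every helper function in it is hypothetical: real pipelines implement these steps on an orchestration or MLOps platform.

```python
# Schematic only: every helper function here (detect_drift, train_candidate,
# validate_against, promote) is hypothetical. Real pipelines implement these
# steps on an orchestration or MLOps platform.
def continuous_training_cycle(incumbent, live_data, reference_data):
    if not detect_drift(live_data, reference_data):   # e.g. a PSI or KS test
        return incumbent                              # no drift, no action

    candidate = train_candidate(live_data)            # retrain on fresh data

    # Champion/challenger: the candidate must beat the incumbent on a
    # held-out, forward-looking window before it ships
    if validate_against(candidate, incumbent):
        promote(candidate)                            # register and deploy
        return candidate
    return incumbent
```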

What to know before you start

Document every decision in the training process. Which data was included and excluded, and why. Which features were engineered and which were discarded. Which algorithm was selected and what alternatives were considered. This documentation is not bureaucracy. It is a regulatory requirement under SS1/23 and the EU AI Act, and it is essential for the independent validation team that must review the model before deployment.
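
One lightweight way to make this auditable is a machine-readable training record committed alongside the model artefact. The schema below is illustrative, not a regulatory template.

```python
# Illustrative schema for a machine-readable training record, versioned
# alongside the model artefact. Field names are not a regulatory template.
import json
from dataclasses import asdict, dataclass

@dataclass
class TrainingRecord:
    model_name: str
    data_sources: list
    exclusions: dict              # what was excluded, and why
    features_dropped: dict        # discarded features, and the reason
    algorithm: str
    alternatives_considered: list
    random_seed: int = 42

record = TrainingRecord(
    model_name="credit_risk_v3",
    data_sources=["applications_2018_2024"],
    exclusions={"pre_2018_cohort": "schema change in source system"},
    features_dropped={"postcode": "potential proxy for a protected characteristic"},
    algorithm="gradient_boosting",
    alternatives_considered=["logistic_regression", "random_forest"],
)

with open("training_record.json", "w") as f:
    json.dump(asdict(record), f, indent=2)
```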

Train on data that reflects the deployment environment. A model trained on five years of historical data will learn patterns from economic conditions that may not recur. A model trained only on recent data will lack exposure to stress scenarios. The training data window is a design decision that must balance recency and coverage, and it must be justified in the model documentation.

Reserve your test set rigorously. If any information from the test set leaks into the training process, whether through feature engineering, hyperparameter tuning, or data preprocessing, the model's reported performance will overestimate its real-world accuracy. Use time-based splits for financial data: train on the past, validate on the near past, test on the recent past, deploy on the present.
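
A minimal sketch of a time-based split with pandas; the cut-off dates and column names are illustrative.

```python
# Minimal sketch of a time-based split with pandas. Cut-off dates and column
# names are illustrative.
import pandas as pd

df = pd.DataFrame({
    "event_date": pd.to_datetime(["2021-06-01", "2023-03-15", "2023-09-01"]),
    "amount": [120.0, 75.5, 300.0],
})

train = df[df["event_date"] < "2023-01-01"]                      # the past
val = df[df["event_date"].between("2023-01-01", "2023-06-30")]   # the near past
test = df[df["event_date"] > "2023-06-30"]                       # recent past, touched once

# Fit all preprocessing (scalers, encoders, imputers) on `train` only, then
# apply it unchanged to val and test, so nothing leaks backwards in time.
```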

Start with a well-defined problem where the labels are reliable. Fraud detection (fraud/not fraud confirmed by investigation) and credit default (defaulted/repaid observed over time) provide clear training signals. Problems where the labels are subjective or inconsistent, such as "good" versus "bad" customer service interactions, require extensive label quality work before model training can begin.

Exploring AI for your organisation? Grab fifteen minutes on the calendar.

Let’s build AI together