MLOps

Last reviewed April 2026

Building a machine learning model takes weeks. Keeping it running reliably in production takes years. Most AI programmes budget for the first and are surprised by the second. MLOps is the discipline that bridges this gap, turning experimental models into production systems that a regulated institution can depend on.

What is MLOps?

MLOps (machine learning operations) applies software engineering and DevOps principles to the lifecycle of machine learning models. It covers everything from data versioning and experiment tracking through model deployment, monitoring, retraining, and retirement. The goal is to make ML models as reliable, reproducible, and governable in production as traditional software systems.

In financial services, MLOps addresses a specific organisational failure mode. Data science teams build models that perform well in notebooks. Engineering teams struggle to deploy them. Once deployed, nobody monitors whether the model's performance is degrading. When the model fails, there is no defined process for retraining, revalidating, and redeploying. The result is a growing inventory of models in production with unknown performance characteristics, which is exactly what the PRA's SS1/23 was designed to prevent.

The MLOps maturity spectrum ranges from manual (data scientists retrain models on their laptops and hand them to engineering) to fully automated (pipelines retrain, validate, and deploy models without human intervention). Most financial institutions sit at level 1 or 2 on a five-level scale: some automation of deployment but manual retraining, limited monitoring, and inconsistent governance. Reaching level 3, where retraining is automated and monitoring triggers governance workflows, is the practical target for most regulated firms.

The landscape

Model risk management regulations drive MLOps adoption in financial services more than technical efficiency does. SS1/23 requires that models are inventoried, validated, monitored, and governed throughout their lifecycle. Meeting these requirements manually is possible for a handful of models but impractical for the dozens or hundreds that large institutions operate. MLOps platforms provide the automation that makes regulatory compliance scalable.

The tooling landscape has matured significantly. Open-source platforms (MLflow, Kubeflow, Metaflow) provide experiment tracking, model registry, and deployment capabilities. Cloud providers offer managed MLOps services (SageMaker Pipelines, Azure ML, Vertex AI Pipelines). Specialist vendors target regulated industries with pre-built governance workflows. The tools are no longer the constraint. Organisational adoption, the willingness to change how data science teams work, is.

The EU AI Act's requirements for ongoing monitoring of high-risk systems make MLOps a regulatory prerequisite, not just a technical best practice.

The convergence of MLOps and LLMOps is creating complexity. Traditional ML models (gradient-boosted trees, logistic regression) have well-understood deployment, monitoring, and retraining patterns. Large language models introduce new challenges: prompt versioning, retrieval-augmented generation pipelines, hallucination monitoring, and evaluation metrics that are inherently subjective. Institutions need MLOps platforms that handle both paradigms.

How AI changes this

Automated retraining pipelines detect when a model's performance has degraded and trigger retraining without manual intervention. The pipeline ingests new data, retrains the model, runs validation checks (accuracy, fairness, stability), compares performance against the incumbent model, and promotes the new version if it passes all checks. For fraud detection models that must adapt to rapidly evolving fraud patterns, automated retraining is not a convenience. It is a necessity.
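The promotion step described above can be sketched as a simple gate. This is an illustrative sketch, not any platform's API: the function names (`validate`, `promote_if_better`), the metric names, and the thresholds are all hypothetical assumptions.

```python
# Hypothetical promotion gate for a retraining pipeline. Metric and
# threshold names are illustrative, not a specific platform's API.

def validate(metrics: dict, thresholds: dict) -> bool:
    """Pass only if every validation metric meets its floor."""
    return all(metrics[name] >= floor for name, floor in thresholds.items())

def promote_if_better(candidate: dict, incumbent: dict, thresholds: dict) -> str:
    """Promote the retrained model only if it passes all checks
    and outperforms the incumbent on the primary metric."""
    if not validate(candidate, thresholds):
        return "rejected: failed validation"
    if candidate["accuracy"] <= incumbent["accuracy"]:
        return "rejected: no improvement over incumbent"
    return "promoted"

result = promote_if_better(
    candidate={"accuracy": 0.91, "fairness": 0.97, "stability": 0.95},
    incumbent={"accuracy": 0.89},
    thresholds={"accuracy": 0.85, "fairness": 0.95, "stability": 0.90},
)
```

The key design point is that the gate is deterministic and auditable: every rejection carries a reason that can be logged for model risk review.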

Feature stores standardise the way features are computed, stored, and served to models. A feature computed for training must be computed identically for inference. Feature stores enforce this consistency, preventing the training-serving skew that causes models to behave differently in production than in development. They also enable feature reuse across models, reducing duplicate engineering work.
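The consistency guarantee can be illustrated with a single shared feature definition. This is a minimal sketch of the principle, not a feature store implementation; the function and field names are hypothetical.

```python
# Illustrative sketch: one feature definition shared by the training
# and serving paths, which is the core idea a feature store enforces.

def txn_amount_zscore(amount: float, mean: float, std: float) -> float:
    """Standardise a transaction amount; the same code runs in both paths."""
    return (amount - mean) / std

# Training path: compute the feature over historical data.
train_feature = txn_amount_zscore(120.0, mean=100.0, std=20.0)

# Serving path: the identical function, so no training-serving skew.
serve_feature = txn_amount_zscore(120.0, mean=100.0, std=20.0)

assert train_feature == serve_feature  # consistency by construction
```

Skew typically creeps in when the training feature is written in a batch SQL job and the serving feature is reimplemented in application code; defining the feature once removes that failure mode.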

Model monitoring is the MLOps capability with the most direct regulatory value. Automated monitoring tracks input data distributions, prediction distributions, accuracy metrics, and fairness metrics continuously. When any metric breaches a defined threshold, the monitoring system alerts the model owner and, optionally, triggers the retraining pipeline. This converts the PRA's monitoring expectations from a periodic manual review into a continuous automated process.
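One common way to operationalise the threshold check above is the population stability index (PSI) over binned input data. The bins, proportions, and the 0.2 alert threshold below are illustrative assumptions (0.2 is a widely used rule of thumb, not a regulatory value).

```python
# Hedged sketch of threshold-based drift monitoring using the
# population stability index (PSI). Bin proportions are illustrative.
import math

def psi(expected: list, actual: list) -> float:
    """PSI over matching probability bins; higher means more drift."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

baseline = [0.25, 0.25, 0.25, 0.25]   # training-time bin proportions
current  = [0.40, 0.30, 0.20, 0.10]   # production bin proportions

score = psi(baseline, current)
# Rule of thumb: PSI > 0.2 is often treated as significant drift.
if score > 0.2:
    print("ALERT: input drift detected, notify model owner")
```

In a production setup the alert would feed a ticketing or governance workflow rather than a print statement, creating the audit trail that periodic manual reviews lack.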

What to know before you start

Start with a model registry before anything else. An inventory of every model in production, its version, its owner, its training data, and its current performance is the foundation. SS1/23 requires it, and you cannot govern what you cannot see. If you have models in production that nobody has inventoried, that is your first task. Build the model inventory before building the automation.
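Even a minimal registry captures the fields this paragraph lists: version, owner, training data, and current performance. The schema below is an illustrative assumption, not an SS1/23 template or any vendor's data model.

```python
# Minimal in-memory model inventory sketch. Field names and the example
# record are hypothetical, chosen to mirror the inventory fields above.
from dataclasses import dataclass, field

@dataclass
class ModelRecord:
    name: str
    version: str
    owner: str
    training_data: str
    status: str = "production"
    metrics: dict = field(default_factory=dict)

registry = {}

def register(record: ModelRecord) -> None:
    """Index each model by (name, version) so every version is traceable."""
    registry[(record.name, record.version)] = record

register(ModelRecord(
    name="fraud-detector",
    version="2.1.0",
    owner="payments-ds-team",
    training_data="transactions snapshot, 2026-03",
    metrics={"auc": 0.93},
))
```

In practice this role is usually filled by a model registry product (MLflow's registry is a common open-source choice), but the inventory fields are the same regardless of tooling.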

Don't over-engineer for your current model count. If you have five models in production, a lightweight MLOps setup (MLflow for tracking, a simple CI/CD pipeline for deployment, Prometheus for monitoring) is sufficient. If you have fifty, a managed platform is justified. Match the infrastructure investment to the scale, and plan to evolve it as the model portfolio grows.

MLOps is a team capability, not a tool purchase. The tooling is the easy part. The hard part is getting data scientists, ML engineers, DevOps engineers, and model risk management teams to work together with shared processes and shared accountability. Define the operating model (who builds, who deploys, who monitors, who governs) before selecting the tooling.

Budget for MLOps as a percentage of your AI programme, not as a separate initiative. A common ratio is 40 to 60 per cent of the total AI investment going to operations and governance, with the remainder to development. If your budget is entirely allocated to model building, your models will not survive production.
