Playbook
Deploying AI in the Enterprise
A practical playbook for deploying AI inside regulated financial services. Readiness, investment, operations, architecture, governance, and scaling. Written for the leadership team making the call.
Why now
On 4 May 2026, the two major AI model providers announced dedicated ventures to deploy AI inside mid-size companies. Anthropic partnered with Blackstone, Goldman Sachs, and Hellman & Friedman to launch a $1.5 billion AI services firm embedding engineers inside portfolio companies (Blackstone announcement). OpenAI finalised The Deployment Company, a $10 billion vehicle anchored by TPG with 19 investors, following the same forward-deployed engineer model (TechCrunch coverage).
Both ventures target financial services, healthcare, and manufacturing. Both embed engineers inside client organisations rather than selling licences. Both raised capital from alternative asset managers to get preferred access to their investors' portfolio companies. The question for leadership teams is no longer whether to deploy AI. It is how. This playbook is the answer.
Before you start
Readiness
Most AI deployments in financial services stall before they reach production. Not because the technology fails, but because one of four prerequisites was assumed rather than verified: data infrastructure, team capacity, regulatory clarity, and budget realism. Get these wrong and you will spend six months discovering what you should have confirmed in six days.
Data infrastructure comes first. The model needs data served at the latency the use case demands. A credit scoring model needs transaction history, bureau data, and application data in a single pipeline. An AML alerting system needs to score transactions against customer profiles within seconds. If your data lives in batch-fed warehouses that refresh overnight, real-time AI is not on the table until you fix the plumbing.
Team capacity is the second constraint. AI deployment requires joint ownership across technology and the business function it serves. A CTO who builds a model without a business sponsor will deliver a system nobody adopts. A COO who requests AI without a technology partner will receive a vendor pitch, not a working system. Both must be committed before you start.
Regulatory clarity is the third. The FCA and PRA will ask how your AI system makes decisions. Can you answer that question today, before you build? If you cannot explain the system to the regulator, you are not ready to deploy it.
Budget realism is the fourth. The vendor quotes a model development cost. That is 30 to 40 per cent of the total. Data engineering, regulatory validation, integration with existing workflows, and ongoing monitoring account for the rest. Budget for the full picture or discover the gaps later.
Ask these questions before you spend anything. Can you serve data at the latency the model needs? Do you have a joint owner across technology and the business? Can you explain the system to the FCA when they ask? Can you define the success metric today, before you build? Four yeses means go. Anything less means prepare first. Preparation is not delay. It is the cheapest phase of any deployment. The expensive phase is discovering a gap after the contracts are signed.
The investment
The business case
Every AI vendor will tell you their product "transforms" operations. The honest framing is more specific. AI reduces the cost of decisions currently made by humans at volume. Claims triage, customer due diligence, regulatory reporting, payment screening. These are processes where a trained model handles the routine majority and routes the exceptions to the people who should have been doing only that work all along.
Measure AI the way you measure any operational investment. Three metrics: cost avoided, capacity released, and error rates reduced. The first is straightforward. The second is where real value lives. It is not about cutting headcount. It is about redirecting expensive analyst time from manual triage to judgement work. The third is often the most compelling. AI systems that reduce false positive rates on fraud screening or AML alerts by 60 to 80 per cent are not just cheaper. They are better.
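To make the three metrics concrete, here is a hedged worked example in Python. Every input (alert volumes, analyst cost, the 70 per cent reduction) is an illustrative assumption, not a benchmark:

```python
# Hypothetical worked example of the three metrics above.
# All inputs are illustrative assumptions, not benchmarks.

alerts_per_month = 10_000
false_positive_rate_before = 0.90   # 90% of alerts are noise
false_positive_reduction = 0.70     # assumed: model cuts false positives 70%
minutes_per_alert = 15
analyst_cost_per_hour = 60.0        # fully loaded, in pounds

fp_before = alerts_per_month * false_positive_rate_before
fp_after = fp_before * (1 - false_positive_reduction)

hours_released = (fp_before - fp_after) * minutes_per_alert / 60
cost_avoided = hours_released * analyst_cost_per_hour

print(f"Capacity released: {hours_released:,.0f} analyst hours/month")
print(f"Cost avoided:      £{cost_avoided:,.0f}/month")
print(f"Error reduction:   false positives {fp_before:,.0f} -> {fp_after:,.0f}")
```

On these assumed numbers, roughly 1,575 analyst hours a month move from manual triage to judgement work. The point of the exercise is not the figures but the discipline: agree the inputs before the deployment, not after.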
Typical timelines in regulated financial services: eight to twelve weeks from engagement to first production deployment, three to six months to full operational integration, twelve months to measurable ROI. These timelines are longer than a vendor demo suggests and shorter than a traditional IT programme delivers. The difference is scope discipline. One bottleneck, done properly, before moving to the next.
The hidden costs that vendor pricing does not cover deserve a line in the budget. Data engineering to make your data model-ready accounts for 40 to 60 per cent of total effort. Regulatory validation and model documentation take time that no vendor quotes. Integration with existing operational workflows requires people who understand both the system and the process. Ongoing monitoring is not optional. A model that drifts out of calibration fails silently. Budget for the full cost or budget twice.
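A minimal sketch of the budgeting arithmetic, assuming the vendor quote covers the 30 to 40 per cent share described earlier. The £500k quote is a placeholder:

```python
# Back out a full-deployment budget from a vendor quote, assuming
# (per the figures above) model development is 30-40% of the total.
# The quote below is a placeholder, not a benchmark.

def implied_total_budget(vendor_quote: float,
                         model_share_low: float = 0.30,
                         model_share_high: float = 0.40) -> tuple[float, float]:
    """Return (low, high) estimates of the full deployment cost."""
    return vendor_quote / model_share_high, vendor_quote / model_share_low

low, high = implied_total_budget(500_000)  # hypothetical £500k vendor quote
print(f"Budget £{low:,.0f} to £{high:,.0f} for the full deployment")
# -> Budget £1,250,000 to £1,666,667 for the full deployment
```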
For a deeper look at how the four C-suite roles approach the investment question, the enterprise AI guide covers strategy, operations, architecture, and returns in detail.
The people
The operating model
AI lands in operations or it lands nowhere. The CEO sets the direction. The CTO builds the infrastructure. But the COO owns the processes that AI actually changes. Claims handling, customer onboarding, regulatory submissions, payment processing. These are the COO's processes, the COO's people, the COO's SLAs.
Three deployment models exist. Forward-deployed engineers from an external partner embed inside your organisation, build the system alongside your team, and transfer knowledge as they go. Internal build means you hire the capability and own every line. Hybrid means an external partner builds the first system while training your team to build the second. Each has trade-offs. Forward-deployed is fastest but creates dependency. Internal build gives full control but takes twelve to eighteen months to staff. Hybrid is the most common path in regulated financial services because it balances speed with institutional capability.
Change management is the COO's deployment problem. When 70 per cent of claims start settling automatically, what happens to the handlers? They are not redundant. They redeploy to the complex 30 per cent that genuinely needs expertise. But that redeployment requires new skills, new workflows, and new performance metrics. If nobody plans for it, the team resists the system and the investment stalls.
Operational AI demands new monitoring. A manual process fails visibly: queues grow, SLAs breach, people complain. An automated process can fail silently. A model drifting out of calibration does not raise its hand. The COO needs model health built into the same operational dashboards the team already watches. Not a separate AI dashboard. Drift, accuracy, and exception rates alongside cycle time and SLA performance.
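As a sketch of what "model health in the same dashboard" can mean in practice, the following Python fragment treats drift and exception rates as peers of SLA breaches, feeding one alerting path. Field names and thresholds are illustrative assumptions:

```python
# Sketch: model health reported in the same structure as the operational
# metrics the team already watches. Field names and thresholds are
# illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ProcessHealth:
    cycle_time_p95_mins: float   # existing operational metric
    sla_breaches: int            # existing operational metric
    model_accuracy: float        # model health, same dashboard row
    drift_score: float           # e.g. population stability index
    exception_rate: float        # share of cases routed to humans

def alerts(h: ProcessHealth) -> list[str]:
    """One alerting path for process failures and model failures alike."""
    out = []
    if h.sla_breaches > 0:
        out.append("SLA breach")
    if h.drift_score > 0.2:       # common PSI rule of thumb, not a standard
        out.append("model drift above threshold")
    if h.exception_rate > 0.35:   # assumed threshold
        out.append("exception rate climbing: possible silent degradation")
    return out
```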
Process automation is the entry point for most COOs. It is the right starting point because the bottleneck is already measured, the team already understands the process, and success is already defined by the existing SLA.
The technology
The architecture
The first question is not which model to use. It is whether your data infrastructure can support AI in production. Most financial institutions discover, after the pilot succeeds and the business case is approved, that their data is not ready. Not because it does not exist, but because it lives in systems that were never designed to serve a model in real time.
Data pipelines are the foundation. An AI system that makes credit decisions needs transaction history, bureau data, and application data in a single pipeline with consistent latency. An AML alerting system needs to score transactions against customer profiles within seconds, not batch windows. If your architecture cannot serve data at the speed the model needs it, the model is irrelevant. Fix the pipes before you choose the engine.
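A minimal sketch of the point, with hypothetical feature stores and a hypothetical model interface. The latency budget is enforced on the data assembly, because that is where batch-fed architectures fail:

```python
import time

LATENCY_BUDGET_MS = 2_000  # assumption: AML screening must answer within 2s

# Hypothetical feature stores. In production these are network or warehouse
# calls, and their latency, not the model's, usually sets the floor.
def transaction_history(txn_id: str) -> dict:
    return {"txn_count_30d": 42}

def customer_profile(txn_id: str) -> dict:
    return {"risk_band": "medium"}

def score_with_budget(txn_id: str, score) -> str:
    start = time.monotonic()
    features = {**transaction_history(txn_id), **customer_profile(txn_id)}
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > LATENCY_BUDGET_MS:
        # A batch-fed warehouse fails here before the model ever runs.
        return "manual review: data arrived too slowly for real-time scoring"
    return score(features)  # hypothetical model interface

print(score_with_budget("t-001", lambda features: "release payment"))
```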
Model selection follows a simple dividing line. Commodity capabilities (document extraction, translation, summarisation) are best served by API. Use a vendor. Document intelligence falls into this category for most organisations. Proprietary capabilities (underwriting models trained on your portfolio, credit scoring against your customer base, claims triage calibrated to your book) are best built in-house. The dividing line: does the model's value come from the capability itself, or from your data? If it is your data, it is your model.
Regulatory architecture is the constraint most technology leaders underestimate. The PRA's SS1/23 model risk management principles require that every AI-driven decision can be explained, audited, and rolled back. Full lineage from input data through model inference to output action. Model versioning. Challenger models. Ongoing monitoring for drift. Explainability is an architectural requirement, not a compliance add-on. Build the audit trail from day one. Retrofitting it costs three times as much and takes twice as long.
The FCA cares about outcomes. Can you demonstrate that your AI system treats customers fairly? Can you show that it does not discriminate? Can you prove that its decisions are consistent? These are not abstract questions. They require architectural decisions: logging, testing, and monitoring designed to answer them.
The controls
Governance and compliance
The PRA and FCA will ask how your AI system makes decisions. Not whether it makes good decisions (though they care about that too). How. The mechanism. The inputs. The weighting. The boundary conditions. If you cannot answer clearly, you have a compliance problem that no amount of model accuracy will fix.
Explainability is not a report you generate after the fact. It is a design constraint that shapes every architectural decision. Which features does the model use? Why those features and not others? How does each feature influence the output? When the model says "decline this application," can you show the applicant why in plain language? These questions must be answerable before you deploy, not after the regulator asks.
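One hedged sketch of how the plain-language requirement can be met: map the strongest negative feature contributions (from whatever explainability method the model supports, such as Shapley values) to pre-approved reason text. The feature names and numbers here are invented for illustration:

```python
# Sketch: turning model feature contributions into plain-language reasons
# an applicant (and a regulator) can read. The contribution values would
# come from the model's explainability method; everything below is
# illustrative.

REASON_TEXT = {
    "debt_to_income": "Your existing debt is high relative to your income",
    "missed_payments_12m": "Payments were missed in the last twelve months",
    "account_age_months": "Your account history with us is short",
}

def decline_reasons(contributions: dict[str, float], top_n: int = 2) -> list[str]:
    """Return plain-language reasons for the strongest negative drivers."""
    strongest = sorted(contributions.items(), key=lambda kv: kv[1])[:top_n]
    return [REASON_TEXT.get(name, name) for name, value in strongest if value < 0]

print(decline_reasons({
    "debt_to_income": -0.31,
    "missed_payments_12m": -0.22,
    "account_age_months": 0.05,
}))
# -> ['Your existing debt is high relative to your income',
#     'Payments were missed in the last twelve months']
```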
Audit trails run deeper than logging. Every decision the system makes needs a record: the input data at the time of inference, the model version that produced the output, the confidence score, and the action taken. When a customer complains or a regulator investigates, you need to reconstruct exactly what the system saw and exactly what it did. Months later. With full fidelity.
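A minimal sketch of such a per-decision record. The field names are illustrative, not a standard schema:

```python
# Sketch of the per-decision record described above: enough to reconstruct,
# months later, exactly what the system saw and what it did. Field names
# are illustrative assumptions.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class DecisionRecord:
    decision_id: str
    inputs: dict                 # feature values at the time of inference
    model_version: str           # the exact artefact that produced the output
    confidence: float
    action: str                  # what the system actually did
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

record = DecisionRecord(
    decision_id="d-000123",
    inputs={"amount": 1250.0, "country": "GB", "risk_band": "medium"},
    model_version="aml-screen-2.4.1",
    confidence=0.93,
    action="released payment",
)
```

The record is frozen for a reason: an audit trail that can be edited after the fact is not an audit trail.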
Model versioning and challenger models are not optional in regulated financial services. The PRA expects firms to run challenger models that test whether the primary model's outputs remain calibrated. When performance degrades, the challenger provides an alternative. Drift monitoring catches the degradation. Without it, a model trained on 2024 data quietly loses accuracy against 2026 behaviour and nobody notices until the losses appear.
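Drift monitoring often starts with something as simple as the population stability index (PSI), comparing live inputs or scores against the training distribution. A sketch, with conventional rather than regulatory thresholds:

```python
# Sketch of one common drift check, the population stability index (PSI).
# Bin proportions and the 0.2 alert threshold are conventional choices,
# not regulatory requirements.

import math

def psi(expected: list[float], actual: list[float]) -> float:
    """PSI over matching histogram bins (each list of proportions sums to 1)."""
    eps = 1e-6  # guard against empty bins
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected, actual))

training_dist = [0.25, 0.40, 0.25, 0.10]   # score bands at training time
live_dist = [0.10, 0.30, 0.35, 0.25]       # score bands this month

if psi(training_dist, live_dist) > 0.2:    # ~0.34 here: clear drift
    print("drift detected: escalate and compare against the challenger")
```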
The EU AI Act adds a further layer for firms operating in or serving European markets. High-risk AI systems in financial services (credit scoring of natural persons, risk assessment and pricing in life and health insurance) require conformity assessments, transparency obligations, and human oversight mechanisms; fraud detection systems are explicitly carved out of the high-risk category. The Act is not yet fully enforced, but building to its requirements now avoids costly retrofitting later.
Every decision must be traceable from input to output. Not in theory. In practice. When a customer disputes a lending decision eighteen months from now, your compliance team needs to reconstruct what the model saw, what it weighted, and what it recommended. That reconstruction must be exact, not approximate. This is what "governance" means in production: the ability to answer any question about any decision at any point in the future.
Regulatory reporting is itself a deployment target. The same governance framework that governs your AI models can accelerate the reporting process. Structured extraction, automated validation, audit-ready outputs. Governance is not only a constraint. It is a capability.
What comes next
Scaling
The first AI project proves the model works. The second proves the organisation can repeat it. That distinction matters more than anything the first project delivers. A single successful deployment is an experiment. Two successful deployments are a programme. The difference is institutional capability.
Scaling means building the organisation, not just the technology. The team that built the first project should train the team that builds the second. The data pipelines that served the first model should be designed to serve the next five. The governance framework that satisfied the regulator for one use case should generalise to the portfolio. Every decision in the first deployment is a precedent. Make it a good one.
Measure success against the metric you defined at the start. Not a new metric that flatters the outcome. Not a proxy that is easier to move. The original metric: cycle time, false positive rate, cost per decision, capacity released. If you defined it clearly in the readiness phase, measuring it now is straightforward. If you did not, you are measuring activity rather than outcomes.
The question of when to scale has a practical answer. Scale when the first deployment is in production, monitored, and meeting its target metric for at least one full business cycle. Not when the pilot looks promising. Not when the board is enthusiastic. When the system is running, the team is confident, and the numbers confirm the business case. Premature scaling is the most expensive mistake in enterprise AI. It multiplies the problems of the first project across the organisation before those problems are solved.
Building internal capability is the exit criterion for external partnerships. No permanent dependency on external teams. The partner builds the first system, trains your people, documents the approach, and leaves you with an organisation that can build the second system independently. If the partner's model depends on being permanently embedded, that is not a partnership. That is a staffing arrangement with a markup. The goal is institutional capability: the skills, processes, and confidence to deploy AI repeatedly without starting from scratch each time.
The path from one project to a programme runs through real deployments with measurable outcomes. Each one compounds institutional knowledge. Each one reduces the cost and risk of the next.
If you are ready to deploy, put fifteen minutes on the calendar.
Let’s build AI together