Data Governance

Last reviewed April 2026

Every financial institution has a data governance programme. Most have had one for years. Yet the majority still struggle to answer basic questions: where does this data come from, who changed it last, and is it accurate? Data governance programmes that are disconnected from business outcomes consistently fail to sustain investment, and no amount of AI can compensate for data that nobody trusts.

What is data governance?

Data governance is the framework of policies, processes, roles, and technologies that ensures data across an organisation is accurate, accessible, consistent, secure, and used in compliance with regulations. It covers data quality, data lineage, data cataloguing, access controls, retention policies, and the organisational structures (data owners, data stewards) that make these things operational rather than aspirational.

The challenge is not defining the framework. Most financial institutions have a data governance policy. The challenge is making it stick. Data governance that exists only as documentation, without measurable outcomes, enforcement mechanisms, or visible executive sponsorship, degrades into a compliance checkbox. Data stewards are nominated but never empowered. Data quality metrics are defined but never acted upon. The programme consumes resources without delivering the data reliability that downstream functions, including AI, require.

Data quality at the point of capture is the principle that distinguishes effective data governance from expensive bureaucracy. If incorrect data enters the system, every downstream process inherits the error: risk models produce inaccurate outputs, regulatory reports contain incorrect figures, and AI systems learn from flawed training data. Fixing data downstream is orders of magnitude more expensive than preventing errors at the source. This is a process design problem, not a technology problem.
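As a concrete illustration, capture-time validation can be as simple as rejecting records that fail basic checks before they enter the system. The field names and rules below are hypothetical, a minimal sketch rather than any institution's actual validation layer:

```python
from datetime import date

# Hypothetical capture-time checks for a counterparty record.
# Field names and rules are illustrative, not from any specific system.
RULES = {
    "lei": lambda v: isinstance(v, str) and len(v) == 20,        # LEI is 20 characters
    "country": lambda v: isinstance(v, str) and len(v) == 2,     # ISO 3166-1 alpha-2
    "onboarded": lambda v: isinstance(v, date) and v <= date.today(),
}

def validate_at_capture(record: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the record may enter."""
    errors = []
    for field, check in RULES.items():
        if field not in record:
            errors.append(f"{field}: missing")
        elif not check(record[field]):
            errors.append(f"{field}: failed validation")
    return errors

# Wrong LEI length, three-letter country code, missing onboarding date
print(validate_at_capture({"lei": "ABC", "country": "GBR"}))
```

Rejecting the record here, with a clear error for the person capturing it, is what prevents the downstream inheritance of errors described above.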

The landscape

The EU AI Act's requirements for data quality in high-risk AI systems create a direct regulatory link between data governance and AI deployment. Article 10 requires that training, validation, and testing datasets meet quality criteria including completeness, representativeness, and freedom from errors and bias. For financial institutions deploying AI in credit scoring, fraud detection, or risk assessment, this means data governance is no longer a supporting function; it is a regulatory prerequisite for AI.

The PRA's expectations on data aggregation and risk reporting (building on BCBS 239) continue to drive investment in data lineage and data quality for banks. Supervisors want to understand how reported figures are derived from source data, and they want to see that the institution can produce accurate risk reports rapidly, including under stress conditions. This requires automated data lineage that traces every reported number back to its source, which requires governance infrastructure.

Regulatory reporting is the function where poor data governance becomes most visibly expensive. A reporting error that requires restatement is not just a correction; it is a supervisory event that consumes senior management time, damages the institution's relationship with the regulator, and can trigger broader data quality investigations. The cost of poor data governance is concentrated and measurable in regulatory reporting.

How AI changes this

Automated data quality monitoring replaces the manual data quality reviews that most institutions run on a periodic, often quarterly, basis. AI systems continuously monitor key data attributes for anomalies: sudden changes in distribution, missing values, inconsistencies between related datasets, and drift from expected patterns. When an anomaly is detected, the system alerts the data steward before the bad data propagates into downstream processes.
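The core of such monitoring can be sketched in a few lines: compare each incoming batch against a historical baseline and alert when the missing-value rate or the mean drifts beyond a threshold. The thresholds and baseline figures below are assumptions for illustration:

```python
import statistics

def check_batch(values, baseline_mean, baseline_stdev,
                max_missing_rate=0.02, z_threshold=3.0):
    """Flag a batch if the missing-value rate or mean drift exceeds thresholds.
    Thresholds here are illustrative; in practice they are tuned per attribute."""
    alerts = []
    present = [v for v in values if v is not None]
    missing_rate = 1 - len(present) / len(values)
    if missing_rate > max_missing_rate:
        alerts.append(f"missing rate {missing_rate:.1%} above {max_missing_rate:.0%}")
    if present:
        drift = abs(statistics.mean(present) - baseline_mean) / baseline_stdev
        if drift > z_threshold:
            alerts.append(f"mean drifted {drift:.1f} standard deviations from baseline")
    return alerts

# A batch with one missing value and a mean far from the historical baseline
batch = [105.0, None, 108.0, 104.0, 107.0]
print(check_batch(batch, baseline_mean=50.0, baseline_stdev=5.0))
```

A production system would run checks like these on every load and route the alerts to the responsible data steward rather than printing them.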

Data cataloguing and discovery, traditionally a manual exercise of interviewing data owners and documenting datasets, can be substantially automated. AI systems scan databases, analyse column names, data types, and actual values to infer what each dataset contains, who uses it, and how it relates to other datasets. The output is a machine-maintained catalogue that stays current as the data landscape changes, rather than a manually maintained catalogue that is out of date within months of publication.
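The inference step can be approximated with simple heuristics over column names and sample values; production tools use far richer models, but the principle is the same. The name hints and patterns below are illustrative assumptions:

```python
import re

# Heuristic rules mapping column names and sample values to inferred semantic types.
# These hints and patterns are illustrative, not an exhaustive catalogue.
NAME_HINTS = {"iban": "bank account", "lei": "legal entity id", "dob": "date of birth"}
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "iso_date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}

def infer_semantic_type(column_name: str, samples: list[str]) -> str:
    """Guess what a column contains from its name, then from its sample values."""
    name = column_name.lower()
    for hint, label in NAME_HINTS.items():
        if hint in name:
            return label
    for label, pattern in VALUE_PATTERNS.items():
        if samples and all(pattern.match(s) for s in samples):
            return label
    return "unclassified"

print(infer_semantic_type("cust_email_addr", ["a@bank.co.uk", "b@example.com"]))
print(infer_semantic_type("trade_dt", ["2026-01-15", "2026-01-16"]))
```

Running rules like these across every database is what lets the catalogue stay current automatically as schemas change.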

Automated data lineage traces the flow of data from source to consumption, identifying every transformation, aggregation, and enrichment along the way. For regulatory reporting, this means being able to show the regulator exactly how a reported figure was derived from source data, with every step documented. This capability is technically complex but regulatorily essential for institutions deploying AI in decision-making or reporting roles.
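Conceptually, lineage is a directed graph from reported figures back to source datasets, and walking it answers the regulator's question "where did this number come from?". The dataset names below are hypothetical:

```python
# A minimal lineage graph: each node maps to its direct upstream inputs.
# Dataset names are hypothetical examples.
LINEAGE = {
    "own_funds_report": ["capital_aggregate"],
    "capital_aggregate": ["tier1_ledger", "tier2_ledger"],
    "tier1_ledger": ["general_ledger"],
    "tier2_ledger": ["general_ledger"],
    "general_ledger": [],
}

def trace_to_sources(node, graph=LINEAGE, path=()):
    """Yield every path from a reported figure back to its source datasets."""
    path = path + (node,)
    upstream = graph.get(node, [])
    if not upstream:
        yield path
    for parent in upstream:
        yield from trace_to_sources(parent, graph, path)

for p in trace_to_sources("own_funds_report"):
    print(" <- ".join(p))
```

Real lineage tooling attaches the transformation logic and timestamps to each edge; the graph traversal itself is the easy part, while capturing the edges automatically from ETL code is where the complexity lies.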

The connection to predictive analytics is foundational. Predictive models inherit the quality of their training data. A predictive analytics programme built on poorly governed data produces predictions that nobody trusts, which means nobody acts on them, which means the programme fails. Data governance is the enabling investment that makes predictive analytics viable.

What to know before you start

Tie every data governance initiative to a business outcome. "Improve data quality" is not a business outcome. "Reduce regulatory reporting restatements by 80 per cent" is. "Enable AI-based credit scoring by ensuring training data meets EU AI Act requirements" is. When the business outcome is clear, the investment case is defensible and the programme sustains executive sponsorship. When the outcome is vague, the programme quietly loses funding at the next budget review.

Data stewardship must be embedded in business teams, not centralised in a governance function. The people who understand the data best are the people who create and use it. Central governance defines the framework, sets the standards, and provides the tooling. Business data stewards enforce the standards within their domain. This distributed model is harder to implement but more effective than a central team that has authority but not context.

Master data management (MDM) for customer, counterparty, and product data is the foundational data governance investment for financial services. If the same customer appears under three different names in three different systems, no AI system can reliably link their activity. MDM is not glamorous, but it is the prerequisite for every cross-functional AI application, from KYC to risk aggregation to customer analytics.
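The matching problem at the heart of MDM can be illustrated with name normalisation plus a similarity score. This is a deliberately crude sketch; the abbreviation table and threshold are assumptions, and real MDM systems lean on stable identifiers such as the LEI wherever possible:

```python
from difflib import SequenceMatcher

ABBREVIATIONS = {"limited": "ltd", "hldgs": "holdings"}  # illustrative only

def normalise(name: str) -> str:
    """Lower-case, strip punctuation, and expand/contract known abbreviations."""
    tokens = name.lower().replace(".", "").split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def same_entity(a: str, b: str, threshold: float = 0.9) -> bool:
    """Crude similarity match on normalised names; real MDM adds identifier matching."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio() >= threshold

# Hypothetical spellings of one customer across three systems
print(same_entity("Acme Holdings Ltd", "ACME HOLDINGS LIMITED"))
print(same_entity("Acme Holdings Ltd", "Acme Hldgs Ltd."))
```

Without this linking step, the "same customer under three names" problem described above makes every cross-system AI application unreliable.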

Start with a data quality assessment of the datasets that feed your highest-priority AI use case. Measure completeness, accuracy, consistency, and timeliness against defined thresholds. The gap between current quality and required quality defines your data governance work programme. Do this before building the AI model, not after it fails in production due to data issues. Data readiness is the first question in our enterprise AI guide because it is the question that determines whether a programme succeeds.
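Such an assessment can be expressed as a simple scorecard: each dimension gets a score and a pass/fail against its threshold. The thresholds, field names, and freshness rule below are illustrative assumptions covering two of the four dimensions (completeness and timeliness):

```python
from datetime import date

# Illustrative thresholds; in practice these are agreed per dataset and use case.
THRESHOLDS = {"completeness": 0.98, "timeliness": 0.95}

def assess(records, required_fields, as_of, max_age_days=1):
    """Score completeness (required fields populated) and timeliness (recently updated)."""
    complete = sum(all(r.get(f) not in (None, "") for f in required_fields) for r in records)
    timely = sum((as_of - r["updated"]).days <= max_age_days for r in records)
    scores = {
        "completeness": complete / len(records),
        "timeliness": timely / len(records),
    }
    return {m: (s, s >= THRESHOLDS[m]) for m, s in scores.items()}

records = [
    {"lei": "X" * 20, "country": "GB", "updated": date(2026, 4, 1)},
    {"lei": "", "country": "GB", "updated": date(2026, 3, 1)},  # blank field, stale
]
print(assess(records, ["lei", "country"], as_of=date(2026, 4, 2)))
```

The failing dimensions in the output are exactly the gap that defines the work programme: each failed threshold becomes a remediation item with an owner.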

Exploring AI for your organisation? Book fifteen minutes on our calendar.

Let’s build AI together