Traceability

Last reviewed April 2026

A regulatory report contains a number. The regulator asks where it came from. The answer involves three source systems, two transformations, a model, and a manual adjustment. If the firm cannot trace every step in that chain, the number is an assertion, not a fact. Traceability is the ability to follow the path from any AI output back to its source data and forward to its downstream impact, and in financial services, it is how firms demonstrate that their numbers mean what they say they mean.

What is traceability?

Traceability in the context of AI systems is the ability to document and follow the complete chain from source data through processing, model inference, and post-processing to the final output and its downstream uses. It answers three questions: where did the data come from (data lineage), what happened to it along the way (processing lineage), and where did the output go (impact lineage). Together, these form a complete provenance record for every AI output.

Data lineage traces each input feature back to its source system, including any transformations, aggregations, or enrichments applied along the way. Processing lineage records which model version processed the data, with what parameters, and what output was produced. Impact lineage tracks where the output was used: which decisions it informed, which reports it fed, and which downstream systems consumed it. A fully traceable AI system provides end-to-end visibility from source data to business outcome.
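The three lineage types above can be pictured as one record per output. A minimal sketch, assuming nothing about any particular tool; every field name and value here is illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class ProvenanceRecord:
    """End-to-end provenance for a single AI output (illustrative schema)."""
    output_id: str
    # Data lineage: each input feature mapped to its source system and transforms
    data_lineage: dict[str, list[str]] = field(default_factory=dict)
    # Processing lineage: which model version produced the output, with what parameters
    model_version: str = ""
    parameters: dict[str, str] = field(default_factory=dict)
    # Impact lineage: downstream consumers of the output
    consumed_by: list[str] = field(default_factory=list)

record = ProvenanceRecord(
    output_id="quote-81274",
    data_lineage={"income_band": ["crm_system", "normalise", "band"]},
    model_version="pricing-model-v3.2",
    parameters={"threshold": "0.65"},
    consumed_by=["quote_engine", "consumer_duty_mi_report"],
)
```

A record like this answers all three questions at once: trace `data_lineage` backwards, `model_version` and `parameters` sideways, and `consumed_by` forwards.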

Traceability differs from auditability in scope. Auditability focuses on individual decisions: can you explain why this customer received this outcome? Traceability focuses on the data and processing chain: can you trace this output back to its source and forward to its impact? Both are necessary. A system can be auditable (you can explain individual decisions) but not fully traceable (you cannot trace the data lineage for every input feature).

The landscape

The PRA's expectations on data aggregation and risk reporting, building on BCBS 239, require firms to demonstrate the lineage of data used in regulatory reports. For AI systems that feed into regulatory reporting, this means every input must be traceable to a source system, and every transformation must be documented. The PRA tests this by asking firms to explain how specific reported figures were derived, expecting the firm to trace the complete chain on demand.

The EU AI Act's technical documentation requirements for high-risk systems include detailed descriptions of data processing, model architecture, and testing procedures. Traceability supports these requirements by providing the underlying evidence: not just what the system does in theory, but how specific outputs were actually produced from specific data.

The FCA's focus on outcomes monitoring under the Consumer Duty requires firms to trace the connection between AI system outputs and customer outcomes. If a pricing model produces a quote, the firm should be able to trace that quote back to the model, the data, and the business rules that produced it, and forward to the customer outcome (whether the customer bought the product, made a claim, and whether the outcome was fair).

How AI changes this

Automated data lineage tools track the flow of data through the AI pipeline, from source ingestion through feature engineering, model training, inference, and output delivery. These tools capture lineage metadata automatically, reducing the reliance on manual documentation that becomes stale as systems evolve. For complex AI pipelines with multiple data sources and transformation steps, automated lineage is the only scalable approach.
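One common pattern for automatic capture is to instrument pipeline steps so that lineage metadata is emitted as a side effect of running them, rather than written up afterwards. A toy sketch of the idea; the step and table names are invented for illustration:

```python
import functools

LINEAGE_LOG = []  # in practice this would be a lineage service, not a list

def traced(step_name, reads, writes):
    """Decorator that records lineage metadata each time a pipeline step runs."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            result = fn(*args, **kwargs)
            LINEAGE_LOG.append({"step": step_name, "reads": reads, "writes": writes})
            return result
        return inner
    return wrap

@traced("normalise_income", reads=["crm.raw_income"], writes=["staging.income_clean"])
def normalise_income(annual_incomes):
    # Illustrative transformation: annual figures to monthly
    return [x / 12 for x in annual_incomes]

normalise_income([24_000, 60_000])
```

Because the metadata is captured at execution time, it cannot drift out of date the way a manually maintained lineage spreadsheet does.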

Feature stores provide a controlled layer between source data and model inputs. A feature store records how each feature is computed from source data, ensuring that the same feature definition is used consistently across training and inference. This standardisation simplifies traceability: rather than tracing each model's bespoke feature engineering, the auditor can trace through the feature store's documented transformations.
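The core of that guarantee is simple: one registered definition serves both training and inference, so the computation cannot silently diverge between the two. A minimal sketch, with invented feature and source names:

```python
class FeatureStore:
    """Toy feature store: each feature has one definition and one recorded source."""
    def __init__(self):
        self._definitions = {}

    def register(self, name, source, fn):
        self._definitions[name] = {"source": source, "fn": fn}

    def compute(self, name, raw_record):
        # Training and inference both call this, so the definition cannot drift
        return self._definitions[name]["fn"](raw_record)

    def lineage(self, name):
        return self._definitions[name]["source"]

store = FeatureStore()
store.register(
    "debt_to_income",
    source="loans_db.balances",
    fn=lambda r: r["debt"] / r["income"],
)
value = store.compute("debt_to_income", {"debt": 12_000, "income": 48_000})
# store.lineage("debt_to_income") answers the auditor's question directly
```

Real feature stores add versioning, point-in-time correctness, and serving infrastructure, but the traceability benefit comes from this single-definition property.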

Model versioning systems maintain a complete history of every model version, including the training data, hyperparameters, and evaluation metrics. When a specific decision is audited, the system can identify which model version was active at the time and retrieve its full specification. Combined with input data logging, this enables complete processing lineage for any historical decision.
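The audit-time lookup described above amounts to answering "which version was live at time t?" against an activation history. A minimal sketch of that query, with hypothetical version names:

```python
import bisect
from datetime import datetime

class ModelRegistry:
    """Records activation times for model versions; answers point-in-time queries."""
    def __init__(self):
        self._times = []     # sorted activation timestamps
        self._versions = []  # version active from the matching timestamp onwards

    def activate(self, when, version):
        i = bisect.bisect_left(self._times, when)
        self._times.insert(i, when)
        self._versions.insert(i, version)

    def version_at(self, when):
        i = bisect.bisect_right(self._times, when)
        if i == 0:
            raise LookupError("no model active at that time")
        return self._versions[i - 1]

reg = ModelRegistry()
reg.activate(datetime(2025, 1, 10), "credit-model-v1")
reg.activate(datetime(2025, 6, 2), "credit-model-v2")
# A decision made on 2025-03-15 was produced by v1, not the current version
assert reg.version_at(datetime(2025, 3, 15)) == "credit-model-v1"
```

Pairing this lookup with logged input data yields the full processing lineage for any historical decision: the exact model version, its specification, and the data it saw.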

Impact analysis tools trace the downstream effects of changes. If a source data system changes its schema, which features are affected? Which models use those features? Which decisions and reports are affected? This forward traceability enables proactive risk management: identifying the impact of an upstream change before it propagates into model outputs and downstream decisions.
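Forward traceability is a graph traversal: start at the changed node and walk downstream. A sketch assuming the lineage graph is available as an adjacency map; the node names are invented:

```python
from collections import deque

# Hypothetical lineage graph: each node maps to its direct downstream consumers
lineage = {
    "crm_system.customer_table": ["feat.income_band", "feat.tenure"],
    "feat.income_band": ["model.pricing_v3"],
    "feat.tenure": ["model.pricing_v3", "report.mi_dashboard"],
    "model.pricing_v3": ["report.regulatory_capital"],
}

def impacted(start):
    """Everything downstream of a changed node, found breadth-first."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)

# A schema change in the CRM table reaches two features, a model, and two reports
print(impacted("crm_system.customer_table"))
```

Running the same traversal in reverse (from a report back to sources) gives the backward lineage the regulator asks for; the two directions are the same graph read opposite ways.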

What to know before you start

Traceability is an architecture decision, not a documentation exercise. Documenting lineage in a spreadsheet or wiki is not scalable, not verifiable, and not maintainable. Invest in automated lineage tooling that captures metadata as data flows through the pipeline. The investment pays for itself when the regulator asks a lineage question and the answer takes minutes rather than weeks.

Start with the outputs that matter most: regulatory reports, customer-facing decisions, and financial calculations. Trace the lineage from these outputs back to source systems. The exercise will reveal gaps in your data pipeline where lineage is broken: manual steps, undocumented transformations, or data passed through systems that do not capture metadata. These gaps are the priority for remediation.

Feature engineering is where traceability often breaks. A feature that combines data from three source systems through a series of joins, aggregations, and transformations can be difficult to trace unless the feature engineering pipeline is instrumented. Feature stores address this for the feature layer. For pre-feature data processing, ensure that your ETL/ELT pipelines capture lineage metadata.

Define traceability requirements by use case. A model used for regulatory capital requires complete, verifiable lineage from source to report. An internal recommender system may require lighter traceability. Proportionate requirements ensure that traceability investment is directed where it creates the most value and reduces the most risk.
