Entity Resolution

Last reviewed April 2026

The same customer appears in your core banking system as "J. Smith Ltd", in your CDD platform as "John Smith Limited", and in your transaction monitoring system as "JS Ltd". Three records. One entity. Zero connection between them. Entity resolution is the discipline of determining that these records refer to the same real-world entity, and without it, every downstream compliance process is working with an incomplete picture.

What is entity resolution?

Entity resolution is the process of identifying when different records across one or more datasets refer to the same real-world entity, whether that entity is a person, a company, or an account. In financial services, it is the foundational data problem that sits beneath KYC, sanctions screening, transaction monitoring, and fraud detection. If you cannot reliably determine that two records represent the same customer, you cannot reliably screen that customer, monitor their transactions, or assess their risk.

The problem is harder than it appears. Names vary: abbreviations, transliterations, maiden names, trading names, and data entry errors all create variations. Addresses change. Companies restructure. Individuals hold multiple roles across multiple entities. A person can appear in your systems under their legal name, a nickname, a transliterated version of their name from a non-Latin script, and a misspelling introduced during data entry. Deterministic matching (exact field comparison) catches only the easy cases. The hard cases require probabilistic methods that weigh multiple fields and assess overall likelihood.
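The gap between deterministic and similarity-based matching can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the `LEGAL_FORMS` abbreviation map and the function names are assumptions made for the example, not part of any particular product.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation map; a real deployment would need a much larger one.
LEGAL_FORMS = {"limited": "ltd", "incorporated": "inc", "company": "co"}

def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces, canonicalise legal forms."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(LEGAL_FORMS.get(token, token) for token in tokens)

def exact_match(a: str, b: str) -> bool:
    """Deterministic matching: raw field equality."""
    return a == b

def similarity(a: str, b: str) -> float:
    """Similarity of the normalised names, between 0.0 and 1.0."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

core = "J. Smith Ltd"
cdd = "John Smith Limited"
print(exact_match(core, cdd))   # the deterministic rule misses the link
print(similarity(core, cdd))    # the similarity score stays high
```

String similarity is only one signal. An abbreviation such as "JS Ltd" scores much lower against "John Smith Limited" on the name alone, which is why the hard cases need several fields weighed together.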

The cost of getting entity resolution wrong flows in two directions. False negatives (failing to link records that belong to the same entity) mean fragmented customer views, missed sanctions matches, and AML monitoring that cannot see the full picture. False positives (incorrectly linking records from different entities) corrupt customer data, trigger false sanctions alerts, and can lead to incorrect risk assessments. Both types of error are operationally expensive and carry regulatory risk.
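Both error types can be measured if you have a sample of record pairs whose true status has been confirmed. The sketch below assumes links are represented as unordered pairs of record IDs; the helper names and the sample IDs are made up for illustration.

```python
def pair(a: str, b: str) -> frozenset:
    """An unordered link between two record IDs."""
    return frozenset((a, b))

def pairwise_quality(predicted: set, actual: set) -> dict:
    """Compare predicted links against confirmed links."""
    true_pos = len(predicted & actual)
    false_pos = len(predicted - actual)   # wrongly linked records
    false_neg = len(actual - predicted)   # links the system missed
    precision = true_pos / (true_pos + false_pos) if predicted else 1.0
    recall = true_pos / (true_pos + false_neg) if actual else 1.0
    return {
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "precision": precision,
        "recall": recall,
    }

# Illustrative data only.
predicted_links = {pair("A1", "A2"), pair("B1", "C1")}
confirmed_links = {pair("A1", "A2"), pair("A1", "A3")}
print(pairwise_quality(predicted_links, confirmed_links))
```

Low precision points to false positives and low recall to false negatives, so tracking both over time shows which direction the matching is drifting in.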

The landscape

Master data management programmes in financial services have been trying to solve entity resolution for decades. The results have been mixed. Many institutions have invested in MDM platforms that maintain a "golden record" for each customer, but the quality of these records degrades over time as new data enters through channels that bypass the MDM process. A customer onboarded through a digital channel may not pass through the same data quality checks as one onboarded through a branch.

The Legal Entity Identifier (LEI) system, managed by the Global Legal Entity Identifier Foundation (GLEIF), provides a standardised identifier for legal entities participating in financial transactions. LEI adoption simplifies entity resolution for entities that have one, but coverage is far from universal. Many smaller companies and entities in jurisdictions with low LEI adoption do not have an LEI, and the system is built for legal entities rather than private individuals, so it offers little help in resolving retail customers.
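Where an LEI is present, it is worth validating its format before using it as a match key. An LEI is 20 alphanumeric characters whose last two digits are check digits under ISO 7064 MOD 97-10. The sketch below checks only that structure; it does not confirm that the LEI is registered or current, which would need a lookup against GLEIF data. The function names and the sample prefix are illustrative assumptions.

```python
def _to_digits(text: str) -> str:
    """ISO 7064 convention: digits stay as-is, A=10 through Z=35."""
    return "".join(str(int(ch, 36)) for ch in text)

def lei_check_digits(prefix18: str) -> str:
    """Compute the two check digits for an 18-character LEI prefix."""
    remainder = int(_to_digits(prefix18.upper() + "00")) % 97
    return f"{98 - remainder:02d}"

def is_valid_lei(lei: str) -> bool:
    """Structural check only: length, character set, and MOD 97-10 checksum."""
    lei = lei.strip().upper()
    if len(lei) != 20 or not lei.isascii() or not lei.isalnum():
        return False
    if not lei[18:].isdigit():
        return False
    return int(_to_digits(lei)) % 97 == 1

# Made-up prefix used purely to demonstrate the checksum arithmetic.
sample_prefix = "529900EXAMPLE00001"
sample_lei = sample_prefix + lei_check_digits(sample_prefix)
print(sample_lei, is_valid_lei(sample_lei))
```

A failed check usually indicates a transcription error, which is a reason to fall back to fuzzy matching on the other fields rather than to discard the record.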

Regulatory expectations on data quality implicitly require effective entity resolution. The FCA's expectations on KYC and ongoing monitoring assume that the institution has a complete view of each customer relationship. If the same customer exists as multiple unlinked records, the institution's monitoring is by definition incomplete. Supervisory examinations increasingly probe data quality foundations, and weaknesses in entity resolution are a frequent finding.

How AI changes this

Machine learning-based entity resolution uses multiple signals simultaneously: name similarity, address proximity, date of birth, shared identifiers, transaction patterns, and network connections. Rather than applying fixed matching rules with predefined thresholds, ML models learn from confirmed matches and non-matches in the institution's own data. With enough good-quality labelled examples, this tends to produce match scores that are more accurate and more adaptable than rule-based approaches, particularly for the ambiguous cases that sit in the middle ground between obvious matches and obvious non-matches.
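As a simple stand-in for a full ML model, the idea of learning from confirmed matches and non-matches can be sketched with Fellegi-Sunter-style log-likelihood weights: each field earns a positive weight when it agrees and a negative weight when it disagrees, with the sizes estimated from labelled pairs. The field names, the smoothing value, and the toy training data below are all assumptions for illustration.

```python
import math

FIELDS = ("name", "dob", "postcode")  # hypothetical comparison fields

def estimate_weights(labelled_pairs, smoothing=1.0):
    """labelled_pairs: list of (agreement dict of field -> bool, is_match bool)."""
    matches = [agree for agree, is_match in labelled_pairs if is_match]
    non_matches = [agree for agree, is_match in labelled_pairs if not is_match]
    weights = {}
    for f in FIELDS:
        # m = P(field agrees | match), u = P(field agrees | non-match), smoothed
        m = (sum(a[f] for a in matches) + smoothing) / (len(matches) + 2 * smoothing)
        u = (sum(a[f] for a in non_matches) + smoothing) / (len(non_matches) + 2 * smoothing)
        weights[f] = (math.log2(m / u), math.log2((1 - m) / (1 - u)))
    return weights

def match_score(agreement, weights):
    """Sum the agreement or disagreement weight for each field."""
    return sum(weights[f][0] if agreement[f] else weights[f][1] for f in FIELDS)

# Toy labelled data, invented for the example.
TRAINING = [
    ({"name": True, "dob": True, "postcode": True}, True),
    ({"name": True, "dob": True, "postcode": False}, True),
    ({"name": False, "dob": True, "postcode": True}, True),
    ({"name": True, "dob": True, "postcode": True}, True),
    ({"name": False, "dob": False, "postcode": False}, False),
    ({"name": True, "dob": False, "postcode": False}, False),
    ({"name": False, "dob": False, "postcode": True}, False),
    ({"name": False, "dob": False, "postcode": False}, False),
]
weights = estimate_weights(TRAINING)
print(match_score({"name": False, "dob": True, "postcode": True}, weights))
```

In this toy data, date of birth separates matches from non-matches most cleanly, so it receives the largest weight. A production model would use graded similarity rather than yes/no agreement and far more training data.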

Graph-based entity resolution maps relationships between records and identifies clusters that likely represent the same entity. Two records may not match directly but may be connected through shared attributes: the same phone number, the same IP address used for online banking, the same beneficial owner, or transactions between them that suggest a common controller. Graph methods catch linkages that pairwise comparison misses.
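One simple way to illustrate the graph idea is to treat every shared attribute value as an edge and take the connected components, here with a union-find structure. This is a sketch under the assumption that any shared value is a link; the record IDs and attribute names are invented.

```python
from collections import defaultdict

def cluster_records(records):
    """records: dict of record_id -> dict of attribute -> value.
    Returns a list of sets of record IDs connected by shared values."""
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (attribute, value) -> first record carrying it
    for rid, attrs in records.items():
        for key, value in attrs.items():
            if value is None:
                continue  # missing values must not link records
            token = (key, value)
            if token in first_seen:
                union(first_seen[token], rid)
            else:
                first_seen[token] = rid

    clusters = defaultdict(set)
    for rid in records:
        clusters[find(rid)].add(rid)
    return list(clusters.values())

# Illustrative records only.
records = {
    "core-1": {"phone": "0123", "email": None},
    "cdd-7": {"phone": "0123", "owner": "P-9"},
    "tm-3": {"owner": "P-9", "email": None},
    "core-2": {"phone": "0999"},
}
print(cluster_records(records))
```

Here "core-1" and "tm-3" share no attribute directly but end up in the same cluster through "cdd-7". The same transitivity can over-link, for example when a household shares a phone number, so in practice edges are usually weighted and clusters reviewed rather than merged automatically.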

Real-time resolution at the point of data entry prevents duplicates from being created. When a new customer record is created or a transaction is processed, the entity resolution system checks for existing matches in real time and either links to the existing record or flags a potential duplicate for review. This is more effective than periodic batch resolution, which allows duplicates to accumulate and propagate into downstream systems before being caught.
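The check-at-entry flow can be sketched as an in-memory index that compares each new record only against existing records sharing a blocking key. The class name, the blocking key (first letter of the name plus postcode), and both thresholds are assumptions chosen for the example; a real system would use a richer key and a persistent store.

```python
from collections import defaultdict
from difflib import SequenceMatcher

class CustomerIndex:
    """Toy point-of-entry duplicate check."""

    def __init__(self, link_at=0.95, review_at=0.75):
        self.link_at = link_at        # at or above: link to existing record
        self.review_at = review_at    # at or above: flag for human review
        self._blocks = defaultdict(list)  # blocking key -> [(record_id, name)]
        self._next_id = 1

    @staticmethod
    def _block_key(name, postcode):
        return (name.strip().lower()[:1], postcode.replace(" ", "").upper())

    def submit(self, name, postcode):
        """Return (decision, record_id). For 'linked' and 'review' the ID is
        the existing candidate; for 'created' it is the new record."""
        key = self._block_key(name, postcode)
        best_id, best_score = None, 0.0
        for rid, existing in self._blocks[key]:
            score = SequenceMatcher(None, name.lower(), existing.lower()).ratio()
            if score > best_score:
                best_id, best_score = rid, score
        if best_id is not None and best_score >= self.link_at:
            return ("linked", best_id)
        if best_id is not None and best_score >= self.review_at:
            return ("review", best_id)  # nothing stored until a reviewer decides
        rid = self._next_id
        self._next_id += 1
        self._blocks[key].append((rid, name))
        return ("created", rid)

index = CustomerIndex()
print(index.submit("John Smith Limited", "AB1 2CD"))
print(index.submit("john smith limited", "ab12cd"))
print(index.submit("John Smyth Limited", "AB1 2CD"))
```

Blocking keeps the lookup fast enough for entry-time use, at the cost of missing candidates whose blocking fields differ, which is one reason periodic batch resolution is often kept as a backstop.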

Cross-system resolution connects records across internal systems and external data sources. A customer record in the core banking system is linked to their record in the CDD platform, the transaction monitoring system, and external databases like corporate registries and sanctions lists. This unified view is the prerequisite for effective financial crime detection: you cannot monitor a customer's transactions comprehensively if those transactions are spread across unlinked records in multiple systems.

What to know before you start

Entity resolution is a data quality project, not an AI project. The AI component is the matching algorithm. The hard work is understanding your data: which systems contain customer records, what fields are populated, how reliable each field is, and where the known gaps are. Map your data landscape before selecting a matching approach.

Human review workflows are essential for ambiguous matches. The system will produce cases where the match probability is neither high enough to auto-link nor low enough to auto-dismiss. These cases require human judgement, and the volume of ambiguous cases determines the ongoing operational cost. Design the review interface carefully: present the evidence for and against the match, make the decision easy to record, and capture the decision as training data for model improvement.
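The three-way routing and the capture of reviewer decisions as training data might look like the sketch below. The threshold values, class name, and field names are hypothetical; the right thresholds depend on your own score distribution and risk appetite.

```python
from dataclasses import dataclass, field

AUTO_LINK = 0.95      # hypothetical: at or above, link without review
AUTO_DISMISS = 0.40   # hypothetical: at or below, dismiss without review

def triage(score: float) -> str:
    """Route a match probability to one of three outcomes."""
    if score >= AUTO_LINK:
        return "auto_link"
    if score <= AUTO_DISMISS:
        return "auto_dismiss"
    return "human_review"

@dataclass
class ReviewQueue:
    pending: list = field(default_factory=list)
    training_examples: list = field(default_factory=list)

    def route(self, pair_id: str, score: float, evidence: dict) -> str:
        """Queue ambiguous pairs together with the evidence a reviewer needs."""
        decision = triage(score)
        if decision == "human_review":
            self.pending.append({"pair_id": pair_id, "score": score, "evidence": evidence})
        return decision

    def record_decision(self, pair_id: str, is_match: bool, reviewer: str) -> None:
        """Close a case and keep the outcome as a labelled training example."""
        case = next(c for c in self.pending if c["pair_id"] == pair_id)
        self.pending.remove(case)
        self.training_examples.append({**case, "is_match": is_match, "reviewer": reviewer})

queue = ReviewQueue()
queue.route("pair-3", 0.70, {"name": "similar", "dob": "agrees", "address": "differs"})
queue.record_decision("pair-3", is_match=True, reviewer="analyst-1")
print(len(queue.pending), len(queue.training_examples))
```

The width of the band between the two thresholds is what drives review volume, so it is worth sizing against historical scores before go-live.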

Start with your highest-priority compliance use case. If your sanctions screening is generating excessive false positives because the same customer appears under multiple name variants, entity resolution that consolidates those variants into a single customer view will directly reduce false positive volumes. If your transaction monitoring is missing suspicious patterns because a customer's activity is split across unlinked accounts, entity resolution directly improves detection. Choose the use case where the impact is most measurable.

Do not underestimate the change management. Entity resolution will reveal that records previously thought to be separate customers are actually the same entity. This has implications for customer risk ratings, regulatory reporting, and potentially for product limits and exposure calculations. The downstream systems that consume customer data must be prepared for the consolidation. Engage compliance, risk, and operations teams before you start merging records.

Last updated June 2026
