Bias in AI
Last reviewed April 2026
An AI model trained on historical lending data learns the patterns in that data, including the patterns of discrimination. If past loan officers approved fewer applications from certain ethnic groups, the model reproduces that behaviour and calls it "prediction." Bias in AI is not a bug introduced by the technology. It is historical inequality encoded in data and amplified by automation, and in financial services, the consequences land on the people least equipped to absorb them.
What is bias in AI?
Bias in AI refers to systematic errors in model outputs that produce unfair outcomes for specific groups, typically defined by protected characteristics such as race, gender, age, disability, or religion. In financial services, bias can emerge at every stage of the model lifecycle: in the data used for training (historical bias), in the features selected (proxy discrimination), in the model's learning process (algorithmic bias), and in how outputs are used (deployment bias). A credit scoring model that uses postcode as a feature may inadvertently discriminate on the basis of ethnicity because residential segregation creates correlation between the two.
The challenge is that bias is often invisible in standard model performance metrics. A model can achieve 95 per cent accuracy overall while performing significantly worse for a minority group. If the group is small relative to the total population, aggregate metrics mask the disparity. This is why fairness testing across protected characteristics is essential and why relying solely on accuracy metrics creates a false sense of safety.
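To make the point concrete, here is a minimal sketch with hypothetical numbers: a model that is right 96 per cent of the time for a large majority group and 80 per cent of the time for a small minority group still reports roughly 95 per cent accuracy overall.

```python
# Hypothetical data: aggregate accuracy masks a disparity that only
# appears once results are split by group.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_major, n_minor = 9_500, 500

results = pd.DataFrame({
    "group": ["A"] * n_major + ["B"] * n_minor,
    # Correctness flags: ~96% correct for the majority group,
    # only ~80% correct for the minority group.
    "correct": np.concatenate([
        rng.random(n_major) < 0.96,
        rng.random(n_minor) < 0.80,
    ]),
})

print("Overall accuracy:", results["correct"].mean())   # ~0.95
print(results.groupby("group")["correct"].mean())       # A ~0.96, B ~0.80
```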
Proxy discrimination is the most insidious form. Even when protected characteristics are excluded from the model's inputs, other features can serve as proxies. Occupation, education level, and spending patterns all correlate with protected characteristics. Removing the protected characteristic from the input data does not remove the bias; it just makes it harder to detect. Effective bias mitigation requires understanding these correlations and testing for their effects.
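One practical starting point is a proxy screen: measure how well each candidate feature predicts the protected characteristic, even though that characteristic is excluded from the model itself. The sketch below uses mutual information from scikit-learn on hypothetical data; the column names and correlations are assumptions for illustration only.

```python
# Rough proxy screen (illustrative): features that predict the protected
# characteristic well are candidate proxies and warrant fairness testing
# even though the characteristic itself is excluded from the model.
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(1)
n = 5_000
ethnicity = rng.integers(0, 2, n)                      # protected characteristic (held out of the model)
postcode_band = ethnicity * 2 + rng.integers(0, 3, n)  # correlated with ethnicity via residential segregation
income = rng.normal(30_000, 8_000, n)                  # weakly related in this toy example

features = pd.DataFrame({"postcode_band": postcode_band, "income": income})
scores = mutual_info_classif(features, ethnicity, random_state=0)
for name, score in zip(features.columns, scores):
    print(f"{name}: mutual information with protected attribute = {score:.3f}")
```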
The landscape
The Equality Act 2010 prohibits direct and indirect discrimination in the provision of financial services. AI models are not exempt. If a model produces outcomes that disproportionately disadvantage a protected group and the firm cannot justify the disparity as a proportionate means of achieving a legitimate aim, the firm is liable. The FCA's Consumer Duty reinforces this: good customer outcomes must be delivered across the customer base, not just on average.
The EU AI Act requires that high-risk AI systems (which include credit scoring and insurance pricing) use training data that is representative and free from bias. Article 10 specifies that data governance practices must account for possible biases and that appropriate measures must be taken to mitigate them. This creates a documentation obligation: firms must demonstrate what bias testing they conducted and what mitigations they applied.
The ICO and the Equality and Human Rights Commission jointly published guidance on AI and equality, emphasising that automated decision-making systems must comply with equality law. The guidance clarifies that ignorance of bias is not a defence: organisations are expected to proactively test for discriminatory effects and take steps to prevent them.
How AI changes this
Bias detection tools evaluate model outputs across protected characteristics, computing metrics such as demographic parity (equal approval rates), equalised odds (equal true positive and false positive rates), and calibration (equal accuracy of predictions). These tools reveal disparities that aggregate metrics hide. For a lending model, this might show that the false positive rate (flagging good borrowers as bad) is 15 per cent for one ethnic group and 8 per cent for another, an operationally significant disparity that overall accuracy would obscure.
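The sketch below computes these group-level metrics directly from predictions. It assumes 0/1 arrays where 1 is the model's positive decision (for the lending example, a flag of likely default) and a group array holding the protected characteristic.

```python
# Minimal group-level fairness report.
import pandas as pd

def group_fairness_report(y_true, y_pred, group):
    df = pd.DataFrame({"y": y_true, "pred": y_pred, "g": group})
    rows = []
    for g, d in df.groupby("g"):
        rows.append({
            "group": g,
            "positive_rate": d["pred"].mean(),             # demographic parity compares these
            "tpr": d.loc[d["y"] == 1, "pred"].mean(),       # equalised odds compares TPR...
            "fpr": d.loc[d["y"] == 0, "pred"].mean(),       # ...and FPR (e.g. 0.15 vs 0.08 above)
        })
    return pd.DataFrame(rows).set_index("group")
```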
Pre-processing techniques modify training data to reduce bias before model training. Re-sampling underrepresented groups, re-weighting training examples, and removing or transforming proxy features can reduce the bias the model inherits from historical data. These techniques are effective but involve trade-offs: re-sampling can reduce model performance on the majority group, and removing features can reduce overall accuracy.
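As an illustration of re-weighting, the sketch below implements the common "reweighing" idea for a binary label and a single protected attribute: each (group, label) cell receives the weight that would make group and label statistically independent in the training data. The column names are assumptions.

```python
# Reweighing sketch: weight = expected frequency under independence
# divided by observed frequency, per (group, label) combination.
import pandas as pd

def reweighing_weights(df, group_col, label_col):
    n = len(df)
    p_group = df[group_col].value_counts(normalize=True)
    p_label = df[label_col].value_counts(normalize=True)
    p_joint = df.groupby([group_col, label_col]).size() / n
    return df.apply(
        lambda r: (p_group[r[group_col]] * p_label[r[label_col]])
        / p_joint[(r[group_col], r[label_col])],
        axis=1,
    )

# The weights can then be passed to most estimators, e.g.
# model.fit(X, y, sample_weight=reweighing_weights(train_df, "group", "label"))
```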
In-processing methods incorporate fairness constraints directly into the model's learning objective. Rather than optimising purely for accuracy, the model optimises for accuracy subject to a fairness constraint, for example, that the approval rate must not differ by more than a defined threshold across groups. This produces models that are fair by construction, though the accuracy cost depends on how much bias exists in the underlying data.
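A stripped-down illustration of the idea: logistic regression trained on cross-entropy plus a penalty on the gap in mean predicted score between two groups, a soft demographic parity constraint. Production implementations (for example, the reductions approach in the fairlearn library) are considerably more sophisticated; this sketch assumes a 0/1 group coding and a numeric feature matrix.

```python
# Fairness-penalised logistic regression, trained by gradient descent.
import numpy as np

def fit_fair_logreg(X, y, group, lam=5.0, lr=0.1, epochs=500):
    w = np.zeros(X.shape[1])
    a, b = group == 0, group == 1
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))           # predicted probabilities
        grad_acc = X.T @ (p - y) / len(y)          # gradient of cross-entropy
        gap = p[a].mean() - p[b].mean()            # demographic-parity gap
        s = p * (1 - p)                            # sigmoid derivative
        grad_gap = (X[a] * s[a, None]).mean(axis=0) - (X[b] * s[b, None]).mean(axis=0)
        w -= lr * (grad_acc + lam * 2 * gap * grad_gap)   # penalise gap**2
    return w
```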
Post-processing adjustments modify model outputs to achieve fairness targets. Decision thresholds can be calibrated differently for different groups to equalise outcomes. This approach is technically simple but raises questions about transparency: applying different thresholds to different groups is itself a form of differential treatment that must be justified and documented.
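A minimal sketch of threshold calibration by group: each group's decision threshold is chosen so that the resulting approval rate matches a common target. Whether such differential thresholds are justified is the governance question raised above; the code does not resolve it.

```python
# Per-group thresholds that equalise approval rates at a chosen target.
import numpy as np

def per_group_thresholds(scores, group, target_approval_rate):
    thresholds = {}
    for g in np.unique(group):
        s = scores[group == g]
        # approve the top `target_approval_rate` share of each group by score
        thresholds[g] = np.quantile(s, 1.0 - target_approval_rate)
    return thresholds

def approve(scores, group, thresholds):
    return np.array([s >= thresholds[g] for s, g in zip(scores, group)])
```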
What to know before you start
Choose your fairness metric before you build the model, and justify the choice. Different metrics encode different values, and they can conflict. Demographic parity (equal outcomes) and equalised odds (equal error rates) cannot both be satisfied when base rates differ between groups. The choice is a policy decision that senior leadership should make and document, with legal input. Do not leave it to the data science team.
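The conflict is simple arithmetic. Under equalised odds both groups share the same true positive and false positive rates, so each group's approval rate is TPR × base rate + FPR × (1 − base rate); if base rates differ, approval rates differ, and demographic parity fails unless TPR equals FPR, which describes a decision that carries no information about the outcome. A toy check with assumed numbers:

```python
# Equal TPR/FPR across groups, different base rates of creditworthiness:
# the implied approval rates cannot be equal.
tpr, fpr = 0.80, 0.10
for name, base_rate in [("group A", 0.50), ("group B", 0.30)]:
    approval = tpr * base_rate + fpr * (1 - base_rate)
    print(f"{name}: approval rate = {approval:.2f}")   # A: 0.45, B: 0.31
```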
Testing for bias requires data about protected characteristics, which many institutions do not collect. You cannot test what you cannot measure. The FCA has indicated that firms should consider collecting diversity data specifically for the purpose of fairness testing, subject to data protection requirements. If you cannot collect this data directly, proxy methods using aggregate statistics and geographically linked demographic data can provide partial coverage.
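A hedged sketch of the proxy approach: where individual-level protected data cannot be collected, postcode-level demographic shares from aggregate statistics can be joined onto decision records to estimate group-level outcome rates. The file names, column names, and single-group simplification below are purely illustrative.

```python
# Probability-weighted estimate of approval rates by group, using
# postcode-level demographic shares instead of individual-level data.
import pandas as pd

applicants = pd.read_csv("decisions.csv")                 # columns: postcode, approved (0/1)
demographics = pd.read_csv("postcode_demographics.csv")   # columns: postcode, share_group_x

merged = applicants.merge(demographics, on="postcode", how="left")
est_group_x = (merged["approved"] * merged["share_group_x"]).sum() / merged["share_group_x"].sum()
est_rest = (merged["approved"] * (1 - merged["share_group_x"])).sum() / (1 - merged["share_group_x"]).sum()
print(f"Estimated approval rate, group X: {est_group_x:.2f}; everyone else: {est_rest:.2f}")
```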
Bias mitigation is not a one-time exercise. Models drift, populations change, and new forms of proxy discrimination emerge. Build continuous fairness monitoring into your model risk management framework, with defined thresholds that trigger review and remediation. Quarterly fairness assessments are a reasonable starting cadence for high-risk models.
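In code, continuous monitoring can be as simple as re-running the group metrics on recent decisions and flagging any cross-group gap that exceeds a pre-agreed limit. The sketch below reuses the group_fairness_report helper sketched earlier; the threshold values are placeholders that each firm would set through its model risk framework.

```python
# Flag metrics whose cross-group gap breaches the agreed limit.
def fairness_breaches(report, thresholds=None):
    thresholds = thresholds or {"positive_rate": 0.05, "fpr": 0.03, "tpr": 0.03}
    breaches = {}
    for metric, limit in thresholds.items():
        gap = report[metric].max() - report[metric].min()
        if gap > limit:
            breaches[metric] = round(gap, 3)
    return breaches   # a non-empty result should trigger review and remediation

# e.g. run quarterly:
# fairness_breaches(group_fairness_report(y_true, y_pred, group))
```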
Start with the models that have the highest impact on protected groups: credit decisions, insurance pricing, and fraud detection (where false positives can result in account freezes that disproportionately affect specific populations). Conduct a bias audit of these existing models before deploying new ones. The audit will likely reveal issues in models that predate your bias testing capability, and addressing these legacy issues builds both capability and credibility with the regulator.