Enterprise AI for Credit
How AI changes credit decisioning, affordability, collections, and fraud detection for UK lenders and fintechs. Written from inside the lending stack, not from a vendor slide deck.
Scoring
Alternative data and credit scoring
Traditional credit scoring works well for borrowers who already have credit. That is the problem. In the UK, roughly 5.8 million adults are credit invisible or thin-file. Bureau data alone tells you nothing about them. The lender either declines the application or prices the risk so conservatively that the product stops being useful.
Alternative data changes the calculus. Open banking transaction data, rent payment history, utility records, and even HMRC income verification through the FCA-regulated open banking ecosystem give AI models a richer picture of repayment capacity. The models are not replacing bureau scores. They are supplementing them for the segments where bureau data is thin or absent.
The technical challenge is feature engineering at speed. A credit scoring model consuming open banking data needs to derive stable features (income regularity, discretionary spending patterns, gambling frequency) from raw transaction feeds in near real time. That requires a feature store, consistent data pipelines, and agreement with your data science team on which features are both predictive and explainable.
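A minimal sketch of that feature derivation, assuming a simplified `(month, category, amount)` transaction schema with positive amounts as credits. Everything here is illustrative; a production pipeline would read enriched open banking feeds through a feature store:

```python
from statistics import mean, pstdev
from collections import defaultdict

# Hypothetical, pre-categorised transaction feed: (month, category, amount).
transactions = [
    (1, "salary", 2400.0), (2, "salary", 2400.0), (3, "salary", 2450.0),
    (1, "groceries", -320.0), (2, "groceries", -310.0), (3, "groceries", -340.0),
    (1, "gambling", -50.0), (3, "gambling", -80.0),
    (1, "entertainment", -120.0), (2, "entertainment", -90.0), (3, "entertainment", -150.0),
]

def derive_features(txns):
    """Derive stable scorecard features from raw transactions."""
    monthly_income = defaultdict(float)
    category_spend = defaultdict(float)
    gambling_months, months = set(), set()
    for month, category, amount in txns:
        months.add(month)
        if amount > 0:
            monthly_income[month] += amount
        else:
            category_spend[category] += -amount
            if category == "gambling":
                gambling_months.add(month)
    incomes = [monthly_income[m] for m in sorted(months)]
    avg_income = mean(incomes)
    # Income regularity as coefficient of variation: lower = steadier income.
    income_cv = pstdev(incomes) / avg_income if avg_income else None
    total_spend = sum(category_spend.values())
    discretionary = category_spend["entertainment"] + category_spend["gambling"]
    return {
        "avg_monthly_income": round(avg_income, 2),
        "income_cv": round(income_cv, 4) if income_cv is not None else None,
        "discretionary_share": round(discretionary / total_spend, 4),
        "gambling_frequency": len(gambling_months) / len(months),
    }

features = derive_features(transactions)
```

Each output is a candidate feature that is both predictive and explainable to a regulator: steady salary credits, a modest discretionary share, gambling in two of three months.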
Explainability is not optional. The FCA's Consumer Duty requires that consumers understand why they were declined or offered specific terms. The EU AI Act classifies credit scoring as high risk, which means mandatory conformity assessments, human oversight, and documented bias testing. If your model uses 400 features and you cannot explain which ones drove the decision, you have a regulatory problem before you have a business one.
The practical approach: use interpretable models (logistic regression, gradient-boosted trees with SHAP values) for the primary scorecard. Reserve deep learning for second-look populations where the interpretable model declines but alternative data suggests creditworthiness. Document the interaction between models. The regulator will ask.
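As a sketch of why the interpretable primary scorecard stays explainable: for a logistic model, coefficient times feature value is the exact per-feature attribution, which SHAP generalises to gradient-boosted trees. The weights and features below are entirely illustrative:

```python
import math

# Hypothetical primary scorecard. Contributions double as the
# adverse-action explanation the regulator will ask for.
WEIGHTS = {"income_cv": -12.0, "discretionary_share": -2.5, "gambling_frequency": -1.8}
INTERCEPT = 1.2

def score(features):
    contributions = {f: WEIGHTS[f] * features[f] for f in WEIGHTS}
    logit = INTERCEPT + sum(contributions.values())
    prob_good = 1 / (1 + math.exp(-logit))
    # Rank features by how strongly they pushed the score down,
    # so a decline letter can name the decision drivers.
    drivers = sorted(contributions, key=lambda f: contributions[f])
    return prob_good, drivers

prob, drivers = score({"income_cv": 0.01, "discretionary_share": 0.34,
                       "gambling_frequency": 0.67})
```

Here gambling frequency is the leading negative driver, which is exactly the kind of documented, feature-level reasoning the second-look deep learning model cannot easily produce on its own.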
Affordability
Affordability assessment
Affordability is distinct from creditworthiness. A borrower may have a strong credit history but lack the disposable income to service a new obligation. The FCA has been clear on this since PS19/11, and Consumer Duty sharpened the expectation further. Lenders must verify that the product will not cause foreseeable harm. That means real income verification, not self-declaration.
Open banking makes affordability assessment faster and more accurate. Instead of asking the applicant to self-declare income and outgoings (which they estimate poorly and sometimes misrepresent), transaction-level data gives you a verified picture. AI models categorise transactions into income, committed expenditure, discretionary spending, and debt servicing. The categorisation is where the value sits, and where the difficulty lives.
Transaction categorisation is harder than it sounds. A payment to "AMZN" could be household essentials or discretionary. A standing order could be rent or a savings transfer. The model needs merchant-level enrichment, pattern recognition across time series, and rules to handle ambiguity. Most off-the-shelf categorisation engines get 70-80% accuracy. For affordability decisions that carry regulatory weight, you need 95%+ on the categories that matter (income, rent, debt servicing). That gap is your build-versus-buy decision.
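A toy illustration of those ambiguity rules, with a hypothetical merchant table; a production engine layers merchant-level enrichment, recurrence detection across the time series, and an ML classifier on top of rules like these:

```python
# Hypothetical merchant enrichment table.
MERCHANT_MAP = {
    "TESCO": "groceries",
    "AMZN": "ambiguous_retail",   # could be essentials or discretionary
    "BET365": "gambling",
}

def categorise(description, amount, recurring=False):
    desc = description.upper()
    for merchant, category in MERCHANT_MAP.items():
        if merchant in desc:
            return category
    # A recurring round-amount outgoing is a rent *candidate*: flag it for
    # confirmation rather than asserting, since the same standing order
    # could equally be a savings transfer.
    if recurring and amount < 0 and amount == int(amount):
        return "possible_rent"
    return "uncategorised"
```

Note that the high-stakes categories (income, rent, debt servicing) come back as flags to confirm, not final answers. That is where the 95%+ accuracy requirement bites, and where the build-versus-buy question gets decided.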
Predictive analytics adds a forward-looking dimension. Historical affordability tells you what the borrower could service last month. Predictive affordability models income stability, employment sector risk, and expenditure trends to estimate capacity over the loan term. This is where AI earns its place: the static affordability snapshot becomes a dynamic forecast.
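As a minimal sketch, the simplest dynamic forecast projects disposable income forward on a least-squares trend; a production model would add the income-stability and sector-risk terms described above:

```python
def forecast_disposable(monthly_disposable, horizon):
    """Project monthly disposable income forward on a least-squares
    trend line, turning the static affordability snapshot into a forecast."""
    n = len(monthly_disposable)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(monthly_disposable) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, monthly_disposable))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return [round(intercept + slope * (n + t), 2) for t in range(horizon)]

# Disposable income has been falling roughly £15/month; the forecast
# carries that trend over the next quarter of the loan term.
projection = forecast_disposable([450.0, 440.0, 420.0, 405.0], horizon=3)
```

A borrower who looks affordable on last month's snapshot can look marginal on the trend, which is precisely the foreseeable-harm question Consumer Duty asks.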
The ICO requires a lawful basis for processing open banking data, and explicit consent alone may not suffice if the data is being used for purposes beyond the original consent scope. Map your data flows before you build the model. Data governance is the prerequisite, not the afterthought.
Collections
Collections and recoveries
Collections is where AI delivers the most measurable return in consumer credit. The traditional approach is a static segmentation: days past due determine the contact strategy. 30 days triggers a letter. 60 days triggers a call. 90 days triggers a referral to a debt collection agency. This approach treats all arrears equally. They are not.
AI-driven collections models segment by likelihood of self-cure, propensity to pay, and optimal contact channel. A borrower who missed one payment because of a timing mismatch between salary and direct debit dates is fundamentally different from one in sustained financial difficulty. The first needs a gentle nudge (an SMS, a payment date adjustment). The second needs forbearance and a conversation. Treating both the same is inefficient and, under Consumer Duty, potentially harmful.
The data inputs are richer than most collections teams realise. Payment history, open banking data (where consented), bureau data refreshes, and behavioural signals (app logins, partial payments, contact attempts) feed a model that predicts the right action at the right time through the right channel. Process automation then executes the strategy: adjusting payment dates, sending tailored communications, escalating to human agents only when the model predicts that automation will not resolve the case.
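The decision logic can be sketched as a next-best-action policy: a model scores the likelihood of self-cure, and the policy maps score and context to an action. The scores, thresholds, and action names here are all illustrative:

```python
def next_action(p_self_cure, days_past_due, vulnerable):
    """Map model outputs to a collections action; automation resolves
    the easy cases, humans handle the ones it predicts it cannot."""
    if vulnerable:
        return "refer_to_specialist"      # Consumer Duty: conversation first
    if p_self_cure >= 0.7:
        return "sms_nudge"                # likely salary/direct-debit timing mismatch
    if p_self_cure >= 0.4:
        return "offer_payment_date_move"  # automation can still resolve this
    if days_past_due >= 60:
        return "agent_call"               # escalate to a human
    return "tailored_email"

action = next_action(p_self_cure=0.82, days_past_due=12, vulnerable=False)
```

The point of the sketch is the ordering: the vulnerability check runs before any optimisation, because an efficient contact strategy applied to a customer in difficulty is exactly the harm the regulator is watching for.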
The returns are measurable. Lenders deploying AI in collections typically see 15-25% improvement in cure rates within the first 90 days of arrears, a 30-40% reduction in outbound call volumes, and meaningful improvement in customer outcomes (fewer complaints, fewer instances of collections activity deepening financial difficulty). These are the numbers that justify the investment.
Vulnerability detection is the hardest part. The FCA expects lenders to identify customers in vulnerable circumstances and respond appropriately. AI models that flag vulnerability indicators (erratic transaction patterns, benefit income dependency, high-cost credit usage) must do so with precision. False negatives cause harm. False positives waste specialist resource. The threshold calibration is a business decision with regulatory consequences. Involve your compliance team from day one.
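That calibration is, concretely, a precision/recall trade-off on the vulnerability flag. A minimal sketch with made-up scores and labels, purely to show what moving the threshold trades away:

```python
def flag_metrics(scores_and_labels, threshold):
    """Precision/recall of a vulnerability flag at a given threshold.
    False negatives cause harm; false positives burn specialist resource."""
    tp = fp = fn = 0
    for model_score, is_vulnerable in scores_and_labels:
        flagged = model_score >= threshold
        if flagged and is_vulnerable:
            tp += 1
        elif flagged:
            fp += 1
        elif is_vulnerable:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical holdout set of (model score, truly vulnerable?) pairs.
cases = [(0.9, True), (0.8, True), (0.75, False), (0.6, True),
         (0.4, False), (0.3, False), (0.2, True), (0.1, False)]
precision, recall = flag_metrics(cases, threshold=0.5)
```

Raising the threshold from 0.5 to 0.7 on this toy set improves nothing and drops recall: fewer wasted referrals, but more customers in difficulty go unflagged. That trade-off is the business decision with regulatory consequences.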
Fraud
Fraud detection at origination
Origination fraud costs UK lenders hundreds of millions per year. First-party fraud (applicants misrepresenting identity or income) and third-party fraud (stolen identities, synthetic identities) both concentrate at the point of application. The traditional defence is a combination of identity verification, bureau checks, and manual review. AI shifts the balance from reactive investigation to real-time prevention.
The most effective origination fraud models operate across multiple signals simultaneously. Device fingerprinting, behavioural biometrics (typing cadence, form completion patterns), identity document verification, and application data consistency checks feed a single risk score. No individual signal is sufficient. A stolen identity will pass an ID check. A synthetic identity will pass a bureau check. The combination of signals catches what individual checks miss.
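A sketch of that combination as a weighted risk score; the signal names, weights, and review threshold are all hypothetical (a production model would learn the combination rather than hand-weight it):

```python
# Hypothetical weights: no individual signal is decisive on its own.
SIGNAL_WEIGHTS = {
    "device_risk": 0.30,        # device fingerprint anomaly, scaled 0-1
    "behaviour_risk": 0.25,     # typing cadence / form completion anomaly
    "document_risk": 0.25,      # identity document verification
    "consistency_risk": 0.20,   # application data cross-checks
}

def fraud_score(signals):
    """Combine per-signal risk (0-1 each) into a single origination score."""
    return round(sum(SIGNAL_WEIGHTS[k] * signals[k] for k in SIGNAL_WEIGHTS), 4)

# A synthetic identity: the document check passes cleanly, but device and
# consistency signals are elevated, and the combination crosses the
# review threshold that no single check would have triggered.
synthetic = fraud_score({"device_risk": 0.9, "behaviour_risk": 0.6,
                         "document_risk": 0.1, "consistency_risk": 0.8})
needs_review = synthetic >= 0.5
```

This is the core argument of the paragraph in code form: each individual check would have passed this applicant, and only the combined score flags the case.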
Fraud detection at origination must balance precision against friction. Every additional check adds time to the application journey. Every false positive declines a legitimate customer. The conversion cost of false positives is real and measurable: a 1% false positive rate on 100,000 monthly applications is 1,000 lost customers. Set the threshold with the commercial team, not just the fraud team.
Authorised push payment (APP) fraud is a growing concern for lenders offering disbursement to third-party accounts. The model needs to assess not just whether the applicant is genuine, but whether the disbursement account is controlled by the applicant. Open banking account ownership verification, combined with payee risk scoring, addresses this vector. The Payment Systems Regulator's mandatory reimbursement rules make APP fraud prevention a financial priority, not just a compliance one.
Consortium data is a force multiplier. Fraud patterns that are invisible in a single lender's data become obvious across a network. Industry utilities like Cifas and the National Hunter system provide shared intelligence, but the integration is rarely seamless. Build your model to consume consortium signals as features, not as binary pass/fail gates. The value is in the combination with your proprietary data. Predictive analytics applied to consortium patterns reveals emerging fraud typologies before they hit your book.
If you are building AI into your credit operations, put fifteen minutes on the calendar.
Let’s build AI together