Conversational AI

Last reviewed April 2026

The difference between a scripted chatbot and conversational AI is the difference between a vending machine and a branch adviser. One follows a fixed menu. The other listens, interprets, and responds in context. For financial services, the distinction matters because the complexity of customer needs, from disputed transactions to bereavement notifications, demands more than pattern matching against a list of intents.

What is conversational AI?

Conversational AI is the umbrella term for systems that conduct human-like dialogue using natural language understanding, dialogue management, and natural language generation. It encompasses chatbots, voice assistants, and agent-assist tools, but the defining characteristic is the ability to handle multi-turn, context-aware conversations rather than single question-and-answer exchanges.

In financial services, conversational AI operates across three layers. The understanding layer parses what the customer means, not just what they said. "I think someone's used my card" could be fraud reporting, a disputed transaction, or a query about an unrecognised merchant. The dialogue layer manages the conversation flow, asking clarifying questions, maintaining context across turns, and deciding when to act versus when to ask. The generation layer produces responses that are natural, accurate, and compliant with the firm's communication standards.
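The three layers can be sketched in miniature. Everything below is illustrative: the cue sets stand in for a real understanding model and the templates for a generation model, but the division of labour between the layers, and the way ambiguity in the first layer forces a clarifying question in the second, is the point.

```python
# Hypothetical three-layer sketch. Cue sets and templates are placeholders,
# not a production NLU or generation model.

INTENT_CUES = {
    "fraud_report": {"used", "card", "stolen", "fraud"},
    "disputed_transaction": {"used", "card", "dispute", "refund"},
    "merchant_query": {"used", "card", "merchant", "unrecognised"},
}

def understand(utterance: str) -> list[str]:
    """Understanding layer: rank candidate intents by cue overlap."""
    words = set(utterance.lower().split())
    scores = {i: len(words & cues) for i, cues in INTENT_CUES.items()}
    best = max(scores.values())
    return [i for i, s in scores.items() if s == best and s > 0]

def decide(candidates: list[str]) -> tuple[str, str]:
    """Dialogue layer: act when the intent is clear, clarify when it is not."""
    if len(candidates) == 1:
        return ("act", candidates[0])
    return ("clarify", "ambiguous")

def generate(move: str, payload: str) -> str:
    """Generation layer: turn a dialogue move into customer-facing text."""
    if move == "clarify":
        return ("Sorry to hear that. Are you reporting fraud, disputing a "
                "transaction, or asking about a payment you don't recognise?")
    return f"Okay, let's start your {payload.replace('_', ' ')}."
```

Run "I think someone's used my card" through this and the understanding layer returns all three intents tied; the dialogue layer responds by clarifying rather than acting, which is exactly the behaviour the paragraph above describes.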

The operational value is in the second layer: dialogue management. Understanding a single sentence is largely a solved problem. Managing a ten-turn conversation where the customer changes topic, provides incomplete information, and expresses frustration is where most systems break down. Financial services conversations are particularly complex because they often involve authentication, regulatory disclosures, and actions that affect the customer's money.

The landscape

Large language models have compressed the timeline from concept to prototype from months to days. A developer can build a conversational interface over a bank's knowledge base in a weekend. The gap between prototype and production, however, remains measured in months. The prototype does not handle authentication. It does not integrate with core banking. It does not comply with the FCA's Consumer Duty. It does not log interactions for regulatory record-keeping. These are not edge cases. They are the requirements.

Voice is resurging. After a decade of text-first conversational AI, voice interfaces are returning to prominence. The quality of speech-to-text has improved to the point where voice-based AI can handle accents, background noise, and natural speech patterns reliably. For financial services, voice matters because many customers, particularly older demographics and those with accessibility needs, prefer to speak rather than type. The Equality Act 2010 requires firms to make reasonable adjustments, and voice AI is increasingly part of that obligation.

The enterprise market is consolidating. Between 2019 and 2023, dozens of conversational AI vendors entered the financial services market. Many have been acquired, pivoted, or closed. The survivors tend to be either large platform providers (Google, Microsoft, Amazon) with broad capabilities or niche specialists with deep financial services domain knowledge. For buyers, this consolidation reduces choice but increases the maturity of remaining options.

How AI changes this

Context persistence across channels is the capability that matters most operationally. A customer who starts a conversation on the mobile app, drops off, and calls the contact centre should not restart from zero. Modern conversational AI maintains a session state that follows the customer, enabling the voice agent to say "I can see you were asking about your mortgage payment on the app. Let me pick up where we left off." This requires integration with contact centre AI infrastructure, but the technology exists.

Proactive conversation is emerging. Rather than waiting for the customer to initiate contact, AI systems can reach out when they detect a trigger: a missed payment, an expiring fixed-rate mortgage, a direct debit that is about to bounce. Predictive models identify these triggers and feed them into the conversational system. The conversation is initiated by the system but conducted in natural language, allowing the customer to respond, ask questions, and take action within the same interaction. The regulatory guardrails around proactive outreach (marketing permissions, time-of-day restrictions, opt-out mechanisms) must be built into the system.
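Those guardrails are naturally expressed as a pre-send check that runs before any proactive message leaves the system. The trigger names, customer-record fields, and the 08:00 to 21:00 window below are illustrative assumptions, not regulatory guidance; the real rules come from your compliance policy and the customer's recorded preferences.

```python
from datetime import datetime, time as dtime

# Hypothetical allow-list of system-initiated triggers.
ALLOWED_TRIGGERS = {"missed_payment", "fixed_rate_expiry", "direct_debit_risk"}

def may_contact(customer: dict, trigger: str, now: datetime) -> bool:
    """Guardrail check run before any proactive outreach is sent."""
    if trigger not in ALLOWED_TRIGGERS:
        return False
    if customer.get("opted_out", False):
        return False  # opt-out always wins
    # Service messages about the customer's own account are treated here as
    # non-marketing; anything with a promotional element (e.g. a new fixed
    # rate) requires an explicit permission flag. Assumed split, not advice.
    if trigger == "fixed_rate_expiry" and not customer.get("marketing_ok"):
        return False
    # Illustrative contact window: no outreach outside 08:00-21:00.
    return dtime(8, 0) <= now.time() <= dtime(21, 0)
```

Putting the check in one function means every outbound path, SMS nudge, app notification, or outbound voice call, goes through the same gate.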

Emotion and intent co-detection gives the system richer signal. A customer who says "I need to close my account" in a calm, matter-of-fact tone is likely making a routine request. The same words spoken with agitation and preceded by a complaint may indicate a retention opportunity or a vulnerability flag. Multimodal analysis of text, voice tone, and interaction patterns enables responses calibrated to the customer's state, not just their words.
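One way to sketch co-detection: the same intent routes differently depending on an agitation score from an upstream tone model (assumed here to be a number in [0, 1]) and recent interaction history. The routing labels and the 0.6 threshold are hypothetical.

```python
def route(intent: str, agitation: float, recent_complaint: bool) -> str:
    """Route on intent plus emotional signal, not intent alone.

    `agitation` is assumed to come from a separate voice/text tone model.
    """
    if intent == "close_account":
        if agitation > 0.6 or recent_complaint:
            # Same words, different state: possible retention opportunity
            # or vulnerability flag, so hand to a specialist.
            return "retention_specialist"
        return "self_serve_closure"  # calm, routine request
    return "standard_flow"
```

The "I need to close my account" example from the paragraph above lands in two different queues depending on the signal around the words.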

What to know before you start

Conversation design is a discipline, not a feature. Hire or contract conversation designers who understand financial services. The flow of a bereavement notification conversation is fundamentally different from a balance enquiry, and getting it wrong has real consequences for vulnerable customers. Template-driven approaches that treat every conversation the same way will fail on the interactions that matter most.

Regulatory record-keeping applies to AI conversations just as it does to human ones. Every interaction must be logged, searchable, and retainable for the required period. The format must support audit: a regulator reviewing a complaint needs to see exactly what the AI said, what the customer said, and what actions were taken. Build the audit trail before building the conversation.

Latency is the silent killer of voice AI. A human expects a response within 300 to 500 milliseconds in natural conversation. A system that takes two seconds to respond feels broken, regardless of how accurate the answer is. If your architecture routes queries through multiple services (authentication, knowledge retrieval, compliance checking, response generation), measure the end-to-end latency under load, not just in a demo environment.
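A simple harness for measuring that end-to-end figure: time the whole pipeline per query and report percentiles rather than the mean, because the slow tail is what the caller actually hears. The `pipeline` callable here is a stand-in for your full chain of services.

```python
import time

def measure(pipeline, queries) -> dict[str, float]:
    """Time end-to-end pipeline calls and report p50/p95 in milliseconds."""
    samples = []
    for q in queries:
        start = time.perf_counter()
        pipeline(q)  # the whole chain: auth, retrieval, checks, generation
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    def pct(frac: float) -> float:
        return samples[min(len(samples) - 1, int(frac * len(samples)))]
    return {"p50_ms": pct(0.50), "p95_ms": pct(0.95)}
```

Run this under realistic concurrency, not against a warm demo instance: a p50 of 400 ms with a p95 of 2,500 ms means one caller in twenty experiences the system as broken.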

Start with the conversations you already understand well. Take your top ten call reasons, build conversational AI for the three simplest, and measure resolution rate against the human baseline. Expand only when resolution rates match or exceed human performance. The institutions that tried to launch with full coverage on day one are the ones whose chatbots earned a reputation for being useless.
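The expansion gate can be stated as a one-line rule: do not widen coverage for a call reason until AI resolution matches the human baseline on a meaningful volume. The minimum-volume figure below is an illustrative assumption.

```python
def ready_to_expand(ai_resolved: int, ai_total: int,
                    human_rate: float, min_volume: int = 500) -> bool:
    """Gate expansion on resolution rate vs the human baseline."""
    if ai_total < min_volume:
        return False  # not enough traffic yet to judge fairly
    return ai_resolved / ai_total >= human_rate
```

Applying the gate per call reason, rather than across the whole deployment, stops a strong balance-enquiry flow from masking a weak disputes flow.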
