Are people fed up with chatbots?

Yes, sort of. Bad implementations hammer brand reputation, while good ones can be brand positive. What does this mean for DTC brands?


Justin Thompson · 6 min read

The decision to adopt AI in customer service is not as simple as budget or support volume. We’ll look at some of the failure modes we’ve seen across the industry, then at what the successful brands do differently. First, some data.

Search data backs up the frustration. Search interest in “speak to a human” peaked at 77 in April 2026, up from a baseline of around 30. Every rebellion term in the cluster (the searches people run to get past a bot) starts bending upward in August 2025. People are actively trying to bypass AI in customer service.

That’s the abstract question. The real one is about the implementations customers actually meet. The public record splits into high-profile failures and quieter deployments that customers like. Reading both is the closest thing to predictive data a DTC CX leader has.

The big rollouts to learn from

Klarna. In February 2024, Klarna announced that an OpenAI-built assistant was doing the work of 700 CS agents in its first month. Fourteen months later, CEO Sebastian Siemiatkowski told Fortune that “cost was a too predominant evaluation factor” and that Klarna was investing in humans again.

Air Canada. A customer asked the airline’s chatbot about bereavement fares. The chatbot invented a refund policy. When Air Canada refused to honor it, the customer took them to British Columbia’s Civil Resolution Tribunal, which ruled against the airline in February 2024.

Cursor. In April 2025, the AI-coding-tool company’s own support bot invented a one-device policy. There was no such policy. Subscribers churned and Fortune ran the story.

Chipotle (notable mention). Not reputational damage, just funny. In March 2026 users discovered Chipotle’s support bot, “Pepper,” would solve LeetCode and write Python. Someone shipped an OpenAI-compatible proxy so the internet could use Chipotle’s compute for free coding help. The brand had shipped an LLM in production without scope guardrails, and the internet noticed within weeks.

Chipotle’s “Pepper” support bot answering a coding question.

When the AI broke, customers didn’t blame the vendor. They blamed the brand. The Air Canada tribunal made it explicit: the airline was responsible for what its chatbot told the customer.

What separates the failures from the wins

What separates a success from a viral failure isn’t budget or vendor. It’s whether the work needs human judgment.

Bank of America’s Erica. Live since 2018, past 2.5 billion interactions and 56 million users by August 2025. Erica handles banking actions customers were already doing in the app, just faster.

Lyft + Anthropic. Lyft published in 2025 that integrating Claude cut customer service resolution time by 87% on handled cases. Most of those cases are real-time dispatch problems that human teams couldn’t scale to.

Sephora. Shade matching, virtual try-on, and product discovery, across a deployment that has been running for nine years. Work no human was ever going to staff at that volume.

Quick low-stakes answers. Order status, store hours, return policy basics, password resets. Customers prefer the bot here because the alternative is waiting on hold for a human to read them a tracking number. No emotional load, no policy interpretation, no judgment call. The bot just has to be faster than the form.

The wins sit where humans weren’t going to do the work (banking self-serve, dispatch, retail-scale browsing) or didn’t need to (status lookups, hours, policy basics). The losses are complex cases the chatbot was never going to solve, which leave customers feeling they have to fight past the bot to reach a human.

What this means for DTC

The lesson isn’t to avoid customer-facing AI. It’s to understand which side of the line your tickets sit on before you deploy.

Quick lookups (order status, tracking, store hours, returns policy basics) sit in the wins category. Deploy there and take the speed. The tickets customers escalate over (refund disputes, damaged orders, subscription cancellations, anything with emotional load) sit in the losing category. AI on those either fails publicly, as Klarna’s did, or routes most of the work to a human anyway.

The deployment that quietly works does both. The bot handles the easy mix. Humans handle the hard one. Most brands skip the diagnostic.
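One way to run that diagnostic is to count how a recent ticket export splits across the two sides. The sketch below is a minimal illustration, not a description of any vendor's tooling: the category labels are hypothetical, lifted from the examples in this article, and a real helpdesk export will use its own tags.

```python
from collections import Counter

# Hypothetical category labels based on the examples in this article.
# Swap in the tags your own helpdesk actually uses.
BOT_FRIENDLY = {"order_status", "tracking", "store_hours", "returns_policy"}
HUMAN_NEEDED = {"refund_dispute", "damaged_order", "subscription_cancellation"}


def diagnose(ticket_categories):
    """Return the share of tickets that fall on each side of the line."""
    counts = Counter()
    for category in ticket_categories:
        if category in BOT_FRIENDLY:
            counts["bot"] += 1
        elif category in HUMAN_NEEDED:
            counts["human"] += 1
        else:
            # Anything unrecognised is flagged for manual review
            # rather than silently handed to the bot.
            counts["review"] += 1
    total = sum(counts.values())
    if total == 0:
        return {"bot": 0.0, "human": 0.0, "review": 0.0}
    return {side: counts[side] / total for side in ("bot", "human", "review")}


sample = ["order_status", "tracking", "refund_dispute", "gift_card"]
print(diagnose(sample))
```

If the "human" and "review" shares dominate, a blanket rollout is likely to land on the wrong side of the line.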

What are you optimising for?

Deflection rate is what vendors lead with. It’s not customer experience. You can drive deflection up and CSAT down at the same time. Klarna did, then walked it back.
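To see how the two numbers can move in opposite directions, here is a small sketch that computes both from the same tickets. The definitions are assumptions: vendors define deflection differently, and CSAT is taken here as the share of rated tickets scoring 4 or 5 out of 5. The field names and the sample data are invented for illustration.

```python
def deflection_rate(tickets):
    """Share of tickets that never reached a human (assumed definition)."""
    if not tickets:
        return 0.0
    deflected = sum(1 for t in tickets if not t["reached_human"])
    return deflected / len(tickets)


def csat(tickets, threshold=4):
    """Share of rated tickets scoring at or above threshold on a 1-5 scale."""
    ratings = [t["rating"] for t in tickets if t["rating"] is not None]
    if not ratings:
        return None
    return sum(1 for r in ratings if r >= threshold) / len(ratings)


# Invented sample data: the same four customers before and after a blanket rollout.
before = [
    {"reached_human": True, "rating": 5},
    {"reached_human": True, "rating": 4},
    {"reached_human": False, "rating": 5},
    {"reached_human": True, "rating": 4},
]
after = [
    {"reached_human": False, "rating": 2},
    {"reached_human": False, "rating": 5},
    {"reached_human": False, "rating": 1},
    {"reached_human": True, "rating": 4},
]

for label, tickets in (("before", before), ("after", after)):
    print(label, deflection_rate(tickets), csat(tickets))
```

In this made-up sample, deflection triples while satisfaction halves, which is why it helps to report the two side by side.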

Targeted deployments with clear escape hatches to a human work. Blanket rollouts don’t.
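An escape hatch can be as simple as a rule that hands the conversation over when the customer asks for a person or the bot has failed more than once. The sketch below assumes a hypothetical phrase list and failure threshold; both would need tuning against real transcripts.

```python
import re

# Hypothetical trigger phrases and threshold, chosen for illustration only.
HUMAN_REQUEST = re.compile(
    r"\b(human|agent|real person|representative)\b", re.IGNORECASE
)
MAX_FAILED_TURNS = 2


def should_hand_off(message, failed_turns):
    """Return True when the conversation should move to a human.

    message: the customer's latest message.
    failed_turns: how many bot replies so far did not resolve the issue.
    """
    if HUMAN_REQUEST.search(message):
        return True
    return failed_turns >= MAX_FAILED_TURNS


print(should_hand_off("Can I speak to a human?", 0))
print(should_hand_off("Where is my order?", 0))
print(should_hand_off("Where is my order?", 2))
```

The design choice is to err toward handing off: a customer routed to a human too early costs agent time, while one trapped with the bot costs the brand.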

Part of the AI in customer service: the map series

The AI-in-CX category is still being drawn. Deflection, assist, automation, copilot, agent. These words mean different things to different vendors, and the map of the category is contested. This pillar publishes our reading of the map, and where Handsom sits on it.
