Are people fed up with chatbots?
Yes, well, sort of. Bad implementations hammer brand reputation; good implementations can be brand positive. What does this mean for DTC brands?
The decision to adopt AI in customer service is not as simple as budget or support volume. We’ll look at some of the failure modes we’ve seen across the industry and then see what the successful brands do differently. First, some data.
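Search data backs this up. On Google Trends’ 0–100 interest scale, “speak to a human” peaked at 77 in April 2026, up from a baseline of around 30. Every “rebellion” term in the cluster (searches like “speak to a human” and “live customer support”) bends upward in August 2025. People are actively trying to bypass AI in customer service.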
That’s the abstract picture. The real question is how customers respond to the implementations they actually meet. The public record splits into high-profile failures and quieter deployments customers like, and reading both is the closest thing to predictive data a DTC CX leader has.
The big rollouts to learn from
Klarna. Announced in February 2024 that an OpenAI-built assistant was handling two-thirds of its customer service chats and doing the work of 700 full-time agents in its first month. Fourteen months later, CEO Sebastian Siemiatkowski told Fortune that “cost was a too predominant evaluation factor” and that Klarna was investing in human support again.
Air Canada. A customer asked the airline’s chatbot about bereavement fares, and the chatbot invented a policy letting him claim the bereavement discount retroactively. When Air Canada refused to honor it, he took the airline to British Columbia’s Civil Resolution Tribunal, which ruled against it in February 2024.
Cursor. The AI-coding-tool company’s own support bot invented a one-device policy in May 2025. There was no such policy. Subscribers churned and Fortune ran the story.
Chipotle (notable mention). Not reputational damage, just funny. In March 2026, users discovered that Chipotle’s support bot, “Pepper,” would solve LeetCode problems and write Python. Someone shipped an OpenAI-compatible proxy so the internet could use Chipotle’s compute for free coding help. The brand had put an LLM into production without scope guardrails, and the internet noticed within weeks.

Chipotle’s “Pepper” support bot answering a coding question.
When the AI broke, customers didn’t blame the vendor. They blamed the brand. The Air Canada tribunal made it explicit: the airline was responsible for the misinformation its chatbot gave.
What separates the failures from the wins
What separates a success from a viral failure isn’t budget or vendor. It’s whether the work needs human judgment.
Bank of America’s Erica. Live since 2018, past 2.5 billion interactions and 56 million users by August 2025. Erica handles banking actions customers were already doing in the app, just faster.
Lyft + Anthropic. Lyft published in 2025 that integrating Claude cut customer service resolution time by 87% on handled cases. Most of those cases are real-time dispatch problems humans couldn’t scale to.
Sephora. Shade matching, virtual try-on, and product discovery, in a deployment that has run for nine years. Work no human was ever going to staff at that volume.
Quick low-stakes answers. Order status, store hours, return policy basics, password resets. Customers prefer the bot here because the alternative is waiting on hold for a human to read them a tracking number. No emotional load, no policy interpretation, no judgment call. The bot just has to be faster than the form.
The wins sit where humans weren’t going to do the work (banking self-service, dispatch, retail-scale browsing) or didn’t need to (status lookups, hours, policy basics). The losses are complex cases the chatbot was never going to solve, leaving customers feeling they have to fight their way past the bot to reach a human.
What this means for DTC
The lesson isn’t “avoid customer-facing AI.” It’s “understand which side of the line your tickets sit on before you deploy.”
Quick lookups (order status, tracking, store hours, returns-policy basics) sit in the winning category. Deploy there and take the speed. The tickets customers escalate over (refund disputes, damaged orders, subscription cancellations, anything with emotional load) sit in the losing category. AI on those either fails publicly, as it did for Klarna, or routes most of the work to a human anyway.
The deployment that quietly works splits the queue: the bot handles the easy mix, and humans handle the hard one. Most brands skip the diagnostic that tells them which ticket is which.
What are you optimising for?
Deflection rate is the number vendors lead with, but it isn’t a measure of customer experience. You can drive deflection up and CSAT down at the same time. Klarna did, then walked it back.
Targeted deployments with clear escape hatches to a human work. Blanket rollouts don’t.
Sources
- Klarna press release, February 27, 2024. Initial announcement: AI assistant handling two-thirds of CS chats, work of 700 FTEs in first month.
- Fortune, May 9, 2025. Sebastian Siemiatkowski walk-back: “cost was a too predominant evaluation factor,” Klarna investing in human support again.
- CBC, February 2024. Air Canada chatbot ruling: BC Civil Resolution Tribunal held airline liable for chatbot’s invented refund policy.
- Fortune, May 2025. Cursor AI support bot inventing one-device policy, subscriber churn.
- cyberpapiii/chipotlai-max on GitHub. March 2026 Chipotle “Pepper” support bot used as general LLM, OpenAI-compatible proxy shipped.
- The Register, May 2026. 74% of firms have rolled back at least one customer facing AI deployment in the last 12 months.
- Bank of America newsroom, August 2025. Erica surpassed 2.5B interactions, 56M active users, seven years live.
- Anthropic, 2025. Lyft + Claude integration cut average customer service resolution time by 87% on handled cases.
- Sephora newsroom. Nine-year chatbot deployment, expansion to ChatGPT app.
- Gorgias 2026 State of Conversational Commerce. 16,000 brands, 350M conversations. 86% of AI conversations eventually involve a human.
- UJET 2026 Agentic Experience Orchestration white paper. 85% of consumers prefer human agents over AI (Metrigy).
- Google Trends data, June 2026. “speak to a human” peaked at 77 in April 2026; “live customer support” sustained 84–89 since December 2025; all rebellion terms inflect in August 2025.
The AI-in-CX category is still being drawn. Deflection, assist, automation, copilot, agent. These words mean different things to different vendors, and the map of the category is contested. This pillar publishes our reading of the map, and where Handsom sits on it.