The copilot lie: why 95% of companies aren't seeing results


Why AI assistants aren’t delivering on their promises — and what to do about it


The paradox nobody wants to admit

In July 2025, a study of experienced software developers changed the narrative around AI at work. The experiment was simple: complete tasks in their own codebases, half with AI tools and half without.

The developers predicted AI would make them 24% faster.

The actual result: they took 19% longer with AI than without it.

But here’s the kicker: even after the experiment, developers still believed they’d been faster with AI. They estimated a 20% improvement when they’d actually gotten worse.

This study from METR (Model Evaluation and Threat Research) perfectly captures the disconnect between perception and reality that’s defining AI adoption in 2025-2026.


The 95% nobody mentions in keynotes

If the METR study were an isolated case, we could dismiss it. But it’s not.

An MIT report published in August 2025 analyzed 300 enterprise generative AI deployments. The result: only 5% achieved rapid revenue acceleration. The vast majority stalled with no measurable P&L impact.

The same report revealed something interesting about how companies adopt AI:

  • Solutions purchased from specialized vendors: 67% success rate
  • Internal builds: ~22% success rate (one-third of the former)

This cuts against the prevailing trend: many companies, especially in regulated sectors like banking and insurance, are building their own proprietary generative AI systems.


The meta-analysis that dismantles the hype

For those who think these are edge cases, a meta-analysis published in California Management Review in October 2025 put the numbers on the table.

Researchers analyzed 371 estimates from studies published between 2019 and 2024. Their conclusion: there is no robust relationship between AI adoption and aggregate productivity gains once you control for methodological heterogeneity and publication bias.

In other words: studies showing dramatic improvements tend to have methodological issues, small samples, or — curiously — are funded by companies selling AI tools.


The code problem: AI-induced technical debt

For those of us in tech, the software development case is particularly relevant.

GitClear, a developer analytics company, analyzed 153 million lines of code comparing patterns before and after the mass adoption of tools like GitHub Copilot.

Their finding: AI excels at adding code quickly, but generates what they call “AI-induced technical debt.”

“Code added quickly is desirable if you’re working in isolation or on a greenfield problem. But code added hastily is toxic for teams that must maintain it afterward.” — Bill Harding, GitClear founder

A systematic review of 37 studies on LLM code assistants (July 2025) confirms the pattern: while developers spend less time on boilerplate code and API lookups, quality regressions and subsequent rework frequently offset velocity gains, especially on complex tasks.


Why is this happening?

Several factors explain this disconnect:

1. The J-curve of adoption

According to MIT Sloan researchers, manufacturing companies adopting AI frequently experience initial productivity losses before seeing improvements. This “J-curve” stems from misalignment between digital tools and legacy processes, plus the necessary investment in data infrastructure, training, and workflow redesign.

2. The context AI doesn’t have

The experienced developers in the METR study had been working on their own projects for years and had accumulated context that no AI assistant possessed. Even so, they ended up adapting their workflow and problem-solving strategies to the AI's outputs, and spent considerable time debugging generated code.

3. The illusion of speed

There’s something psychologically satisfying about seeing code appear rapidly on screen. But generation speed doesn’t equal actual productivity. If you spend 2 hours generating code and 4 hours fixing it, you haven’t saved time.
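A back-of-the-envelope check makes the point concrete. This is a minimal sketch; the five-hour manual baseline is an assumption for illustration, not a number from any study:

```python
# Illustrative only: what matters is total wall-clock time, not generation speed.
def hours_saved(generate: float, fix: float, manual_baseline: float) -> float:
    """Positive = hours actually saved with AI; negative = hours lost."""
    return manual_baseline - (generate + fix)

# The example from the text: 2h generating + 4h fixing,
# vs. an assumed ~5h to write the same code by hand.
print(hours_saved(generate=2, fix=4, manual_baseline=5))  # -1.0 -> an hour lost
```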

4. Wrong metrics

Many companies measure “time saved” based on self-reports. The California Management Review meta-analysis found that when objective metrics are used instead of self-reported ones, gains disappear or shrink dramatically.


What smart companies are doing

The picture isn’t entirely negative. Organizations that are seeing results share certain characteristics:

Consolidation over fragmentation

After a year of scattered copilot adoption in 2025, CIOs are moving from fragmented experiments to holistic strategies: fewer tools, better integrated.

Governance by design

Instead of adding controls after the fact, successful companies incorporate AI governance from the start of the adoption process.

Real value metrics

Abandoning self-reports and measuring actual impact: defect rates, full cycle times, customer satisfaction — not just “lines of code generated.”
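As a minimal sketch of what those metrics look like when computed rather than self-reported (the record layout and sample data here are hypothetical; in practice they would come from your issue tracker and CI):

```python
from datetime import datetime
from statistics import mean

# Hypothetical ticket records; real data would come from your tracker / CI.
tickets = [
    {"opened": datetime(2025, 9, 1), "deployed": datetime(2025, 9, 4), "caused_defect": False},
    {"opened": datetime(2025, 9, 2), "deployed": datetime(2025, 9, 9), "caused_defect": True},
    {"opened": datetime(2025, 9, 5), "deployed": datetime(2025, 9, 6), "caused_defect": False},
]

# Full cycle time: opened -> deployed, in days (not "lines of code generated").
cycle_days = mean((t["deployed"] - t["opened"]).days for t in tickets)

# Defect rate: share of changes that later caused a defect.
defect_rate = sum(t["caused_defect"] for t in tickets) / len(tickets)

print(f"avg cycle time: {cycle_days:.1f} days, defect rate: {defect_rate:.0%}")
```

Run the same computation over AI-assisted and non-assisted work and compare: that is the difference between an objective metric and asking people how fast they felt.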

Segmentation by use case

The Brynjolfsson study so often cited to promote AI also shows that its benefits are highly unequal: workers in the bottom quartile of performance saw 35% improvements, while experienced veterans saw almost no gain. AI works better as a leveler than as a universal multiplier.


My perspective as a data engineer

I work daily with data, pipelines, and automation. And I use AI tools regularly. My experience confirms what the studies show:

Where AI helps me:

  • Boilerplate and repetitive code
  • Exploring APIs I don’t know
  • First drafts of simple scripts
  • Translating between languages or frameworks

Where AI slows me down:

  • Complex business logic requiring context
  • Debugging subtle problems
  • Code that must integrate with existing systems
  • Anything requiring understanding “why” in addition to “what”

The key isn’t abandoning AI tools, but being brutally honest about when they help and when they don’t.

A concrete example: AI-generated code can work perfectly while having massive security holes. “It works” and “it’s fine” aren’t the same thing.
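A minimal sketch of what that can look like (the snippet is illustrative, not taken from any assistant's actual output):

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, name: str):
    # "It works": returns the right rows for normal input...
    # ...but interpolating user input into SQL allows injection
    # (try name = "' OR '1'='1" to dump every row).
    return conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(conn: sqlite3.Connection, name: str):
    # Same behavior for legitimate input, but parameterized: "it's fine."
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()
```

Both functions return the right rows for normal input and pass a happy-path test; only a review that asks how the query is built catches the difference.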


Conclusion: The end of magical thinking

2026 is becoming the year of expectation adjustment. The promise that AI would boost GDP by 15% and productivity by 25% is colliding with the reality of companies seeing no return on their investments.

This doesn’t mean AI is useless. It means:

  1. It’s not a universal solution — it works in specific contexts
  2. It requires organizational change investment — not just licenses
  3. Metrics matter — measure what actually matters, not what’s easy to measure
  4. Human expertise remains valuable — especially accumulated context

The competitive advantage is shifting from “who has the most AI tools” to “who best understands when and how to use them.”

And that, ironically, is a very human skill.

Are there exceptions to this 95% rule? Yes. In the Basque Country, companies using AI earn 8.7% more. The difference is in how they implement it: ecosystem before tools, governance by design, real value metrics.


Sources

  • METR (2025). “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity”
  • MIT NANDA Initiative (2025). “The GenAI Divide: State of AI in Business 2025”
  • California Management Review (2025). “Seven Myths about AI and Productivity: What the Evidence Really Says”
  • GitClear (2025). Analysis of 153 million lines of code
  • Mohamed et al. (2025). Systematic review of 37 studies on LLM code assistants
