The largest RCT on AI coding tools found developers were 19% slower — while believing they were 20% faster. Six independent studies converge on ~10% organizational gains despite 93% adoption. The bottleneck was never the code. It was everything around it.
Here's a number that should make every engineering leader uncomfortable: 39 percentage points.
That's the gap between how much faster developers think AI makes them and how much faster it actually makes them. In the most rigorous study of AI coding productivity ever conducted — a randomized controlled trial by METR with 246 real tasks across experienced open-source developers — those using AI tools took 19% longer to complete their work. They believed they were 20% faster.
They weren't even close.
The Data Everyone Ignores
Let's set the scene. AI coding tool adoption is essentially universal: 93% of developers now use these tools in their daily work.
At this adoption level, with 27% of production code now AI-generated, you'd expect massive productivity gains. Six independent research efforts measured it. They all converged on the same number: roughly 10% at the organizational level. And that's being generous.
The DORA 2025 Report puts a finer point on it. Teams with high AI adoption completed 21% more tasks and merged 98% more pull requests. Sounds great — until you look at the rest of the row. PR size grew 154%. Review time increased 91%. Bug rates went up 9%. And organizational delivery metrics? Flat.
More code. Same throughput. More bugs. This isn't a productivity story. It's a bottleneck migration story.
The Bottleneck Was Never the Typing
Bain's analysis found that writing and testing code accounts for roughly 25–35% of the total software development lifecycle. The rest goes to code review, understanding requirements, debugging, coordination, documentation. As Gergely Orosz put it directly: "Speed of typing out code has never been the bottleneck for software development."
Amdahl's Law is unforgiving here. Even if AI made coding 100% faster — which it doesn't — the maximum organizational improvement would be roughly 15–20%. That's the ceiling. And we're nowhere near it, because AI isn't making coding 100% faster. It's making individual tasks slightly faster while creating more work downstream.
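The arithmetic behind that ceiling is worth making concrete. A minimal sketch, using Bain's 25–35% coding share from above (the `org_speedup` helper is ours, not from any of the cited studies):

```python
def org_speedup(coding_share: float, coding_speedup: float) -> float:
    """Amdahl's Law: overall speedup when only the coding
    portion of the lifecycle gets faster."""
    new_time = (1 - coding_share) + coding_share / coding_speedup
    return 1 / new_time

# Coding/testing is 25-35% of the lifecycle (Bain's estimate).
# "100% faster" coding means a 2x speedup on that slice alone.
for share in (0.25, 0.35):
    gain = org_speedup(share, 2.0) - 1
    print(f"coding share {share:.0%}: org gain {gain:.1%}")
# -> coding share 25%: org gain 14.3%
# -> coding share 35%: org gain 21.2%

# Even instantaneous coding caps out well below 2x overall:
for share in (0.25, 0.35):
    gain = org_speedup(share, float("inf")) - 1
    print(f"coding share {share:.0%}, instant coding: {gain:.1%}")
# -> coding share 25%, instant coding: 33.3%
# -> coding share 35%, instant coding: 53.8%
```

Doubling the speed of a quarter of the work moves the whole pipeline by about a seventh. That's the entire argument in four lines of arithmetic.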
Here's what actually happens when you give developers AI coding tools without changing anything else:
Code volume increases. More PRs, bigger PRs, more lines of code.
Review queues back up. Human reviewers can't keep up with AI-generated output.
Bug rates rise. AI-generated code introduces subtle defects that take longer to find than the code took to write.
Context switching increases. Developers spend more time checking, debugging, and fixing AI suggestions than they would have spent writing the code themselves.
The perception gap widens. Because it feels faster, nobody notices the systemic slowdown.
Cursor's CEO Michael Truell acknowledged this openly: "Cursor has made it much faster to write production code. However, for most engineering teams, reviewing code looks the same as it did three years ago." Then Cursor acquired Graphite, a code review startup. The acquisition says more about where the real constraint lives than any marketing page.
METR's Update Makes It Worse, Not Better
In February 2026, METR published an update. Their newer cohort — 800+ tasks, 57 developers — showed AI users were only 4% slower (confidence interval: -15% to +9%). Progress? Sort of.
But the update revealed something more damning: 30–50% of developers refused to participate because they didn't want to work without AI. The study was losing exactly the developers who would show the biggest productivity gains. METR's own conclusion: the data is "only very weak evidence for the size of this increase."
Think about that. The best evidence we have for AI coding productivity is a study where half the most productive AI users wouldn't even participate, and the remaining data shows somewhere between a 15% slowdown and a 9% speedup.
This isn't a ringing endorsement. It's a confidence interval that includes "makes things worse."
Why the Perception Gap Exists
The 39-point gap isn't a mystery. It's the same cognitive bias that makes people think they're above-average drivers.
AI tools give immediate, visible feedback. You type a comment and code appears. That feels productive. But the METR researchers found that extra time went to checking, debugging, and fixing AI-generated code — activities that don't feel like overhead because they happen after the dopamine hit of generated code.
It's the difference between output and outcomes. AI dramatically increases output — more code, faster. But outcomes — working software shipped to production — depend on the entire pipeline, not just the code generation step.
What Actually Moves the Needle
If raw AI coding tools deliver 10% organizational gains, what gets you past that ceiling? The data points to three things:
Move the bottleneck, not the code.
Stripe didn't just hand engineers AI tools and hope for the best. They built Toolshed — a centralized MCP server with 500+ tools — and Blueprints that mix deterministic nodes (git, lint, CI) with agentic nodes. The agent isn't replacing the developer. It's replacing the process gaps between developers.
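Stripe hasn't published Blueprints' internals, but the pattern it describes can be sketched: cheap deterministic gates run first, and an agentic step is spent only when one of them fails. Everything here (`blueprint`, `propose_fix`, the gate names) is a hypothetical illustration of the pattern, not Stripe's actual API:

```python
from typing import Callable

# Hypothetical sketch of a deterministic-plus-agentic pipeline.
# A deterministic node takes a diff and strictly passes or fails.
DeterministicNode = Callable[[str], bool]

def propose_fix(stage: str, diff: str) -> str:
    """Agentic node: placeholder for an LLM call that drafts a fix."""
    return f"[agent drafts a fix for failed stage: {stage}]"

def blueprint(diff: str, gates: list[tuple[str, DeterministicNode]]) -> str:
    """Run cheap deterministic gates first; spend an agentic
    step only when one of them fails."""
    for stage, check in gates:
        if not check(diff):
            return propose_fix(stage, diff)
    return "ready for human review"

# Stand-in gates; real ones would shell out to git, a linter, and CI.
gates = [
    ("lint",  lambda d: "TODO" not in d),
    ("tests", lambda d: len(d) > 0),
]
print(blueprint("fix: handle empty cart", gates))
# -> ready for human review
```

The design choice worth noting: the model is the fallback, not the default. Deterministic checks are cheap and repeatable, so they absorb most of the volume before any agentic (and nondeterministic) step runs.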
Automate the review, not just the writing.
Faros AI's data shows that review time is where productivity dies. The answer isn't "generate code faster" — it's automated code review, deterministic guardrails (hooks), and AI-powered quality checks before code reaches a human reviewer. If your developers are still manually reviewing every AI-generated PR, you've created a bottleneck that scales linearly with AI output.
Specification before code.
The METR study measured developers working on their own repositories — projects they knew intimately. The 19% slowdown happened because AI didn't understand the broader context. This is the specification problem. When you give an AI clear, structured requirements instead of vague prompts, the context gap shrinks. The methodology matters more than the model.
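What a "clear, structured requirement" looks like varies by team; one plausible shape (a hypothetical example, not a format any of the cited studies prescribe) pins down behavior, inputs, constraints, and non-goals before any code is generated:

```markdown
## Spec: rate-limit the /export endpoint  (hypothetical example)

- Behavior: reject requests beyond 10/min per API key with HTTP 429.
- Inputs: API key from the `Authorization` header; no other identity source.
- Constraints: reuse the existing Redis client; no new dependencies.
- Out of scope: per-plan limits, admin bypass, UI changes.
- Done when: an integration test covers the 11th request in a window.
```

Each line closes off a class of wrong-but-plausible output the model would otherwise be free to generate.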
The DORA report itself names this explicitly: "AI doesn't fix a team; it amplifies what's already there." Organizations with mature DevOps practices and well-defined workflows convert AI into real gains. Everyone else gets 10% — if they're lucky.
The Bottom Line
93% of developers use AI coding tools. The measured organizational productivity gain is about 10%. The gap between perception and reality is 39 percentage points. And the bottleneck isn't where anyone is looking.
The companies seeing real results — Stripe, StrongDM, the organizations at the top of DORA's rankings — aren't getting there with better models. They're getting there with better systems: specification workflows, automated review, deterministic guardrails, and agentic infrastructure that addresses the entire development lifecycle, not just the typing.
If your AI strategy is "give developers Copilot and measure lines of code," you're optimizing the wrong layer. The productivity ceiling isn't in the model. It's in everything you haven't changed around it.
This is part of a series on AI engineering methodology. Previous posts: The Ecosystem Is the Bottleneck and Three Changes That Fixed Our Hackathon.