StrongDM built a Software Factory where no human writes code and no human reviews code. In 8 months, three engineers shipped 32,000 lines of production software — entirely written by AI agents running against specifications. Here's how it works, why it works, and what it means for the rest of us.
In July 2025, three engineers at StrongDM made a decision that most engineering teams would call reckless:
No human would write code. No human would review code.
Eight months later, those three engineers — Justin McCarthy, Jay Taylor, and Navan Chauhan — had 32,000 lines of production software running in the wild. Every single line written by an AI agent. Not one line written — or reviewed — by a human.
This isn't a productivity experiment. It's a paradigm shift.
The Two Rules That Changed Everything
Most companies adopting AI have the same relationship with their tools: AI is a fast intern. You prompt it, review what it writes, fix what it breaks, and merge what survives your scrutiny. The human is the quality gate. The AI is the assistant.
StrongDM inverted this entirely.
Their Software Factory operates under exactly two inviolable rules:
No human writes code.
No human reviews code.
If you break one of those rules — even once, even for a "quick fix" — you've broken the system. Because the moment a human starts writing code, the agents stop learning how to handle that class of problem autonomously. You've just added a dependency that doesn't scale.
What humans do at StrongDM is design specifications, curate test scenarios, and watch the scores. They're system architects, not coders. The distinction matters more than it might sound.
What the Software Factory Actually Is
The core of the system is something they call Attractor — a non-interactive coding agent structured as a directed graph of phases. Each node in the graph corresponds to a phase of work (planning, implementation, testing, refinement) and is governed by a core prompt. Transitions between phases happen automatically based on outputs.
The pipeline is defined in Graphviz DOT syntax — declarative, visual, and version-controllable. You can look at a pipeline definition and understand exactly what the agent is doing at each stage, in what order, with what constraints. It's infrastructure-as-code for autonomous development workflows.
"Non-interactive" is the operative word here. Attractor doesn't pause and ask for clarification. It doesn't surface a diff for a human to review. It runs to completion. The quality gate is the test harness, not a person.
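To make the phase-graph idea concrete, here is a minimal Python sketch of a non-interactive pipeline in this spirit. The phase names (plan, implement, test, refine), the transition table, and the toy phase functions are all illustrative assumptions — StrongDM declares the real graph in Graphviz DOT, and each real phase is an agent run governed by a core prompt:

```python
# Edges map (phase, result) -> next phase. The real pipeline is declared in
# Graphviz DOT, which encodes exactly this kind of transition structure.
EDGES = {
    ("plan", "ok"): "implement",
    ("implement", "ok"): "test",
    ("test", "pass"): "done",
    ("test", "fail"): "refine",
    ("refine", "ok"): "test",  # loop back until the harness passes
}

def run_pipeline(phases: dict, state: dict) -> dict:
    """Run to completion: no pauses for clarification, no diff surfaced."""
    current = "plan"
    while current != "done":
        result = phases[current](state)     # a phase = agent run + core prompt
        state["trace"].append((current, result))
        current = EDGES[(current, result)]  # transition chosen by the output
    return state

# Toy phases standing in for LLM calls; "test" fails once, then passes.
attempts = {"n": 0}
def toy_test(state):
    attempts["n"] += 1
    return "pass" if attempts["n"] >= 2 else "fail"

phases = {
    "plan": lambda s: "ok",
    "implement": lambda s: "ok",
    "test": toy_test,
    "refine": lambda s: "ok",
}

final = run_pipeline(phases, {"trace": []})
# The trace ends with ("test", "pass") — the only exit is a green harness.
```

Even in the toy, the structural point is visible: a failing test routes back through refinement, never to a human, and the pipeline can only terminate by passing the test phase.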
Here's what's remarkable about how they built it: the entire Attractor specification is written in natural language — approximately 6,000–7,000 lines of what they call NLSpec. No code, just behavioral constraints, interface semantics, and system boundaries. The specification drives the agents; the agents generate the code; the test harness validates the code. The humans write the specification.
The loop closes without a human in the critical path.
The Digital Twin Universe
If "no human reviews code" is the philosophical heart of the Software Factory, the Digital Twin Universe (DTU) is its technical nervous system.
Here's the problem they had to solve: you can't run thousands of test scenarios per hour against real external services. You'd hit rate limits. You'd rack up API costs. You'd occasionally corrupt production data. And you'd violate every data governance policy your legal team has ever written.
So they built replicas.
StrongDM's DTU contains behavioral clones of every external service their software touches — Okta, Jira, Slack, Google Docs, Google Drive, Google Sheets. Not mocks in the traditional sense. Behavioral simulations: services that respond to sequences of operations the way the real services would, including state management, error cases, asynchronous callbacks, rate limiting, and authentication flows.
Against these replicas, Attractor runs thousands of test scenarios per hour with zero rate limits, zero API costs, and zero risk of breaking anything real.
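What separates a behavioral clone from a traditional mock is that it carries state across a sequence of operations and produces the real service's failure modes. A minimal sketch, using a hypothetical FakeIssueTracker invented for illustration — the real DTU replicas (Okta, Jira, Slack, and the Google services) additionally simulate auth flows, asynchronous callbacks, and much richer error surfaces:

```python
class RateLimited(Exception):
    pass

class FakeIssueTracker:
    """A stateful, in-process stand-in for an external issue tracker.
    It responds to sequences of operations the way the real service
    would: it keeps state, enforces a rate limit, and returns real
    error cases instead of canned answers."""

    def __init__(self, rate_limit: int = 3):
        self._issues: dict = {}
        self._next_id = 1
        self._calls = 0
        self._rate_limit = rate_limit  # calls allowed per simulated window

    def _tick(self):
        self._calls += 1
        if self._calls > self._rate_limit:
            raise RateLimited("429: simulated rate limit exceeded")

    def create_issue(self, title: str) -> int:
        self._tick()
        issue_id = self._next_id
        self._next_id += 1
        self._issues[issue_id] = {"title": title, "status": "open"}
        return issue_id

    def close_issue(self, issue_id: int) -> None:
        self._tick()
        if issue_id not in self._issues:
            raise KeyError(f"404: issue {issue_id} does not exist")
        self._issues[issue_id]["status"] = "closed"

    def reset_window(self):
        self._calls = 0  # scenarios can fast-forward the rate-limit window

# A test scenario runs against the clone at memory speed — no API costs,
# no real data at risk:
svc = FakeIssueTracker(rate_limit=3)
issue = svc.create_issue("flaky login")
svc.close_issue(issue)
```

Because the clone enforces its own rate limits and 404s, scenarios can deliberately exercise the error paths an agent must handle — the cases that are expensive or dangerous to provoke against a real service.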
The DTU also solves a governance problem that nobody talks about openly: letting AI agents access production systems during development is a regulatory nightmare. Data protection, operational security, compliance requirements — all of them argue against agents touching real services during the development loop. The DTU isolates development entirely.
It's elegant. Build a sandbox that's accurate enough to develop against, and you've removed every constraint that limits how fast your agents can iterate.
The October 2024 Inflection Point
Why did this become possible in July 2025? Why not earlier?
McCarthy, Taylor, and Chauhan are specific about the catalyst: the second revision of Claude 3.5, released October 2024.
Before that revision, long-horizon agentic coding workflows had a characteristic failure mode: errors compounded. The agent would make a small mistake early in a complex task, and subsequent steps would build on that mistake, resulting in outputs that were confidently wrong in hard-to-detect ways. Running agents without human review in that environment wasn't reckless — it was impossible.
After the October 2024 revision, something changed. Long-horizon agentic workflows began to converge rather than diverge. Correctness compounded instead of errors. The agent could hold context across a complex multi-phase task and maintain consistency throughout.
That was the inflection point. Not a new AI paradigm — a capability threshold being crossed. Once the models could reliably converge on correct solutions across long horizons, the human review step stopped being a quality gate and started being a bottleneck.
StrongDM saw this early and acted on it.
What This Means for Everyone Else
Let me be precise about what StrongDM has actually demonstrated, because there's a version of this story that's inspiring and a version that's lazy.
The lazy version: "AI will replace developers."
The accurate version: Three people, by redesigning the human-agent interface, are doing the work of a much larger team — with higher test coverage, faster iteration, and without the coordination overhead that comes with headcount.
This isn't about replacement. It's about leverage.
When you accept that the bottleneck isn't model capability but workflow architecture — who designs the specification, what the test harness validates, how the pipeline is structured — you start asking completely different questions.
The questions most engineering teams are asking in 2026:
• "How do I get the AI to write better code?"
• "How do I review AI-generated code faster?"
• "What tasks should I give the AI?"
The questions StrongDM is asking:
• "What behavioral constraints should the specification encode?"
• "What test scenarios would catch the edge cases the AI doesn't know to look for?"
• "How do we structure the pipeline so that the agent converges reliably on complex tasks?"
These are harder questions. They require deeper thinking. And they produce dramatically better leverage.
The Specification Is the Job
The hardest shift for most developers to make isn't learning to use AI tools. It's accepting that the highest-leverage thing they can do is write better specifications.
Not better prompts. Specifications — documents that describe what the system should do, under what constraints, with what interfaces, against what test conditions. The kind of thinking that used to happen in product requirements documents that nobody read and that were immediately wrong the moment a developer opened their IDE.
At StrongDM, the specification isn't a formality that precedes the real work. It is the real work. The 6,000–7,000 lines of NLSpec that define Attractor represent the accumulated engineering judgment of three people who've figured out exactly what a non-interactive coding agent needs to know to do its job reliably.
Everything downstream — the code, the tests, the production behavior — flows from that specification.
This is what the next generation of senior engineers will do. Not write code. Write specifications that agents execute at scale.
The Bottom Line
StrongDM's Software Factory is eight months old. Three engineers. 32,000 lines of production code. Zero written by human hands.
The instinctive reaction to this story is skepticism: Surely the quality is worse. Surely there are bugs. Surely a human would catch things the AI misses.
Maybe. But their test harness runs thousands of scenarios per hour. A human code reviewer doesn't.
The more honest reaction is to ask what this means for how you're working right now. Because the gap between "AI is a fast intern you supervise" and "AI is the execution layer you architect" isn't a gap in model capability. It's a gap in workflow design.
The October 2024 inflection point already happened. The capability threshold has been crossed. What StrongDM proved is that most of the remaining gap isn't in the models — it's in how we structure the human-agent relationship.
Three people saw that and built accordingly.
The question is whether your team will.
Related: The LLM Isn't the Bottleneck Anymore. The Ecosystem Is. — Why workflow architecture matters more than model capability in 2026.