McKinsey's QuantumBlack documented their agentic software architecture in public. The surprise isn't the model choice or the orchestration framework — it's that the whole thing rides on a folder layout, frontmatter metadata, and a single human review at the end. Here's the architecture in plain English, and a working scaffold you can drop into any Claude Code project this afternoon.
McKinsey's QuantumBlack practice published the architecture they use for agentic software development. I was expecting a vendor pitch. What I got was something more useful — a documented, opinionated, surprisingly minimal pattern that any small team can copy.
The headline number from their own materials: roughly 90% of agent-generated code lands accurately, leaving about 10% for human review. The bank-modernization case study they describe — 400 pieces of legacy software, $600M budget — is being executed with humans in supervisory roles over agent squads, each agent contributing to a defined sequence.
That part is the press-release version. The interesting part is the architecture underneath, because they wrote it down.
The pattern is two layers and a folder:
1. An orchestration layer that is deterministic, not agentic — a workflow engine that controls phase transitions and artifact state.
2. An execution layer of specialized agents, each running inside a bounded scope.
3. A folder convention (`.sdlc/`) that holds context, specs, templates, and accumulated knowledge as machine-readable artifacts.
Humans enter the loop exactly once: when the PR opens. The whole feature — specs, architecture decisions, task breakdown, code, tests — is reviewed together. Earlier intervention kills the speed advantage, because every interruption forces a context rebuild.
That's the entire pattern. The reason it works is that it's boring in the right places.
*[Image: architecture overview]*
## Why "Deterministic Orchestration" Matters
The instinct, when you start building agent workflows, is to make the orchestrator itself an agent. A planner agent that decides what to do next. A router agent that picks which sub-agent to call. A supervisor agent that grades the supervisors.
That works in demos. It fails in production for the same reason agentic loops with no exit conditions fail: agents are non-deterministic, and chaining non-determinism multiplies it.
The QuantumBlack pattern flips this. The orchestrator is a plain workflow engine — phase A produces artifact X, phase B reads artifact X and produces artifact Y, phase C reads Y and produces Z. Transitions are gated by automated evaluations, not by an agent's opinion. Phase ordering is fixed.
Inside each phase, an agent does the creative work — write the spec, propose the architecture, generate the tasks, write the code. But the agent is bounded. It has a clear input (the previous phase's artifact), a clear output (this phase's artifact), and a clear exit condition (the artifact passes evaluation).
The result: the system has agentic behavior in the parts where you want creativity, and zero agentic behavior in the parts where you want repeatability. Most of the failure modes of agentic development — looping, oscillation, drift — come from putting non-deterministic logic where deterministic logic belongs.
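Concretely, the deterministic layer can be as small as a fixed loop. A sketch under stated assumptions: `run_agent` is a stub standing in for the real agent invocation, and the per-phase gate script names (`.sdlc/gates/check-<phase>.sh`) are hypothetical, not from QuantumBlack's materials.

```shell
#!/usr/bin/env bash
# Deterministic orchestration sketch: phase order is a fixed array, and a
# transition happens only when the phase's gate exits 0. Nothing in this
# loop asks an agent what to do next.
set -euo pipefail

FEATURE_DIR="${1:-.sdlc/specs/demo}"   # feature folder (default for illustration)
PHASES=(requirements architecture tasks implementation)

run_agent() {                          # stub: the real call would invoke the
  echo "running ${1}-agent against ${FEATURE_DIR}"   # phase agent here
}

for phase in "${PHASES[@]}"; do
  run_agent "$phase"                   # creative work happens inside the phase
  gate=".sdlc/gates/check-${phase}.sh" # hypothetical gate script name
  if [ -x "$gate" ]; then
    "$gate" "$FEATURE_DIR" || { echo "gate failed: ${phase}"; exit 1; }
  fi
done
echo "all phases complete"
```

The agents stay non-deterministic inside `run_agent`; the loop around them never is.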
If you've ever watched an agent confidently re-decide the same architecture question on three different turns of the same session, you already know why this matters.
## The `.sdlc/` Folder
The whole convention rests on this:
```
.sdlc/
  context/      # persistent project knowledge — coding standards, domain glossary, infra map
  specs/        # per-feature specifications, one folder per feature
  templates/    # artifact templates used by every phase
  knowledge/    # accumulated decisions and post-mortems (writes back from PR reviews)
src/            # the actual codebase
```
Every artifact inside `.sdlc/specs/<feature-id>/` carries frontmatter like this:
```yaml
---
id: feat-2026-0427-agentic-sdlc
status: in_review
phase: implementation
parent: spec-2026-0427-agentic-sdlc
artifacts:
  spec: ./spec.md
  architecture: ./architecture.md
  tasks: ./tasks.yaml
  pr: https://github.com/org/repo/pull/4421
created: 2026-04-26
owner: haim
---
```
This is the part that makes the pattern compose. Because every artifact is machine-readable and every artifact references its parent, an agent can be dropped into any phase and reconstruct the full context by walking the tree backwards. There is no hidden state. The entire history of how this feature was reasoned about is in the folder.
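Because the frontmatter is plain text, "walking the tree backwards" needs nothing heavier than awk. A minimal sketch, assuming each artifact id maps to a folder under `.sdlc/specs/` whose `spec.md` carries the `id:` and `parent:` fields shown above; the helper names are mine, not from the article.

```shell
#!/usr/bin/env bash
set -euo pipefail

frontmatter_field() {  # frontmatter_field <file> <key>: first "key: value" hit
  awk -v key="$2" -F': *' '$1 == key { print $2; exit }' "$1"
}

lineage() {            # print the id chain from a leaf artifact up to its root
  local id="$1"
  while [ -n "$id" ] && [ "$id" != "null" ]; do
    echo "$id"
    local file=".sdlc/specs/${id}/spec.md"
    [ -f "$file" ] || break           # parent not materialized yet, so stop
    id="$(frontmatter_field "$file" parent)"
  done
}
```

`lineage feat-2026-0427-agentic-sdlc` would print the feature id, then its parent spec id, and so on up the tree: exactly the context-reconstruction walk described above.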
When the PR reviewer leaves a comment that changes a design assumption, that comment gets written back to .sdlc/knowledge/. The next time an agent is asked to design something similar, that knowledge file is part of its context. The system gets better the more PRs it ships.
This is the closest thing I've seen to documentation that maintains itself.
*[Image: folder structure]*
## The Phased Workflow
The QuantumBlack flow has four phases, each gated:
1. Requirements — agent reads the request and any linked context, produces a spec.md with frontmatter, problem statement, success criteria, scope boundaries.
2. Architecture — agent reads the spec, surveys .sdlc/context/ for relevant patterns, produces architecture.md with component decisions and trade-offs.
3. Tasks — agent decomposes the architecture into a tasks.yaml with explicit dependencies and acceptance criteria per task.
4. Implementation — agent (or parallel agents, one per task) writes code and tests against src/, opens a PR, and links every commit back to the originating task.
Each phase ends with an automated evaluation gate: spec covers the requested scope, architecture is consistent with context/, tasks are independently executable, code passes tests and lints. Only after the gate passes does the next phase start.
The human reviewer does not see the spec when it's written, the architecture when it's drafted, or the tasks when they're decomposed. They see all of it together when the PR opens. The PR is the unit of review.
This sounds wrong the first time you read it. The instinct is to stop the agent at every phase, check, course-correct. The QuantumBlack data — and my own experience running similar loops — says the opposite. The agent is faster, more consistent, and easier to evaluate when you let it complete the full cycle. Mid-flight intervention forces a context rebuild and almost always introduces inconsistency between artifacts. If something is wrong in the spec, you'd rather see it expressed all the way through to code, because then you can see whether it actually breaks anything.
## The 90% Number, Honestly
The "90% accuracy" claim from QuantumBlack's public materials is not a software-delivery speed-up — it's the share of agent-generated code that requires no human correction in their internal benchmarks. The "30%" figure that floats around the same conversation is from a different domain (credit-review workflows). I am separating these on purpose because the temptation to mash all the numbers together is exactly how this kind of architecture pattern gets oversold.
What I think the honest claim looks like is this: with a deterministic orchestrator, bounded agent execution, and a frontmatter-driven artifact tree, a small team can run several full requirements-to-PR cycles per day on features that used to take a week. The agent isn't faster than a senior engineer. It's faster than the coordination overhead between a junior, a tech lead, an architect, and a PR reviewer. That's where the time savings live.
If you remember nothing else from this post, remember that. Agentic development is not a code-generation story. It's a coordination story.
*[Image: phased workflow]*
## Try It Yourself
Here is a working scaffold you can drop into a Claude Code project this afternoon. It builds the .sdlc/ folder, gives you a spec template with frontmatter, and wires four agents — one per phase — that read and write into the tree.
### Step 1 — Scaffold the folder
```bash
mkdir -p .sdlc/{context,specs,templates,knowledge}

cat > .sdlc/templates/spec.md <<'EOF'
---
id:
phase: requirements
parent: null
artifacts:
  spec: ./spec.md
  architecture: ./architecture.md
  tasks: ./tasks.yaml
  pr: null
created:
---

## Problem

## Success criteria
-

## Scope
- In scope:
- Out of scope:

## Open questions
EOF

cat > .sdlc/templates/architecture.md <<'EOF'
---
id:
phase: architecture
parent:
---

## Data flow

## Trade-offs considered

## Decisions
EOF

cat > .sdlc/templates/tasks.yaml <<'EOF'
spec_id:
tasks:
  - id: t1
    title: ""
    depends_on: []
    acceptance:
      - ""
EOF
```
### Step 2 — Define the four phase agents
Drop these into .claude/agents/ (the Claude Code agent directory). Each agent has a single phase scope and a single exit condition. They read and write inside the feature's spec folder, nothing else.
```markdown
---
name: requirements-agent
description: Phase 1 of the .sdlc/ flow — turns a feature request into a spec.md
tools: Read, Write, Edit, Glob, Grep, WebSearch
---

You are the requirements phase of an agentic SDLC.

Read the feature request from the user. Read every file in .sdlc/context/.
Read .sdlc/templates/spec.md. Write a new spec to
.sdlc/specs/<feature-id>/spec.md.

Exit criteria:
- spec.md exists with valid frontmatter
- problem, success criteria, scope are filled in
- open questions are listed (do not invent answers)

Do NOT touch architecture.md, tasks.yaml, or src/.
```
```markdown
---
name: architecture-agent
description: Phase 2 — reads spec.md and produces architecture.md
tools: Read, Write, Edit, Glob, Grep
---

You are the architecture phase of an agentic SDLC.

Read .sdlc/specs/<feature-id>/spec.md. Survey .sdlc/context/ for
existing patterns. Read .sdlc/knowledge/ for prior decisions. Write
architecture.md using .sdlc/templates/architecture.md as the template.

Exit criteria:
- architecture.md frontmatter parent matches the spec id
- every "open question" from the spec is either answered or explicitly
  marked as still-open
- decisions section references existing patterns where applicable

Do NOT modify spec.md. Do NOT write tasks.yaml.
```
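The first exit criterion here is mechanically checkable, so it belongs to the deterministic layer. A hedged sketch of such a gate, in the same spirit as the spec gate in Step 4 (the file name and `field` helper are mine):

```shell
#!/usr/bin/env bash
# .sdlc/gates/check-architecture.sh (illustrative name): verify that
# architecture.md's `parent` frontmatter field equals spec.md's `id`.
set -euo pipefail

field() { awk -v k="$2" -F': *' '$1 == k { print $2; exit }' "$1"; }

check_architecture() {
  local dir="$1" spec_id arch_parent
  spec_id="$(field "$dir/spec.md" id)"
  arch_parent="$(field "$dir/architecture.md" parent)"
  [ -n "$spec_id" ] || { echo "spec.md has no id"; return 1; }
  [ "$arch_parent" = "$spec_id" ] \
    || { echo "parent mismatch: $arch_parent != $spec_id"; return 1; }
  echo "architecture gate OK"
}

# run as a script: check-architecture.sh <feature-dir>
if [ "$#" -ge 1 ]; then check_architecture "$1"; fi
```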
```markdown
---
name: tasks-agent
description: Phase 3 — decomposes architecture into tasks.yaml
tools: Read, Write, Edit, Glob
---

You are the task-decomposition phase.

Read spec.md and architecture.md. Produce tasks.yaml using the template.
Each task must be independently executable, have explicit acceptance
criteria, and declare its dependencies.

Exit criteria:
- every component in architecture.md is covered by at least one task
- task ids are referenced in dependency edges that form a DAG (no cycles)
- acceptance criteria are testable
```
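The DAG criterion is also gate-able without an agent: POSIX `tsort` fails on cyclic input. A sketch that assumes the tasks.yaml shape from the Step 1 template; a real YAML parser would be sturdier than this awk, which only understands `- id:` lines followed by inline `depends_on: [...]` lists.

```shell
#!/usr/bin/env bash
set -euo pipefail

check_tasks_dag() {  # check_tasks_dag <tasks.yaml>: exit 0 iff deps form a DAG
  awk '
    /^ *- id:/       { id = $NF }                 # remember the current task id
    /^ *depends_on:/ {
      line = $0
      gsub(/.*\[|\].*|,/, " ", line)              # keep just the dep ids
      n = split(line, deps, " ")
      for (i = 1; i <= n; i++) print deps[i], id  # edge: dep precedes task
    }
  ' "$1" | tsort > /dev/null                      # tsort exits non-zero on a cycle
}
```

`check_tasks_dag tasks.yaml && echo "tasks gate OK"` slots straight into the gate chain.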
```markdown
---
name: implementation-agent
description: Phase 4 — implements tasks against src/, opens PR
tools: Read, Write, Edit, Glob, Grep, Bash
---

You are the implementation phase.

Read tasks.yaml. For each task in dependency order: implement the change
in src/, write or update tests, run the test suite, commit with a message
that includes the task id (e.g. "feat: <task-id>: short description").

When all tasks pass: open a PR. The PR description must link every commit
back to the originating task and reference the spec, architecture, and
tasks files.

Exit criteria:
- all tests pass
- PR is open
- spec.md frontmatter pr field is updated to the PR URL
- spec.md status is set to in_review
```
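Two of those exit criteria live in spec.md's frontmatter, so the implementation gate can check them with grep alone. A minimal sketch (the gate function name is mine):

```shell
#!/usr/bin/env bash
set -euo pipefail

check_pr_fields() {  # check_pr_fields <spec.md>: frontmatter exit criteria
  grep -q '^pr: https://' "$1"      || { echo "pr field not set"; return 1; }
  grep -q '^status: in_review' "$1" || { echo "status is not in_review"; return 1; }
  echo "implementation gate OK"
}
```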
### Step 3 — Run the pipeline
From the repo root:
```bash
# kick off the requirements phase against a free-form feature request
claude code --agent requirements-agent \
  "Add rate limiting to the public API. Per-IP, 100 req/min, returns 429."

# review the generated spec, then trigger the next phase
claude code --agent architecture-agent \
  "Process spec at .sdlc/specs/feat-2026-0426-rate-limiting/spec.md"

# continue through tasks and implementation
claude code --agent tasks-agent \
  "Process .sdlc/specs/feat-2026-0426-rate-limiting/architecture.md"

claude code --agent implementation-agent \
  "Process .sdlc/specs/feat-2026-0426-rate-limiting/tasks.yaml"
```
### Step 4 — Wire the evaluation gates
The deterministic part is the evaluation between phases. The simplest version is a shell script per phase:
```bash
# .sdlc/gates/check-spec.sh
#!/usr/bin/env bash
set -euo pipefail
SPEC="$1"

# frontmatter exists
head -1 "$SPEC" | grep -q '^---$' || { echo "missing frontmatter"; exit 1; }

# required sections
for section in "## Problem" "## Success criteria" "## Scope"; do
  grep -q "^$section" "$SPEC" || { echo "missing $section"; exit 1; }
done

# success criteria has at least one bullet
awk '/^## Success criteria/ { in_sec = 1; next }
     /^## /                 { in_sec = 0 }
     in_sec' "$SPEC" | grep -q '^- ' \
  || { echo "no success criteria"; exit 1; }

echo "spec gate OK"
```
You can graduate to a real workflow engine (Temporal, Prefect, even GitHub Actions) once the pattern stabilizes. Start with shell scripts. The point of the deterministic layer is that it's auditable and boring, not that it's sophisticated.
### Step 5 — Close the knowledge loop
When a PR review changes an assumption, write the lesson back:
```bash
cat >> .sdlc/knowledge/2026-decisions.md <<'EOF'

## 2026-04-26 — Rate-limiting buckets per-IP, not per-API-key

Reviewer:
Decision: Buckets are keyed by client IP, not API key, because anonymous
endpoints have no key. This overrides the architecture agent's initial
proposal.
EOF
```
The next time an agent designs anything in this surface area, that knowledge file is in its context window. The system learns from its reviewers.
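How those files actually reach the context window is an implementation choice the pattern leaves open; the crudest workable version is keyword retrieval over `.sdlc/knowledge/`. A sketch (the selection strategy is mine, not QuantumBlack's):

```shell
#!/usr/bin/env bash
set -euo pipefail

relevant_knowledge() {  # relevant_knowledge <keyword>: knowledge files worth injecting
  # case-insensitive, recursive, filenames only; empty output when nothing matches
  grep -ril -- "$1" .sdlc/knowledge/ 2>/dev/null || true
}
```

An orchestrator would concatenate whatever `relevant_knowledge "rate limit"` returns into the architecture agent's prompt before the phase starts.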
## What This Doesn't Solve
The pattern is not a silver bullet. It explicitly assumes:
- Your codebase has stable conventions worth distilling into .sdlc/context/. Greenfield projects with no history will struggle until enough decisions accumulate.
- Your test suite is trustworthy. Implementation-agent's exit criterion is "all tests pass." If the tests are weak, the agent will ship weak code that passes weak tests.
- Your reviewers are willing to read full PRs, not approve them in chunks. The single-review-at-the-end discipline only works if the review is real.
- You have someone who maintains .sdlc/templates/ and .sdlc/context/. Templates rot. Context drifts. Owning that maintenance is the new thing teams need to staff.
The most underestimated part is the last one. Documentation has historically failed because there was no incentive to keep it current. In this pattern, the templates and context files are load-bearing — agents read them every cycle, and bad context produces bad output immediately. That feedback loop is what keeps the docs honest.
## The Bigger Picture
Every team that's serious about agentic development is converging on something close to this shape. Anthropic's Managed Agents API gives you the bounded-execution piece. GitHub Agentic Workflows gives you the deterministic-orchestration piece. Spec-driven frameworks (Kiro, Spec Kit, AGENTS.md) are settling on artifact conventions that look very similar to `.sdlc/`.
The thing QuantumBlack got right is making it small enough to copy. The whole pattern is a folder, four phases, four agents, and a shell script per gate. That's it.
If your team is still trying to make a single super-agent do the whole pipeline, this is your sign to break it into phases. The overhead of designing the artifact contracts pays for itself the first time an agent crashes mid-implementation and the next agent picks up exactly where it left off, because everything it needs is already on disk.
Coding was never the bottleneck. Coordination was. Agentic SDLC is the first pattern I've seen that takes that seriously.