• Tests must pass before opening PR.
## Boundaries
• Do not modify the auth middleware without explicit instruction.
• Do not add dependencies without asking first.
```
That file lives in your repository. Every time an agent opens a session, it reads those instructions before doing anything else.
*[Image: An engineer views an AGENTS.md file open in a code editor, with the repository structure visible in the sidebar.]*
## CLAUDE.md: The Deeper Layer for Claude Code
If your team is on Claude Code, CLAUDE.md goes further than AGENTS.md can.
Where AGENTS.md is deliberately tool-agnostic, CLAUDE.md is built specifically for Claude's architecture. It supports sub-agent delegation rules, permission levels, session management, a layered configuration system (global → project → subdirectory → local), and an import system that lets you compose configuration from multiple files.
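For instance, a project-level CLAUDE.md can compose shared fragments with the `@path` import syntax. A sketch; the imported paths and the override rule are hypothetical:

```markdown
# CLAUDE.md (project level)
@docs/conventions.md
@docs/testing.md

## Project-specific overrides
- Sub-agents may read any file but must not run migrations.
```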
The practical difference: CLAUDE.md can specify how Claude should handle ambiguity, not just what the project contains.
```markdown
## Decision Protocol
If requirements are ambiguous, ask one clarifying question —
not multiple. Proceed after the answer; do not ask follow-ups
for the same task.

## Escalation
Before any change that affects the database schema,
authentication logic, or public API surface: stop and describe
the planned change. Wait for explicit approval.

## PR Size
Target: ≤400 lines per PR.
Hard limit: 800 lines.
Split anything larger into independent, deployable PRs.
```
This is the kind of context that transforms a capable model into a reliable teammate.
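Guardrails like the PR-size rule are also easy to check mechanically. Below is a sketch of one way to do it, assuming the changed-line count comes from `git diff --numstat` output; the thresholds mirror the CLAUDE.md example above.

```typescript
// Sketch: classify a PR's size from `git diff --numstat` output.
// numstat rows look like "<added>\t<deleted>\t<path>"; binary files
// report "-" for both counts and are skipped here.

const TARGET = 400;     // soft target: lines changed per PR
const HARD_LIMIT = 800; // beyond this, the PR should be split

type Verdict = "ok" | "warn" | "fail";

function prSizeVerdict(numstat: string): { lines: number; verdict: Verdict } {
  let lines = 0;
  for (const row of numstat.trim().split("\n")) {
    if (!row) continue;
    const [added, deleted] = row.split("\t");
    if (added === "-" || deleted === "-") continue; // binary file
    lines += Number(added) + Number(deleted);
  }
  const verdict: Verdict =
    lines > HARD_LIMIT ? "fail" : lines > TARGET ? "warn" : "ok";
  return { lines, verdict };
}
```

In CI this would be fed `git diff --numstat main...HEAD`, with the job exiting non-zero on `"fail"`.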
## Try It Yourself
Here's the spec format that works well for individual tasks. Save it as a template and fill it in before handing a task to any agent.
task-spec template:
```markdown
# Task: [descriptive name]

## Objective
[One sentence: what this accomplishes for a user or system]

## Scope
- [Bulleted list: what is included]
- [Bulleted list marked "Out of scope": what is explicitly not included]

## Constraints
- [Technical: language, frameworks, libraries allowed/forbidden]
- [Behavioral: performance targets, error handling expectations]
- [Process: max PR size, test requirements, docs to update]

## Definition of Done
- [ ] [Specific, verifiable outcome 1]
- [ ] [Specific, verifiable outcome 2]
- [ ] [Tests written and passing]
- [ ] [No regressions in existing tests]
```
The five-question spec review, before handing a spec to an agent:

1. Could two engineers read this and agree on the output? (If no, it's still a prompt.)
2. Does it say what's out of scope? (Agents fill scope gaps with their own judgment.)
3. Is the definition of done verifiable without running the code? (Vague "done" = endless iteration.)
4. Does it mention the constraints that matter most? (Performance, security, dependencies.)
5. Is there an example of the style or pattern to follow? (One real example beats three paragraphs.)
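Questions two through four have mechanical proxies. Here is a sketch of a structural lint, assuming the section names used by the template above; it cannot judge whether two engineers would agree, only whether the scaffolding exists.

```typescript
// Sketch: lint a task-spec markdown string for the structural sections
// the five-question review expects. Returns a list of problems; an
// empty list means the spec is at least structurally ready for an agent.

function lintSpec(spec: string): string[] {
  const problems: string[] = [];
  if (!/out of scope/i.test(spec)) {
    problems.push("No 'Out of scope' list: agents fill scope gaps with their own judgment.");
  }
  if (!/constraints/i.test(spec)) {
    problems.push("No Constraints section (performance, security, dependencies).");
  }
  const hasChecklist =
    /definition of done/i.test(spec) && /\[ \]|\[x\]/i.test(spec);
  if (!hasChecklist) {
    problems.push("Definition of Done missing or has no checklist items.");
  }
  return problems;
}
```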
Before/after for the same feature:
Prompt version:
"Add rate limiting to the API."
Spec version:
```markdown
## Objective
Protect POST /auth/login from brute-force attacks.

## Scope
- Rate limit POST /auth/login: 5 attempts per IP per 15 minutes
- Return 429 with Retry-After header when limit exceeded
- Out of scope: all other endpoints (separate task)

## Constraints
- Use existing Redis instance (already in docker-compose.yml)
- No new npm packages — use ioredis (already a dependency)
- Add to middleware layer, not inside the route handler

## Definition of Done
- [ ] Middleware function in src/middleware/rateLimiter.ts
- [ ] Applied to POST /auth/login only
- [ ] Unit test: 6 requests in 15min window → 6th returns 429
- [ ] Retry-After header set correctly
```
The prompt produces something that rate-limits. The spec produces what you actually wanted.
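To make the contrast concrete, the middleware that spec pins down might look roughly like the sketch below. Two assumptions to flag: an in-memory fixed-window counter stands in for the Redis-backed one the spec requires (so the example is self-contained), and the Express-style `(req, res, next)` signature is assumed rather than taken from the project.

```typescript
// Sketch: fixed-window rate limiter, 5 attempts per key per 15 minutes,
// per the spec. The real task would keep counters in Redis (INCR + EXPIRE
// via ioredis) so limits survive restarts and are shared across instances.

const WINDOW_MS = 15 * 60 * 1000;
const MAX_ATTEMPTS = 5;

type Decision = { allowed: boolean; retryAfterSec: number };

const windows = new Map<string, { start: number; count: number }>();

function checkLimit(key: string, now: number = Date.now()): Decision {
  const w = windows.get(key);
  if (!w || now - w.start >= WINDOW_MS) {
    windows.set(key, { start: now, count: 1 }); // open a new window for this key
    return { allowed: true, retryAfterSec: 0 };
  }
  w.count += 1;
  if (w.count > MAX_ATTEMPTS) {
    return {
      allowed: false,
      retryAfterSec: Math.ceil((w.start + WINDOW_MS - now) / 1000),
    };
  }
  return { allowed: true, retryAfterSec: 0 };
}

// Express-style wrapper (assumed signature), applied to POST /auth/login only.
function rateLimiter(req: any, res: any, next: () => void): void {
  const d = checkLimit(req.ip);
  if (!d.allowed) {
    res.set("Retry-After", String(d.retryAfterSec));
    res.status(429).send("Too Many Requests");
    return;
  }
  next();
}
```

Note how every line traces back to a spec bullet: the window size, the 429, the Retry-After header, the middleware placement.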
*[Image: A terminal displays a Claude Code session executing a task spec with bounded scope and a passing test output.]*
## Three Tools, Three Levels of Rigor
For teams looking to adopt this systematically, three tools dominate the space:
**GitHub Spec Kit** — the lightest entry point. An open-source toolkit with a `specify` CLI that downloads templates for your chosen AI tool (Copilot, Claude Code, Gemini CLI) and sets up the SDD scaffolding. A good starting point if you want to introduce specs without changing your existing IDE workflow.
```bash
npx specify init --tool claude-code
# Creates: specs/, AGENTS.md, task-spec.md template
```
**Kiro** — an IDE built around the three-phase spec workflow: requirements.md → design.md → tasks.md. Agent hooks automatically re-evaluate specs when code changes. The structured workflow is enforced by the tool itself, which helps teams that struggle to maintain spec discipline otherwise.
**CLAUDE.md + task specs inline** — what many Claude Code teams land on. No new tools: the repo-level CLAUDE.md handles persistent context, and individual task specs live as markdown files in a specs/ directory. Lightweight, version-controlled, and composable.
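With the inline approach, the repository layout might look something like this (file and directory names are illustrative):

```
repo/
├── CLAUDE.md                     # persistent context: commands, boundaries, protocols
└── specs/
    ├── task-spec.md              # blank template
    ├── 0042-login-rate-limit.md  # one spec per task
    └── ...
```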
## Migrating from Prompts to Specs
You do not need to rewrite everything. A practical migration path:
**Week 1: Add AGENTS.md** — cover the six sections above. This alone stops the most common agent errors: wrong commands, wrong file locations, missed code style conventions. One hour of work; immediate impact on every session.
**Week 2: Write specs for new work** — for any task that will take an agent more than 30 minutes, spend 10 minutes writing a spec first. The spec forces you to clarify requirements you hadn't fully thought through — value independent of the agent.
**Week 3: Add a Definition of Done requirement** — before any agent task is considered complete, it must satisfy the checklist in the spec. This is the step that actually closes the rework loop.
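That gate is scriptable. Here is a sketch that counts open versus completed items in a spec's checklist, assuming the `[ ]` / `[x]` checkbox syntax from the template earlier in this article:

```typescript
// Sketch: report whether a spec's Definition of Done is satisfied by
// counting unchecked "[ ]" and checked "[x]" items. In CI this would
// read the task's spec file and fail the job while items remain open.

function doneStatus(spec: string): { open: number; done: number; complete: boolean } {
  const open = (spec.match(/\[ \]/g) ?? []).length;
  const done = (spec.match(/\[[xX]\]/g) ?? []).length;
  // "complete" requires at least one checked item: an empty checklist
  // should not count as done.
  return { open, done, complete: open === 0 && done > 0 };
}
```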
**Week 4+: Evaluate what to formalize** — some teams add a `specs/` directory and version-control all task specs. Others keep it lightweight. The right level of rigor depends on your team size and how often agents go off-track.
The ceiling is the fully spec-driven workflow — where every non-trivial feature starts with a spec, the agent implements against it, and the spec serves as the living documentation. That's where the 97% throughput gain comes from.
But the floor is just AGENTS.md and one task spec. Start there.
The teams shipping faster are not using better models. They are doing the work before the work — turning ambiguous intent into bounded, verifiable specifications that agents can execute reliably.
The investment is writing clarity down. The return is not having to review, rework, and re-explain the same decisions session after session.