• Tests must pass before opening PR.
## Boundaries
• Do not modify the auth middleware without explicit instruction.
• Do not add dependencies without asking first.
```
That file lives in your repository. Every time an agent opens a session, it reads those instructions before doing anything else.
*[Image: An engineer views an AGENTS.md file open in a code editor, with the repository structure visible in the sidebar.]*
## CLAUDE.md: The Deeper Layer for Claude Code
If your team is on Claude Code, CLAUDE.md goes further than AGENTS.md can.
Where AGENTS.md is deliberately tool-agnostic, CLAUDE.md is built specifically for Claude's architecture. It supports sub-agent delegation rules, permission levels, session management, a layered configuration system (global → project → subdirectory → local), and an import system that lets you compose configuration from multiple files.
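For instance, a project-level CLAUDE.md can compose shared fragments with the `@path` import syntax. A sketch; the imported paths and the override rule are hypothetical:

```markdown
# CLAUDE.md (project level)
@docs/conventions.md
@docs/testing.md

## Project-specific overrides
- Sub-agents may read any file but must not run migrations.
```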
The practical difference: CLAUDE.md can specify how Claude should handle ambiguity, not just what the project contains.
```markdown
## Decision Protocol
If requirements are ambiguous, ask one clarifying question —
not multiple. Proceed after the answer; do not ask follow-ups
for the same task.

## Escalation
Before any change that affects the database schema,
authentication logic, or public API surface: stop and describe
the planned change. Wait for explicit approval.

## PR Size
Target: ≤400 lines per PR.
Hard limit: 800 lines.
Split anything larger into independent, deployable PRs.
```
This is the kind of context that transforms a capable model into a reliable teammate.
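Guardrails like the PR-size rule are also easy to check mechanically. Below is a sketch of one way to do it, assuming the changed-line count comes from `git diff --numstat` output; the thresholds mirror the CLAUDE.md example above.

```typescript
// Sketch: classify a PR's size from `git diff --numstat` output.
// numstat rows look like "<added>\t<deleted>\t<path>"; binary files
// report "-" for both counts and are skipped here.

const TARGET = 400;     // soft target: lines changed per PR
const HARD_LIMIT = 800; // beyond this, the PR should be split

type Verdict = "ok" | "warn" | "fail";

function prSizeVerdict(numstat: string): { lines: number; verdict: Verdict } {
  let lines = 0;
  for (const row of numstat.trim().split("\n")) {
    if (!row) continue;
    const [added, deleted] = row.split("\t");
    if (added === "-" || deleted === "-") continue; // binary file
    lines += Number(added) + Number(deleted);
  }
  const verdict: Verdict =
    lines > HARD_LIMIT ? "fail" : lines > TARGET ? "warn" : "ok";
  return { lines, verdict };
}
```

In CI this would be fed `git diff --numstat main...HEAD`, with the job exiting non-zero on `"fail"`.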
## Try It Yourself
Here's the spec format that works well for individual tasks. Save it as a template and fill it in before handing a task to any agent.
task-spec template:
```markdown
# Task: [descriptive name]

## Objective
[One sentence: what this accomplishes for a user or system]

## Scope
- [Bulleted list: what is included]
- [Bulleted list marked "Out of scope": what is explicitly not included]

## Constraints
- [Technical: language, frameworks, libraries allowed/forbidden]
- [Behavioral: performance targets, error handling expectations]
- [Process: max PR size, test requirements, docs to update]

## Definition of Done
- [ ] [Specific, verifiable outcome 1]
- [ ] [Specific, verifiable outcome 2]
- [ ] [Tests written and passing]
- [ ] [No regressions in existing tests]
```
The five-question spec review, before handing a spec to an agent:

1. Could two engineers read this and agree on the output? (If no, it's still a prompt.)
2. Does it say what's out of scope? (Agents fill scope gaps with their own judgment.)
3. Is the definition of done verifiable without running the code? (Vague "done" = endless iteration.)
4. Does it mention the constraints that matter most? (Performance, security, dependencies.)
5. Is there an example of the style or pattern to follow? (One real example beats three paragraphs.)
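Questions two through four have mechanical proxies. Here is a sketch of a structural lint, assuming the section names used by the template above; it cannot judge whether two engineers would agree, only whether the scaffolding exists.

```typescript
// Sketch: lint a task-spec markdown string for the structural sections
// the five-question review expects. Returns a list of problems; an
// empty list means the spec is at least structurally ready for an agent.

function lintSpec(spec: string): string[] {
  const problems: string[] = [];
  if (!/out of scope/i.test(spec)) {
    problems.push("No 'Out of scope' list: agents fill scope gaps with their own judgment.");
  }
  if (!/constraints/i.test(spec)) {
    problems.push("No Constraints section (performance, security, dependencies).");
  }
  const hasChecklist =
    /definition of done/i.test(spec) && /\[ \]|\[x\]/i.test(spec);
  if (!hasChecklist) {
    problems.push("Definition of Done missing or has no checklist items.");
  }
  return problems;
}
```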
Before/after for the same feature:
Prompt version:
"Add rate limiting to the API."
Spec version:
```markdown
## Objective
Protect POST /auth/login from brute-force attacks.

## Scope
- Rate limit POST /auth/login: 5 attempts per IP per 15 minutes
- Return 429 with Retry-After header when limit exceeded
- Out of scope: all other endpoints (separate task)

## Constraints
- Use existing Redis instance (already in docker-compose.yml)
- No new npm packages — use ioredis (already a dependency)
- Add to middleware layer, not inside the route handler

## Definition of Done
- [ ] Middleware function in src/middleware/rateLimiter.ts
- [ ] Applied to POST /auth/login only
- [ ] Unit test: 6 requests in 15min window → 6th returns 429
- [ ] Retry-After header set correctly
```
The prompt produces something that rate-limits. The spec produces what you actually wanted.
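To make the contrast concrete, the middleware that spec pins down might look roughly like the sketch below. Two assumptions to flag: an in-memory fixed-window counter stands in for the Redis-backed one the spec requires (so the example is self-contained), and the Express-style `(req, res, next)` signature is assumed rather than taken from the project.

```typescript
// Sketch: fixed-window rate limiter, 5 attempts per key per 15 minutes,
// per the spec. The real task would keep counters in Redis (INCR + EXPIRE
// via ioredis) so limits survive restarts and are shared across instances.

const WINDOW_MS = 15 * 60 * 1000;
const MAX_ATTEMPTS = 5;

type Decision = { allowed: boolean; retryAfterSec: number };

const windows = new Map<string, { start: number; count: number }>();

function checkLimit(key: string, now: number = Date.now()): Decision {
  const w = windows.get(key);
  if (!w || now - w.start >= WINDOW_MS) {
    windows.set(key, { start: now, count: 1 }); // open a new window for this key
    return { allowed: true, retryAfterSec: 0 };
  }
  w.count += 1;
  if (w.count > MAX_ATTEMPTS) {
    return {
      allowed: false,
      retryAfterSec: Math.ceil((w.start + WINDOW_MS - now) / 1000),
    };
  }
  return { allowed: true, retryAfterSec: 0 };
}

// Express-style wrapper (assumed signature), applied to POST /auth/login only.
function rateLimiter(req: any, res: any, next: () => void): void {
  const d = checkLimit(req.ip);
  if (!d.allowed) {
    res.set("Retry-After", String(d.retryAfterSec));
    res.status(429).send("Too Many Requests");
    return;
  }
  next();
}
```

Note how every line traces back to a spec bullet: the window size, the 429, the Retry-After header, the middleware placement.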
*[Image: A terminal displays a Claude Code session executing a task spec with bounded scope and a passing test output.]*
## Three Tools, Three Levels of Rigor
For teams looking to adopt this systematically, three tools dominate the space:
**GitHub Spec Kit** — the lightest entry point. An open-source toolkit with a `specify` CLI that downloads templates for your chosen AI tool (Copilot, Claude Code, Gemini CLI) and sets up the SDD scaffolding. A good starting point if you want to introduce specs without changing your existing IDE workflow.
```bash
npx specify init --tool claude-code
# Creates: specs/, AGENTS.md, task-spec.md template
```
**Kiro** — an IDE built around the three-phase spec workflow: requirements.md → design.md → tasks.md. Agent hooks automatically re-evaluate specs when code changes. The structured workflow is enforced by the tool itself, which helps teams that struggle to maintain spec discipline otherwise.
**CLAUDE.md + task specs inline** — what many Claude Code teams land on. No new tools: the repo-level CLAUDE.md handles persistent context, and individual task specs live as markdown files in a specs/ directory. Lightweight, version-controlled, and composable.
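With the inline approach, the repository layout might look something like this (file and directory names are illustrative):

```
repo/
├── CLAUDE.md                     # persistent context: commands, boundaries, protocols
└── specs/
    ├── task-spec.md              # blank template
    ├── 0042-login-rate-limit.md  # one spec per task
    └── ...
```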
## Migrating from Prompts to Specs
You do not need to rewrite everything. A practical migration path:
**Week 1: Add AGENTS.md** — cover the six sections above. This alone stops the most common agent errors: wrong commands, wrong file locations, missed code style conventions. One hour of work; immediate impact on every session.
**Week 2: Write specs for new work** — for any task that will take an agent more than 30 minutes, spend 10 minutes writing a spec first. The spec forces you to clarify requirements you hadn't fully thought through — value independent of the agent.
**Week 3: Add a Definition of Done requirement** — before any agent task is considered complete, it must satisfy the checklist in the spec. This is the step that actually closes the rework loop.
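That gate is scriptable. Here is a sketch that counts open versus completed items in a spec's checklist, assuming the `[ ]` / `[x]` checkbox syntax from the template earlier in this article:

```typescript
// Sketch: report whether a spec's Definition of Done is satisfied by
// counting unchecked "[ ]" and checked "[x]" items. In CI this would
// read the task's spec file and fail the job while items remain open.

function doneStatus(spec: string): { open: number; done: number; complete: boolean } {
  const open = (spec.match(/\[ \]/g) ?? []).length;
  const done = (spec.match(/\[[xX]\]/g) ?? []).length;
  // "complete" requires at least one checked item: an empty checklist
  // should not count as done.
  return { open, done, complete: open === 0 && done > 0 };
}
```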
**Week 4+: Evaluate what to formalize** — some teams add a `specs/` directory and version-control all task specs. Others keep it lightweight. The right level of rigor depends on your team size and how often agents go off-track.
The ceiling is the fully spec-driven workflow — where every non-trivial feature starts with a spec, the agent implements against it, and the spec serves as the living documentation. That's where the 97% throughput gain comes from.
But the floor is just AGENTS.md and one task spec. Start there.
The teams shipping faster are not using better models. They are doing the work before the work — turning ambiguous intent into bounded, verifiable specifications that agents can execute reliably.
The investment is writing clarity down. The return is not having to review, rework, and re-explain the same decisions session after session.