Managed Agents, the Advisor strategy, and automated Code Review — three releases that together form a complete agent platform. Here's what each one does and how to wire them into a real project.
Most teams building with AI agents right now are assembling infrastructure from parts.
They write their own agent loop. They manage their own containers. They build retry logic, tool execution, context window management, and model routing — all before the agent does anything useful. Then they bolt on some form of code quality check, usually manual, and call it a pipeline.
It works. It's also a significant amount of undifferentiated engineering.
In April 2026, Anthropic shipped three features that, taken individually, are each useful. Taken together, they form something more interesting: a vertically integrated agent platform stack. One layer runs agents in managed cloud infrastructure. Another routes reasoning to the right model tier based on task difficulty. A third automates multi-angle code review with confidence-based filtering.
Here's what each layer does, how they connect, and how to set them up.
*Architecture diagram showing three layers of Anthropic's agent platform stack — Managed Agents at the base, Advisor routing in the middle, and Code Review at the top*
## Layer 1: Managed Agents — The Runtime You Don't Have to Build
The biggest time sink in agent development isn't the prompt. It's the infrastructure around it.
You need a container for the agent to execute code in. You need tool execution with proper error handling. You need file system access, web search, bash execution, and some way to stream results back to the caller. You need the agent loop itself — the cycle of think, act, observe, repeat — with context management so the conversation doesn't blow past the window.
Anthropic's Managed Agents handles all of this as a service. You define an agent (model, system prompt, tools), configure an environment (container with packages and network rules), and start a session. The platform runs the loop, executes tools, manages context, and streams events back via SSE.
The architecture has four concepts:
• Agent — the definition: a model, a system prompt, and a set of tools
• Environment — the container configuration: packages and network rules
• Session — a running instance of an agent inside an environment
• Events — the SSE stream of messages, tool calls, and status updates flowing between you and the session
The key difference from a raw Messages API call: the agent runs autonomously. You send a task, it decides which tools to use, executes them inside the container, and streams results back. You can steer it mid-execution by sending additional events, or interrupt it entirely.
What makes this practical for production is the environment configuration. You can lock down networking to specific hosts, pre-install packages (pip, npm, cargo, apt — all cached), and control exactly what the agent can access. For a data pipeline agent, you might allow only your internal API and database host. For a coding agent, unrestricted networking with pre-installed language runtimes.
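As a sketch, those two setups might look like this, using the environment config schema that appears in the setup examples below. Hostnames and package lists are placeholders:

```python
# Data-pipeline agent: only internal hosts reachable (hostnames are placeholders)
pipeline_config = {
    "type": "cloud",
    "packages": {"pip": ["pandas", "sqlalchemy"]},
    "networking": {
        "type": "limited",
        "allowed_hosts": ["api.internal.example.com", "db.internal.example.com"],
    },
}

# Coding agent: unrestricted networking with pre-installed language tooling
coding_config = {
    "type": "cloud",
    "packages": {
        "pip": ["pytest", "mypy"],
        "npm": ["typescript"],
    },
    "networking": {"type": "unrestricted"},
}
```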
## Layer 2: The Advisor Strategy — Frontier Intelligence at Executor Prices
Running every agent task through Opus is expensive. Running everything through Haiku misses on the hard problems. Most teams solve this by picking one model and accepting the tradeoff.
The advisor strategy eliminates the tradeoff.
The pattern: a fast, cheap model (Sonnet or Haiku) runs the full task — reading files, calling tools, iterating toward a solution. When it hits a decision it can't confidently make, it escalates to Opus for strategic guidance. Opus reviews the shared context, provides direction, and hands back. The executor continues.
This inverts the typical orchestrator pattern. Instead of an expensive model delegating to cheaper workers, the cheap model drives execution and escalates selectively.
Anthropic's benchmarks bear this out, and the Sonnet+Opus configuration is the one that deserves attention. It's not just cheaper: it actually performs better than Sonnet alone, because the occasional Opus consultation catches mistakes before they compound.
The implementation is one additional entry in the tools array. You declare the advisor as a tool, and the model decides when to use it:
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    tools=[
        {
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-6",
            "max_uses": 3,
        },
        # ... your other tools
    ],
    messages=[{"role": "user", "content": "Review this PR and suggest improvements"}],
)
```
The max_uses parameter caps how many times Opus gets consulted per request. Advisor tokens are billed at Opus rates, executor tokens at Sonnet rates, and the usage block reports them separately so you can track spend per tier.
No orchestration logic. No worker pool management. No task decomposition. One API call, two model tiers, with the routing handled by the executor itself.
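A back-of-envelope cost model makes the economics concrete. The prices below are placeholders, not actual Anthropic rates; what matters is the ratio between tiers:

```python
# Placeholder per-million-token prices (NOT official rates)
SONNET_PRICE = 3.0   # dollars per 1M tokens
OPUS_PRICE = 15.0    # dollars per 1M tokens

def blended_cost(executor_tokens: int, advisor_tokens: int) -> float:
    """One advisor-pattern request: executor tokens billed at Sonnet rates,
    advisor tokens at Opus rates."""
    return (executor_tokens / 1e6) * SONNET_PRICE + (advisor_tokens / 1e6) * OPUS_PRICE

# A task using 200K executor tokens that escalates twice for ~10K advisor tokens
with_advisor = blended_cost(200_000, 10_000)   # 0.60 + 0.15 = 0.75
# The same 210K tokens run entirely through Opus
all_opus = (210_000 / 1e6) * OPUS_PRICE        # 3.15
```

Because max_uses caps escalations, the Opus share of spend stays bounded per request, and the separate per-tier token reporting lets you verify the split.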
*A comparison showing traditional single-model routing versus the advisor pattern, where the executor escalates selectively*
## Layer 3: Code Review — Agents Reviewing Agents
Automated code review has existed for years. Most implementations share a problem: they're noisy. They flag style issues, pedantic suggestions, and false positives that train developers to ignore the comments entirely.
The Claude Code Review plugin takes a different approach. Instead of one reviewer scanning everything, it runs five specialized agents in parallel, each examining changes through a different lens:
• CLAUDE.md compliance — Does this change follow the project's documented conventions?
• Bug detection — Logic errors, edge cases, security vulnerabilities
• Git history context — Does this change break patterns established in recent commits?
• Previous PR comments — Were there review comments on similar code that apply here?
• Code comment verification — Do inline comments still match what the code actually does?
Each finding gets a confidence score from 0 to 100. Only findings above a configurable threshold (default: 80) get posted as comments. This is the critical difference — the confidence filter eliminates the noise that makes teams ignore automated reviews.
The plugin also skips PRs that don't need review: closed, draft, automated, and already-reviewed pull requests. When it does review, comments include direct GitHub links with full SHA hashes and line number ranges so developers can jump straight to the code.
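The gating logic is easy to picture. Here is a sketch in plain Python; the data shapes are hypothetical, since the plugin's internals aren't part of its public interface:

```python
def should_review(pr: dict) -> bool:
    """Skip PRs the plugin ignores: closed, draft, automated, already reviewed."""
    skip = pr["closed"] or pr["draft"] or pr["automated"] or pr["already_reviewed"]
    return not skip

def filter_findings(findings: list[dict], threshold: int = 80) -> list[dict]:
    """Post only findings at or above the confidence threshold (default 80)."""
    return [f for f in findings if f["confidence"] >= threshold]

findings = [
    {"msg": "Possible SQL injection in query builder", "confidence": 95},
    {"msg": "Variable name could be clearer",          "confidence": 40},
    {"msg": "Empty input list crashes the parser",     "confidence": 85},
]
posted = filter_findings(findings)  # the style nit is dropped; two findings remain
```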
For teams running agents that produce code (Managed Agents writing features, Copilot generating implementations), automated review creates a necessary quality gate. The agent writes code; another set of agents reviews it; humans make the final call on a cleaner signal.
## How the Three Layers Connect
Each layer solves a different problem, but they compose into something more than the sum of parts:
```
┌─────────────────────────────────────┐
│ Code Review (Quality Gate)          │ ← Agents review agent-written code
├─────────────────────────────────────┤
│ Advisor Strategy (Model Routing)    │ ← Smart escalation to Opus
├─────────────────────────────────────┤
│ Managed Agents (Runtime + Infra)    │ ← Containerized agent execution
└─────────────────────────────────────┘
```
A Managed Agent session uses Sonnet with an Opus advisor. It runs inside a configured container with network access to your internal APIs. It writes code, commits it, opens a PR. The Code Review plugin triggers on that PR, running five parallel review agents. Their confidence-filtered findings land as comments. A human reviews the high-confidence findings and merges.
The agent development lifecycle — from task execution to model routing to quality assurance — is handled by the platform. Your team focuses on what the agents should do, not how to run them.
## Multi-Agent Orchestration: The Advanced Pattern
Managed Agents also supports multi-agent sessions where one coordinator delegates to specialized agents. Each agent runs in its own session thread with isolated context, but they share the container filesystem.
The pattern maps cleanly to real engineering workflows:
```python
# Create specialized agents
reviewer = client.beta.agents.create(
    name="Code Reviewer",
    model="claude-sonnet-4-6",
    system="Review code for bugs, security issues, and style violations.",
    tools=[{"type": "agent_toolset_20260401"}],
)

test_writer = client.beta.agents.create(
    name="Test Writer",
    model="claude-sonnet-4-6",
    system="Write comprehensive tests for the code changes.",
    tools=[{"type": "agent_toolset_20260401"}],
)

# Create a coordinator that can delegate
lead = client.beta.agents.create(
    name="Engineering Lead",
    model="claude-sonnet-4-6",
    system="Coordinate engineering work. Delegate review and testing.",
    tools=[{"type": "agent_toolset_20260401"}],
    callable_agents=[
        {"type": "agent", "id": reviewer.id, "version": reviewer.version},
        {"type": "agent", "id": test_writer.id, "version": test_writer.version},
    ],
)
```
The coordinator spawns threads, delegates tasks, and synthesizes results. Each agent operates with its own model, system prompt, and tools. Only one level of delegation is supported — callable agents can't spawn their own sub-agents — which keeps the architecture predictable.
Thread events stream to both the individual thread and the session-level stream, so you get both the detailed trace and the high-level summary.
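The one-level constraint described above is easy to validate client-side before you create a coordinator. A hypothetical helper (the platform enforces this itself; `agent_defs` maps an agent id to the kwargs used to create it):

```python
def check_one_level(agent_defs: dict[str, dict]) -> None:
    """Raise if any callable agent declares callable_agents of its own,
    since only one level of delegation is supported."""
    for agent_id, spec in agent_defs.items():
        for child in spec.get("callable_agents", []):
            if agent_defs[child["id"]].get("callable_agents"):
                raise ValueError(
                    f"{child['id']!r} is callable from {agent_id!r} but also "
                    "declares callable_agents; sub-agents can't delegate further"
                )

agents = {
    "lead": {"callable_agents": [{"id": "reviewer"}, {"id": "test_writer"}]},
    "reviewer": {},
    "test_writer": {},
}
check_one_level(agents)  # passes: delegation is exactly one level deep
```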
## Try It Yourself
Here's how to set up each layer in a real project.
### Install the Anthropic CLI and SDK
```bash
# macOS
brew install anthropics/tap/ant

# Verify
ant --version

# Install the Python SDK
pip install anthropic

# Set your API key
export ANTHROPIC_API_KEY="your-key-here"
```
### Create a Managed Agent with Advisor Routing
This combines layers 1 and 2 — a managed agent that uses the advisor strategy:
```python
from anthropic import Anthropic

client = Anthropic()

# Create an agent with a Sonnet executor + Opus advisor
agent = client.beta.agents.create(
    name="Feature Builder",
    model="claude-sonnet-4-6",
    system="""You are a senior engineer. Write clean, tested code.
When facing complex architectural decisions, consult your advisor.""",
    tools=[
        {"type": "agent_toolset_20260401"},
        {
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-6",
            "max_uses": 3,
        },
    ],
)

# Create an environment with your project's dependencies
environment = client.beta.environments.create(
    name="project-env",
    config={
        "type": "cloud",
        "packages": {
            "pip": ["pytest", "black", "mypy"],
            "npm": ["typescript", "eslint"],
        },
        "networking": {"type": "unrestricted"},
    },
)

# Start a session
session = client.beta.sessions.create(
    agent=agent.id,
    environment_id=environment.id,
    title="Implement user auth",
)

# Send a task and stream results
with client.beta.sessions.events.stream(session.id) as stream:
    client.beta.sessions.events.send(
        session.id,
        events=[{
            "type": "user.message",
            "content": [{
                "type": "text",
                "text": "Implement JWT authentication with refresh tokens. Include unit tests.",
            }],
        }],
    )
    for event in stream:
        match event.type:
            case "agent.message":
                for block in event.content:
                    print(block.text, end="")
            case "agent.tool_use":
                print(f"\n[Tool: {event.name}]")
            case "session.status_idle":
                print("\nDone.")
                break
```
### Lock Down the Environment for Production
For production agents, restrict network access:
```python
secure_env = client.beta.environments.create(
    name="production-agent",
    config={
        "type": "cloud",
        "packages": {
            "pip": ["requests", "pydantic"],
        },
        "networking": {
            "type": "limited",
            "allowed_hosts": ["api.your-company.com", "db.your-company.com"],
            "allow_mcp_servers": True,
            "allow_package_managers": True,
        },
    },
)
```
### Set Up Code Review on Your Repository
Install the Code Review plugin in Claude Code:
```bash
# Install the plugin (212,000+ installs and counting)
claude plugins install code-review

# Run a review on the current branch
claude /code-review
```
The plugin automatically:
• Detects the current PR from your branch
• Runs five parallel review agents
• Filters findings by confidence (default threshold: 80)
• Posts comments with direct GitHub links to specific lines
To adjust the confidence threshold:
```bash
# Lower threshold = more findings (including lower-confidence ones)
claude /code-review --threshold 60

# Higher threshold = only the most certain findings
claude /code-review --threshold 90
```
### Wire It All Together
The production workflow:
```
1. Agent builds the feature (Managed Agent + Advisor)
   → Sonnet writes code, consults Opus on architecture decisions
   → Runs in a sandboxed container with your project's dependencies

2. Agent commits and opens a PR
   → Managed Agent has git access in the container

3. Code Review triggers on the PR
   → Five specialized agents review in parallel
   → Only high-confidence findings posted as comments

4. Human reviews the filtered findings and merges
   → Clean signal, no noise
```
## What About Internal APIs Behind a Firewall?
This is the question every enterprise team asks within five minutes of reading the docs — and the answer matters.
Managed Agent containers run on Anthropic's cloud. If your APIs and databases sit on a private network with no internet exposure, the agent container has no route to reach them. Neither unrestricted nor limited networking modes change this — they control what the container can access on the public internet, not how to tunnel into your VPC.
There are three approaches, each with different tradeoffs:
### Custom Tools — Your App as the Bridge (Recommended)
This is the pattern Anthropic's own engineering team describes. Your application, which does sit on your internal network, acts as a proxy between the agent and your private resources:
```python
# Your app runs on your internal network and handles tool calls
for event in stream:
    if event.type == "agent.custom_tool_use":
        if event.name == "query_internal_db":
            # YOUR app calls YOUR internal API — agent never touches it
            result = your_internal_api.query(event.input["query"])
            # Send only the result back to the agent
            client.beta.sessions.events.send(session_id, events=[{
                "type": "user.custom_tool_result",
                "tool_use_id": event.id,
                "content": [{"type": "text", "text": json.dumps(result)}],
            }])
```
The agent emits a structured tool request. Your app receives it via SSE, executes the actual API call on your internal network, and sends back only the data the agent needs. Credentials and network access never leave your infrastructure.
### MCP Proxy Pattern
Anthropic's architecture supports an MCP-based proxy where the agent calls tools through a dedicated proxy that fetches credentials from a vault and bridges into your network. Same principle as custom tools, but using the MCP protocol and integrated credential management.
### VPC Peering (Enterprise)
Anthropic decoupled the agent "brain" (harness) from the "hands" (sandbox) specifically to enable more flexible network topologies. VPC peering is an option for enterprise customers, though it requires direct coordination with Anthropic's team rather than self-service configuration.
The tradeoff with custom tools is latency — every internal API call adds a round-trip through the SSE stream. For most use cases (database queries, internal service calls) this is negligible. For high-frequency tool calls, batch your data retrieval into fewer, larger tool responses.
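Batching on the application side is mechanical: expose one custom tool that accepts a list of queries, and return every result in a single tool response. A minimal sketch, with `run_query` standing in for your internal API client:

```python
import json

def handle_batched_tool(queries: list[str], run_query) -> str:
    """Run many internal lookups locally and return ONE tool-result payload:
    one SSE round-trip instead of one per query."""
    results = {q: run_query(q) for q in queries}
    return json.dumps(results)

# Three lookups, a single round-trip back to the agent
payload = handle_batched_tool(
    ["users/active", "orders/today", "inventory/low"],
    run_query=lambda q: {"endpoint": q, "rows": 0},  # stand-in for your internal API
)
```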
*Engineers reviewing a dashboard showing agent sessions, model routing decisions, and code review results flowing through the platform*
## What This Means for Agent Architecture
The conventional approach to building production agents involves assembling multiple open-source tools: LangChain or CrewAI for the agent loop, Docker for sandboxing, custom code for model routing, and whatever linting/review tools you can configure.
Anthropic's stack doesn't replace all of that — but it removes the undifferentiated layers. You don't build the agent loop (Managed Agents handles it). You don't build model routing logic (the advisor strategy handles it). You don't build review automation (Code Review handles it).
What's left is the work that actually differentiates your product: the system prompts, the tool definitions, the business logic, and the domain expertise encoded in your agent's instructions.
The multi-agent pattern takes this further. Instead of one monolithic agent doing everything, you create specialized agents — a reviewer, a test writer, a documentation updater — and compose them through a coordinator. Each agent has a focused system prompt and limited tools, which improves both quality and auditability.
## The Bottom Line
The gap between teams building agents in production and teams still experimenting is increasingly about infrastructure, not models. Same Sonnet. Same Opus. Wildly different outcomes based on how the runtime, routing, and quality gates are configured.
Managed Agents removes the need to build your own execution environment. The advisor strategy removes the need to choose between intelligence and cost. Automated code review removes the need for humans to catch what agents should catch themselves.
Three layers. One platform. The undifferentiated infrastructure work just got a lot smaller.
Related: Smart Agent Model Routing covers model routing strategies in depth, including the three-tier approach that cuts API spend by 70-90%.