OpenAI Codex completes tasks in 1 to 30 minutes.
That's the runtime window for a cloud-based coding agent that reads your codebase, writes the feature, runs the tests, and opens a pull request — without you watching a cursor blink in a terminal.
Inside OpenAI, nearly all engineers now use Codex. They're merging 70% more pull requests weekly. Duolingo's engineering team saw a 70% increase in PR volume and a 67% reduction in median code review turnaround time.
These aren't benchmark numbers. These are production metrics from teams that rewired their development workflow around async agent execution.
The shift from AI-as-autocomplete to AI-as-async-agent is real, and the tooling is now mature enough to use in production. What it requires isn't a new tech stack. It's a different mental model for how software gets made.
## The Fundamental Difference
Most AI coding tools are synchronous. You write a prompt, the model responds, you review, you iterate. Claude Code works this way. GitHub Copilot works this way. The AI is fast, but you're still in the loop for every step.
Codex is designed to operate differently.
You define a task — fix this bug, implement this feature, add tests for this module — and Codex creates a cloud sandbox, clones your repository into it, executes the work, and returns a pull request. The task runs in its own isolated environment. The network is disabled once the agent phase begins, which prevents the generated code from reaching external services or downloading unintended packages. Every task is hermetically sealed.
The practical result: you can delegate ten tasks at once. While you're in a planning meeting, Codex is writing code in parallel across ten isolated environments. You come back to ten pull requests waiting for review.
That's not a hypothetical. That's the workflow OpenAI's own engineers use daily.
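The fan-out can be sketched from the command line. This is a minimal sketch that assumes the `codex task create` subcommand shown in the CLI section below; the `echo` makes it a dry run, so nothing is actually submitted:

```shell
# Illustrative fan-out: each title becomes one sandboxed cloud task.
# Assumes the `codex task create` subcommand from the CLI examples below.
tasks=(
  "Add input validation to the registration endpoint"
  "Fix the flaky retry test in the sync module"
  "Expand test coverage for the billing module"
)
for title in "${tasks[@]}"; do
  # Dry run: print the command instead of submitting. Drop the echo to submit.
  echo codex task create --repo your-org/your-repo --title "$title"
done
```

Each loop iteration would start an independent sandbox, so the three tasks run in parallel rather than in sequence.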
## What Gets Delegated vs. What Gets Kept
The async model shifts the engineering role rather than replacing it.
The tasks that work well with Codex are well-defined, bounded, and testable. Bug fixes with clear reproduction steps. Feature additions with existing test patterns to follow. Refactors with a specified target structure. Documentation generation. Test coverage expansion.
The tasks that still require human judgment are architectural decisions, product tradeoffs, requirements disambiguation, and anything where the spec is genuinely unclear. Agents don't invent product intent. They execute it.
This is the same division that StrongDM formalized with their Software Factory: humans write specifications, agents do everything else. Three engineers, 32,000 lines of production code, zero written by human hands. The test harness — not a human reviewer — is the quality gate.
Codex isn't StrongDM's setup. But the underlying principle is identical.
## How the Workflow Works
Codex is available across ChatGPT Plus, Pro, Business, Enterprise, and Edu plans. The agent connects to your GitHub account and runs tasks against connected repositories.
The configuration file is AGENTS.md — Codex's equivalent of CLAUDE.md. It lives in your repository root and tells the agent how to navigate your codebase, which commands to run for testing, and what constraints to respect.
A minimal AGENTS.md for a Node.js project:
```markdown
# Agent Instructions

## Environment
- Node.js 20, pnpm

## Testing
- Run tests with: pnpm test
- Always run tests before proposing a PR

## Code Style
- Follow existing patterns in src/
- No new dependencies without mentioning them in the PR description

## PR Requirements
- PRs must pass all tests
- Include a summary of what changed and why
```
The Slack integration changes the delegation surface significantly. Once you install the Codex Slack app and connect it to your workspace, you can mention @Codex in any channel or thread with a task description. Codex creates a cloud task, reacts with 👀 to acknowledge, and replies with a link to the task when it completes.
This means a product manager can file a bug report in Slack and tag @Codex in the same message. No Jira ticket. No developer context switch. The agent picks it up, fixes it, and posts the PR link back in the thread.
## Try It Yourself
### Write an effective AGENTS.md
```markdown
# Agent Instructions

## Repository Context
- This is a [language/framework] application
- Main entry point: [file path]
- Key directories: src/ (application code), tests/ (test suite)

## Running the Project
- Install: [package manager install command]
- Build: [build command]
- Test: [test command] — always run before completing any task

## Coding Standards
- Match existing code style in each file you edit
- Prefer editing existing files over creating new ones
- Document any new public functions

## Pull Request Guidelines
- Write a clear description of what changed
- Reference any files you created or deleted
- Flag any assumptions you made about requirements
```
Place this file in your repository root. Codex reads it before starting any task. The more specific the instructions, the more predictable the output.
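One way to keep that specificity honest is a pre-flight check. This is an illustrative sketch, not a Codex feature: it writes a sample file to a temp directory and greps it for the sections the template above recommends — point `f` at your real AGENTS.md in practice:

```shell
# Illustrative pre-flight check (not a Codex feature): verify AGENTS.md names
# the sections the template recommends.
f="$(mktemp -d)/AGENTS.md"   # sample copy; point this at your real AGENTS.md
printf '%s\n' '# Agent Instructions' '## Running the Project' '- Test: pnpm test' \
  '## Pull Request Guidelines' '- Keep PRs small' > "$f"

missing=0
for section in 'Running the Project' 'Pull Request Guidelines'; do
  grep -q "$section" "$f" || { echo "missing section: $section"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "AGENTS.md covers the recommended sections"
```

Wiring a check like this into CI means an agent never starts a task against a repository whose instructions have drifted.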
### Delegate a scoped task from the command line
If you have the Codex CLI installed (via ChatGPT API access), you can kick off tasks directly:
```bash
# Submit a task to Codex and get a PR link back
codex task create \
  --repo your-org/your-repo \
  --title "Add input validation to user registration endpoint" \
  --description "The /api/register endpoint accepts emails without format validation. Add validation that returns a 400 with a clear error message for invalid email formats. Follow the pattern used in /api/login."

# List active tasks
codex task list --repo your-org/your-repo

# Check status of a specific task (ID from `codex task list`)
codex task status <task-id>
```
The `--description` field is where precision matters most. Include reproduction steps for bugs, reference existing patterns for features, and specify the expected behavior change. Vague descriptions produce vague PRs.
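One way to enforce that precision is a fixed template filled in before every submission. The field names here are our own convention, not a Codex schema:

```shell
# Illustrative description template — the field names are our own convention,
# not a Codex schema. Fill each line before submitting the task.
DESCRIPTION=$(cat <<'EOF'
Problem: the /api/register endpoint accepts emails without format validation
Expected: invalid email formats return a 400 with a clear error message
Pattern: follow the validation approach used in /api/login
Verify: run the existing test suite; add a test for the new 400 path
EOF
)
echo "$DESCRIPTION"
```

A template like this forces the spec work — reproduction, expected behavior, reference pattern — to happen before the agent starts, which is exactly where the async model puts human effort.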
### Measure your PR review load before and after
Before introducing async agent delegation, measure your baseline. This shell command calculates average PR size over the last 30 merged PRs using the GitHub CLI:
```bash
# Install gh if not already: brew install gh
# Authenticate: gh auth login

# Calculate average PR size (additions + deletions) for the last 30 merged PRs
gh pr list --repo your-org/your-repo \
  --state merged \
  --limit 30 \
  --json additions,deletions \
| python3 -c "
import json, sys
prs = json.load(sys.stdin)
sizes = [p['additions'] + p['deletions'] for p in prs]
print(f'PRs analyzed: {len(sizes)}')
print(f'Average PR size: {sum(sizes)//len(sizes)} lines changed')
print(f'Largest PR: {max(sizes)} lines')
print(f'Smallest PR: {min(sizes)} lines')
"
```
Run this before and after two weeks of Codex usage. Duolingo's 70% PR volume increase is a headline, but the review time reduction is the metric that matters for sustainable throughput. Larger absolute volume is only an improvement if review time doesn't scale proportionally.
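Review turnaround can be measured the same way. The sketch below feeds in sample JSON so it runs standalone; to run it against a real repository, replace the `echo` with the commented `gh pr list` call (`createdAt` and `mergedAt` are both valid `--json` fields):

```shell
# Median time from PR creation to merge. Sample JSON keeps the sketch
# runnable; replace the echo with:
#   gh pr list --repo your-org/your-repo --state merged --limit 30 --json createdAt,mergedAt
echo '[{"createdAt":"2025-03-01T10:00:00Z","mergedAt":"2025-03-01T16:00:00Z"},
       {"createdAt":"2025-03-02T09:00:00Z","mergedAt":"2025-03-03T09:00:00Z"}]' \
| python3 -c "
import json, statistics, sys
from datetime import datetime

def parse(ts):
    return datetime.fromisoformat(ts.replace('Z', '+00:00'))

prs = json.load(sys.stdin)
hours = [(parse(p['mergedAt']) - parse(p['createdAt'])).total_seconds() / 3600
         for p in prs]
print(f'Median review turnaround: {statistics.median(hours):.1f} hours')
"
```

Median, not mean, is the right statistic here: one PR that sat for a week shouldn't mask whether typical turnaround improved.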
### Set up the Slack integration
In Codex settings at platform.openai.com, install the Slack app for your workspace. Once connected:
Syntax for delegating tasks from Slack:

```
@Codex fix the null pointer exception in UserService.java, reproduce path in #bug-reports thread from March 30
@Codex add unit tests for the payment calculation module in src/billing/calculator.py, follow the pattern in existing test files
@Codex update the README to reflect the new environment variable requirements added in PR #847
```
The agent reads the thread context. Pasting the error stack trace directly into the Slack message gives Codex the reproduction context it needs without requiring a separate specification document.
## The Review Problem Doesn't Disappear
One caution from the DORA 2025 data is worth keeping in mind.
Across teams that adopted AI coding tools heavily, PR size increased 154% and code review time jumped 91%. The improvement in coding speed didn't translate to improvement in overall delivery speed because the review bottleneck grew faster than the throughput improvement.
Codex's async model can amplify this dynamic. Parallel task execution means more PRs, faster. If your review process is already strained, adding ten Codex PRs to the queue can make things worse before it makes them better.
The mitigation is the same one that works for human-written AI code: constrain PR scope at the task definition level. An AGENTS.md that enforces PR size limits and requires tests to pass before completion shifts the quality gate upstream rather than downstream.
```markdown
## PR Requirements
- Target: ≤ 400 lines changed per PR
- Hard limit: 800 lines — split larger tasks into sequential PRs
- All tests must pass before marking the task complete
- Flag if the task requires changes across more than 3 files
```
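Those limits can also be enforced mechanically. Here is a minimal CI sketch — the line counts are hardcoded samples; in a real pipeline they would come from `gh pr view --json additions,deletions`:

```shell
# Illustrative CI gate for the AGENTS.md hard limit. The sample numbers
# stand in for:  gh pr view "$PR_NUMBER" --json additions,deletions
LIMIT=800
additions=450
deletions=120
changed=$((additions + deletions))
if [ "$changed" -gt "$LIMIT" ]; then
  echo "PR too large: $changed lines changed (hard limit $LIMIT)"
  exit 1
fi
echo "PR size OK: $changed of $LIMIT lines changed"
```

A gate like this makes the AGENTS.md limit a hard constraint rather than a suggestion the agent can drift past.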
Eighty-four percent of enterprise Codex deployments report improved successful build rates. The teams seeing that outcome are the ones that treated AGENTS.md as seriously as they treat code review checklists.
## What Changes About the Engineering Role
The shift that matters most isn't the raw productivity number.
It's what happens to engineering attention when the execution layer gets delegated.
Teams running async agents at scale report that the cognitive texture of engineering work changes. Less time in implementation loops. More time in requirement clarity, architectural review, and PR quality gates. The work that was always the highest-leverage work — deciding what to build and why — becomes a larger fraction of the day.
Duolingo's 67% reduction in code review turnaround time wasn't achieved by reviewing less carefully. It was achieved by receiving better-scoped PRs from an agent that read the test patterns before writing code.
That's a different relationship with a development tool than what any autocomplete IDE extension delivers.
The async agent model is still early. The tooling has rough edges. AGENTS.md is a simple text file, not a policy enforcement system. Codex tasks still require human review before merging — and that's by design.
But the 1-to-30-minute build window, the Slack delegation surface, and the parallel task execution capability represent something qualitatively different from AI-assisted coding.
It's closer to AI-managed coding. And the gap between those two things is where the next productivity argument actually lives.