Claude Code Agent Teams

Multi-agent orchestration: full handbook

Status: handbook (deep dive)
Last updated: 2026-06-16
One-line verdict: Multiple independent Claude Code sessions that message each other and share a task list. Powerful for parallel research/review/build, but experimental and markedly more expensive (≈7× the tokens) than a single session. Reach for it when workers need to collaborate, not just report back.

Companion to the Claude Ecosystem paper's §5.7. That page introduces the concept; this is the practical, every-detail guide.


Snapshot

What it is: One lead Claude Code session spawns multiple teammate sessions that work in parallel, each in its own context window, communicating via a mailbox and coordinating through a shared task list.
Launched: February 2026 (alongside Opus 4.6).
Status: Experimental, disabled by default. Real limitations (below).
Requires: Claude Code v2.1.32+; Pro / Max / Team / Enterprise plan.
Enable: CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 (env or settings.json).
Cost: ≈7× tokens vs a single session (scales ~linearly with teammate count).

The mental model

A single Claude Code session is one worker in one context window. Agent Teams turns that into a squad:

  • Team lead — the session you start in. It creates the team, breaks work into tasks, spawns teammates, assigns/synthesizes, and is the only one that manages the team.
  • Teammates — full, independent Claude Code sessions, each with its own context window. They don't share the lead's conversation history.
  • Shared task list — a backlog of work items with states (pending → in-progress → completed) and dependencies; teammates claim items.
  • Mailbox — teammates message each other directly (and the lead), delivered automatically.

The defining idea is that last point: peer-to-peer communication. That is the entire reason Agent Teams exists as something distinct from subagents.


Agent Teams vs Subagents vs worktrees — pick the right tool

              | Subagents                                    | Agent Teams                                            | Git worktrees
What          | Helper agents spawned inside one session     | Multiple independent sessions coordinating             | You run multiple Claude sessions manually
Context       | Own window; result returns to caller         | Own window; fully independent                          | Separate checkouts/branches
Communication | Report back to the main agent only           | Teammates message each other                           | None (manual)
Coordination  | Main agent manages everything                | Shared task list + self-claiming                       | You coordinate
Token cost    | Lower (results summarized back)              | Higher (≈7×; each is a full instance)                  | N× whatever you run
Best for      | Focused tasks where only the result matters  | Work needing discussion, challenge, parallel ownership | Isolated parallel branches you drive yourself

Rule of thumb: if your workers don't need to talk to each other, use subagents (cheaper, simpler). Use Agent Teams only when teammates genuinely need to share findings, challenge each other, or coordinate on a shared backlog. For sequential work, same-file edits, or heavy dependencies, a single session wins.
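The rule of thumb reads naturally as a small decision function. This is only a sketch: pick_tool and its three boolean inputs are illustrative names, not anything Claude Code exposes, and "heavy dependencies" is folded into the parallelizable flag.

```python
def pick_tool(workers_must_talk: bool, parallelizable: bool, same_file_edits: bool) -> str:
    """Return the cheapest option that fits, following the rule of thumb."""
    # Sequential work, heavy dependencies, or same-file edits: one session wins.
    if not parallelizable or same_file_edits:
        return "single session"
    # Parallel work where workers must share findings or challenge each other.
    if workers_must_talk:
        return "agent team"
    # Parallel work where only the result matters.
    return "subagents"
```

Note the ordering: the team is the last resort, chosen only after the cheaper options have been ruled out.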


How it works (architecture)

Components: team lead, teammates, shared task list, mailbox.

Local storage (created automatically, removed on cleanup / session end):

  • Team config: ~/.claude/teams/{team-name}/config.json — runtime state (session IDs, tmux pane IDs, a members array of name/agent-id/agent-type). Don't hand-edit or pre-author it — it's overwritten on every state update.
  • Task list: ~/.claude/tasks/{team-name}/.
  • There is no project-level team config; a .claude/teams/… file in your repo is treated as an ordinary file, not configuration.
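For scripts that only need to locate these files (for example, to inspect them while debugging), the paths can be derived from the team name. A minimal sketch, assuming the layout listed above; team_paths is a hypothetical helper, and it deliberately only builds paths and never writes, since the config is overwritten on every state update.

```python
from pathlib import Path
from typing import Dict, Optional


def team_paths(team_name: str, home: Optional[Path] = None) -> Dict[str, Path]:
    """Build the per-team storage paths described above (read-only helper)."""
    base = (home or Path.home()) / ".claude"
    return {
        "config": base / "teams" / team_name / "config.json",  # runtime state
        "tasks": base / "tasks" / team_name,                   # task list directory
    }
```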

Task lifecycle: the lead creates tasks; teammates self-claim the next unblocked one or are assigned explicitly. Claiming uses file locking to avoid races. A task with unresolved dependencies can't be claimed until they complete; when a blocking task finishes, dependents unblock automatically.
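The claiming rules are easier to see in a toy model. The sketch below is not how Claude Code stores tasks: Task and TaskList are hypothetical in-memory classes, and the file locking that guards real claims is omitted because everything here runs in one process.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    name: str
    depends_on: set = field(default_factory=set)
    state: str = "pending"          # pending -> in-progress -> completed
    owner: Optional[str] = None


class TaskList:
    def __init__(self, tasks):
        self.tasks = {t.name: t for t in tasks}

    def is_unblocked(self, task: Task) -> bool:
        # A task is claimable only when every dependency has completed.
        return all(self.tasks[d].state == "completed" for d in task.depends_on)

    def claim_next(self, teammate: str) -> Optional[Task]:
        """Claim the first pending, unblocked task, or return None."""
        for task in self.tasks.values():
            if task.state == "pending" and self.is_unblocked(task):
                task.state = "in-progress"
                task.owner = teammate
                return task
        return None

    def complete(self, name: str) -> None:
        # Completing a task implicitly unblocks its dependents.
        self.tasks[name].state = "completed"
```

With tasks "schema" and "api" (where "api" depends on "schema"), a second teammate gets nothing to claim until "schema" is completed; after that, "api" becomes claimable without anyone editing it.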

Context each teammate loads on spawn: the same project context as a normal session — CLAUDE.md, MCP servers, skills — plus the spawn prompt from the lead. The lead's conversation history does not carry over, so the spawn prompt must contain the task-specific detail.

Communication: messages are delivered automatically (no polling); idle teammates notify the lead when they stop; all agents see task status; you address a teammate by name (one message per recipient — no broadcast).
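The "one message per recipient, no broadcast" rule can be pictured as a set of per-teammate inboxes. This is a conceptual sketch only; Mailbox and its methods are invented for illustration and say nothing about the real transport.

```python
from collections import defaultdict, deque


class Mailbox:
    def __init__(self, members):
        self.members = set(members)
        self.inboxes = defaultdict(deque)   # one queue per recipient

    def send(self, sender: str, recipient: str, text: str) -> None:
        # Teammates are addressed by name; unknown names are rejected.
        if recipient not in self.members:
            raise KeyError(f"unknown teammate: {recipient}")
        self.inboxes[recipient].append((sender, text))

    def send_to_many(self, sender: str, recipients, text: str) -> None:
        # There is no broadcast primitive, so this is just N separate sends.
        for recipient in recipients:
            self.send(sender, recipient, text)

    def drain(self, recipient: str) -> list:
        """Return and clear everything waiting for one recipient."""
        inbox = self.inboxes[recipient]
        messages = list(inbox)
        inbox.clear()
        return messages
```

Because every recipient costs a separate message, "tell everyone" grows with team size, which is one more reason to keep teams small.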


Setup

1. Check your version:

claude --version          # need v2.1.32+
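If you script this check (say, in a team onboarding script), compare version numbers as integers rather than as strings, otherwise 2.1.9 would sort above 2.1.32. A minimal sketch, assuming the command prints a dotted three-part version somewhere in its output; the exact output format is an assumption, and both function names are hypothetical.

```python
import re

MIN_VERSION = (2, 1, 32)  # minimum stated in this handbook


def parse_version(text: str) -> tuple:
    """Pull the first X.Y.Z number out of a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_agent_teams(version_output: str) -> bool:
    # Tuple comparison is numeric per component: (2, 1, 9) < (2, 1, 32).
    return parse_version(version_output) >= MIN_VERSION
```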

2. Enable the feature (env var, or persist it in settings):

// settings.json
{
  "env": { "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1" }
}
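When adding the flag to an existing settings file programmatically, merge it in rather than overwriting the file, so other env entries and settings survive. The sketch below works on an already-parsed dict; enable_agent_teams is a hypothetical helper, and reading or writing the actual file is left to the caller.

```python
FLAG = "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS"


def enable_agent_teams(settings: dict) -> dict:
    """Return a copy of the settings with the flag set, keeping other keys."""
    merged = dict(settings)               # shallow copy of the top level
    env = dict(merged.get("env", {}))     # copy so the input is not mutated
    env[FLAG] = "1"
    merged["env"] = env
    return merged
```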

3. Choose a display mode (teammateMode in ~/.claude/settings.json):

  • in-process (default fallback) — all teammates in your main terminal; works in any terminal, no extra setup.
  • tmux / split panes — each teammate gets its own pane; needs tmux or iTerm2 (with the it2 CLI + Python API enabled).
  • auto (default) — split panes if you're in tmux/iTerm2, else in-process.
{ "teammateMode": "in-process" }

Or per-session: claude --teammate-mode in-process.

Split panes are not supported in VS Code's integrated terminal, Windows Terminal, or Ghostty. On those, use in-process mode.
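The way auto falls back can be sketched as follows. How Claude Code actually detects the terminal is not documented here, so treat the detection as an assumption: the sketch relies on the TMUX variable that tmux sets inside its sessions and on TERM_PROGRAM=iTerm.app, which iTerm2 sets. The "tmux" setting value and the function name are likewise assumptions.

```python
def resolve_display_mode(configured: str, env: dict) -> str:
    """Map a teammateMode value to the display actually used (sketch)."""
    in_tmux = bool(env.get("TMUX"))                      # set by tmux itself
    in_iterm2 = env.get("TERM_PROGRAM") == "iTerm.app"   # set by iTerm2
    if configured == "in-process":
        return "in-process"
    if configured == "tmux":
        return "split-panes"
    if configured == "auto":
        # Split panes only where they can work; otherwise fall back.
        return "split-panes" if (in_tmux or in_iterm2) else "in-process"
    raise ValueError(f"unknown teammateMode: {configured!r}")
```

In a script you would pass dict(os.environ) as env; taking it as a parameter keeps the function easy to test.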

4. Set the default teammate model in /config, under Default teammate model. Teammates do not inherit the lead's /model by default. (Cost tip below: default them to Sonnet.)


Using it — the workflow

Starting a team

Just describe the task and structure in natural language; the lead creates the team, spawns teammates, and coordinates. Two entry points: you request a team, or Claude proposes one (you confirm — it never spawns a team without approval).

I'm designing a CLI that tracks TODO comments across a codebase. Create an
agent team to explore it from different angles: one teammate on UX, one on
technical architecture, one playing devil's advocate.

Specifying teammates and models

Create a team with 4 teammates to refactor these modules in parallel.
Use Sonnet for each teammate.

Plan-approval gate (for risky work)

Hold teammates in read-only plan mode until the lead approves their plan:

Spawn an architect teammate to refactor the auth module.
Require plan approval before they make any changes.

The lead approves/rejects autonomously — steer it with criteria in your prompt ("only approve plans with test coverage", "reject schema changes").

Talking to teammates directly

Each teammate is a full session you can message directly:

  • In-process: Shift+Down cycles teammates; type to message; Enter to view a session; Esc to interrupt; Ctrl+T toggles the task list.
  • Split panes: click into a teammate's pane.

Tasks: assign and claim

The lead can assign tasks explicitly ("give the auth task to the architect"), or teammates self-claim the next unassigned, unblocked task when they finish one. Dependencies resolve automatically.

Reusable roles via subagent definitions

Define a role once (project/user/plugin/CLI scope) and reuse it as a teammate:

Spawn a teammate using the security-reviewer agent type to audit src/auth/.

The teammate honors that definition's tools allowlist and model, and its body is appended to the system prompt. Important caveats:

  • The skills and mcpServers frontmatter fields are not applied to teammates — they load skills/MCP from project + user settings like a normal session.
  • Team tools (SendMessage, task-management tools) are always available even if tools restricts everything else.

Permissions

Teammates inherit the lead's permission mode at spawn (including --dangerously-skip-permissions). You can change an individual teammate's mode after spawning, but not set per-teammate modes at spawn time. Teammate permission prompts bubble up to the lead — pre-approve common ops in permission settings to cut friction.

Quality-gate hooks

Enforce rules deterministically (exit code 2 = block + send feedback):

  • TeammateIdle — fires when a teammate is about to go idle; block to keep it working (e.g. "tests aren't passing yet").
  • TaskCreated — block task creation with feedback.
  • TaskCompleted — block marking a task complete (e.g. require green tests).
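A TaskCompleted gate that requires green tests might look like the sketch below. Only the exit-code convention (2 = block + feedback) comes from the text above; delivering the feedback on stderr, the choice of pytest as the test command, and the names gate and main are assumptions for illustration.

```python
import subprocess
import sys

BLOCK = 2  # exit code 2 = block + send feedback


def gate(test_command: list) -> tuple:
    """Run the test command and return (exit_code, feedback)."""
    result = subprocess.run(test_command, capture_output=True, text=True)
    if result.returncode == 0:
        return 0, ""
    return BLOCK, "Tests are failing; fix them before marking the task complete."


def main() -> int:
    # Assumed test command; swap in whatever the project uses.
    code, feedback = gate([sys.executable, "-m", "pytest", "-q"])
    if feedback:
        print(feedback, file=sys.stderr)  # assumed feedback channel
    return code

# A hook script would end with: sys.exit(main())
```

Keeping the decision in gate and the I/O in main makes the rule itself testable without running a real test suite.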

Shutdown and cleanup

Ask the researcher teammate to shut down      # graceful; teammate may decline w/ reason
Clean up the team                             # removes shared resources

Always clean up via the lead (teammates may leave resources inconsistent). Cleanup fails if teammates are still running — shut them down first. The lead often cleans up on its own when work is done.


Use-case patterns

Parallel code review — distinct lenses, no overlap:

Create an agent team to review PR #142. Spawn three reviewers:
- one on security, one on performance, one validating test coverage.
Have each review and report findings.

Competing hypotheses (adversarial debugging) — the strongest pattern; beats a single agent's anchoring bias:

Users report the app exits after one message. Spawn 5 teammates to investigate
different hypotheses. Have them talk to each other to disprove each other's
theories, like a scientific debate. Update the findings doc with the consensus.

New modules / features: each teammate owns a separate file set.

Cross-layer change: frontend / backend / tests, one teammate each.


Best practices

  • Start with research/review, not parallel implementation — clear boundaries, no file conflicts, immediate payoff.
  • 3–5 teammates for most work; ~5–6 tasks per teammate. Three focused teammates beat five scattered ones; returns diminish fast.
  • Avoid file conflicts — give each teammate a different set of files; two editing the same file overwrite each other.
  • Size tasks as self-contained units with a clear deliverable (a function, a test file, a review) — too small wastes coordination, too large risks long unattended drift.
  • Front-load context in the spawn prompt (they don't see the lead's history).
  • Monitor and steer; don't let a team run unattended. If the lead starts doing the work itself: "wait for your teammates to finish before proceeding."
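Before spawning, it can help to sanity-check the planned split for overlapping files, since two teammates editing the same file overwrite each other. A minimal sketch; find_conflicts and the assignment dict are hypothetical planning aids, not a Claude Code feature.

```python
from itertools import combinations


def find_conflicts(assignments: dict) -> dict:
    """Map each pair of teammates to the files both were assigned."""
    conflicts = {}
    for (a, files_a), (b, files_b) in combinations(assignments.items(), 2):
        shared = set(files_a) & set(files_b)
        if shared:
            conflicts[(a, b)] = shared
    return conflicts
```

An empty result means every file has exactly one owner; anything else is a cue to re-split the work before writing the spawn prompts.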

Costs — read before you run a fleet

This is the headline trade-off. Official guidance: agent teams use ≈7× the tokens of a standard session (when teammates run in plan mode), because each teammate is a separate instance with its own context window; usage scales ~linearly with team size and runtime.

Practitioner field data (CloudZero, 2026 — treat as estimates):

  • No multi-agent discount — 10 agents burn quota ~10× faster. A ~$13/day solo dev becomes ~$130–260/day at 5–10 parallel agents.
  • Pro ($20/mo) is impractical — its 5-hour window drains in under an hour with ~5 agents. Max 20× ($200/mo) is the practical floor for regular use.
  • Tiered models save ~40% — an Opus lead + Sonnet teammates beats an all-Opus fleet (and a heavier model's tokenizer can emit ~35% more tokens).
  • Idle teammates still bleed tokens — clean up promptly.
  • Each inter-agent message is a billable round-trip through the model.
  • Stale context inflates cost 30–50% — clear between unrelated task batches.

Keeping it manageable (official): default teammates to Sonnet; keep teams small; keep spawn prompts focused (everything in them is context from turn one); clean up when done. Track with /usage; set a monthly cap with /usage-credits (Pro/Max) or workspace spend limits (API).
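For rough budgeting, the figures above can be combined into a back-of-envelope estimate. This is a sketch under loud assumptions: it applies the official ≈7× token multiplier directly to a solo daily cost, and it treats the ~40% tiering saving as a simple discount on top, which mixes two different sources and will not reproduce the practitioner figures exactly. team_cost is a hypothetical helper.

```python
def team_cost(solo_cost: float, token_multiplier: float = 7.0,
              tiering_saving: float = 0.0) -> float:
    """Estimate team cost from a solo-session cost (same currency and period)."""
    if not 0.0 <= tiering_saving < 1.0:
        raise ValueError("tiering_saving must be in [0, 1)")
    # Assumption: cost scales with tokens, and tiering is a flat discount.
    return solo_cost * token_multiplier * (1.0 - tiering_saving)
```

For the ~$13/day solo figure quoted above, this gives about $91/day at 7×, or roughly $55/day if tiered models really save 40%. Use it to set expectations, then check real numbers with /usage.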


Benefits

  • True parallelism with collaboration — the only mode where workers share a backlog and challenge each other; surfaces better answers on review/debugging.
  • Independent ownership — teammates take separate files/modules cleanly.
  • Reuses existing primitives — subagent role definitions, hooks, CLAUDE.md, permissions all carry over; low conceptual overhead if you already use them.
  • Human-in-the-loop control — never spawns without approval; plan-approval gates and quality hooks keep risky work bounded.

Limitations / cons (it's experimental)

  • Cost — ~7×; the main reason to use it sparingly.
  • One team at a time; no nested teams (teammates can't spawn teammates); lead is fixed (no leadership transfer).
  • No session resumption for in-process teammates: /resume and /rewind don't restore them, so the lead may message teammates that no longer exist (spawn new ones).
  • Task status can lag — teammates sometimes fail to mark tasks done, blocking dependents; nudge or fix manually.
  • Shutdown can be slow (finishes the current tool call first).
  • Permission mode is inherited from the lead at spawn; per-teammate modes can be changed only after spawning.
  • Split panes need tmux/iTerm2; unsupported in VS Code/Windows Terminal/Ghostty.

Troubleshooting

  • Teammates not appearing: in in-process mode, press Shift+Down to cycle through them; the task may also not have been complex enough for the lead to spawn a team; for split panes, verify that tmux is installed (which tmux) or that the it2 CLI is available.
  • Teammates stop on errors — view their output, give instructions, or spawn a replacement.
  • Lead shuts down early — tell it to keep going / wait for teammates.
  • Orphaned tmux sessions: list them with tmux ls, then run tmux kill-session -t <name>.
  • Too many permission prompts — pre-approve common ops before spawning.
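Cleaning up orphaned tmux sessions can be scripted by parsing tmux ls, which prints one "name: N windows (...)" line per session. The sketch below only builds the kill commands so you can review them before running anything. How Claude Code names its tmux sessions is not stated here, so the prefix filter is a placeholder you must set yourself, and both function names are hypothetical.

```python
def parse_tmux_sessions(ls_output: str) -> list:
    """Extract session names from `tmux ls` output."""
    names = []
    for line in ls_output.splitlines():
        # tmux does not allow ':' in session names, so the first ':' ends the name.
        name, sep, _rest = line.partition(":")
        if sep and name.strip():
            names.append(name.strip())
    return names


def kill_commands(names: list, prefix: str) -> list:
    """Build (but do not run) kill-session commands for matching sessions."""
    return [["tmux", "kill-session", "-t", name]
            for name in names if name.startswith(prefix)]
```

Feed parse_tmux_sessions the stdout of tmux ls, inspect the resulting commands, and only then pass each one to subprocess.run.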

How it fits us

  • Directly relevant: we're an AI-first shop (the core argument in react-vs-svelte.md), and this is the highest-leverage Claude Code feature for the parallel research and review we do here.
  • Start with the cheap, safe patterns — parallel review and competing-hypothesis investigation on a PR or a bug. They show the value without the file-conflict risk of parallel implementation, and map perfectly onto code review and root-cause analysis in long-lived systems.
  • Budget consciously — ≈7× cost means it's a deliberate tool, not a default. Default teammates to Sonnet, keep teams at 3–5, and clean up. For most day-to-day work, a single session or subagents remain the right, cheaper choice.
  • Pairs with our practices — quality-gate hooks (TaskCompleted requiring green tests) and reusable security-reviewer/test-runner role definitions are how you'd make this auditable and repeatable for safety-adjacent work.
