
Multi-Agent Coordination

Why production multi-agent systems use delegation (not distribution): one coordinator decides what to do, specialist workers decide how. Covers spawning backends, file-based mailbox communication, session reconnection, and production trade-offs.

Some tasks are too complex for a single agent. A research task might need five sources investigated simultaneously. A coding task might need file search, code generation, and test execution to happen in parallel. The instinct is to build agents as equal peers that split the work: a flat team where each agent takes a slice. That model fails in practice.

The failure mode is predictable: without a coordinator, agents duplicate work, produce conflicting outputs, and leave no one responsible for assembling a coherent answer. You end up with five partial answers that the user has to reconcile themselves, or (worse) five agents calling each other in circles. Production multi-agent systems don't look like a group of equals. They look like a team with a lead.

Delegation, not distribution. The coordinator decides WHAT, workers decide HOW. Synthesis happens at the coordinator.

The coordinator agent receives the user's task, sees the full picture, and breaks it into subtasks. It does not execute those subtasks directly. It delegates them to worker agents. Workers each receive a narrow, well-defined task with the tools and context they need to complete it. The coordinator waits for results, then synthesizes them into a coherent final answer.

This is not a router pattern. A router sends the task to the most relevant worker and passes the worker's output back to the user unchanged. A coordinator synthesizes: it takes multiple partial results, resolves any conflicts, fills any gaps, and produces an integrated answer. The synthesis step is an LLM call, not a concatenation.

[Sequence diagram: the User sends a Task to the Coordinator. The Coordinator plans subtasks, then dispatches Subtask A to Worker 1 and Subtask B to Worker 2 in parallel, each with isolated context and tools. Workers return Result A and Result B. The Coordinator synthesizes (an LLM call) and returns the integrated answer to the User.]

The diagram shows the key structural properties: parallel dispatch, isolated execution, and synthesis at the coordinator. The coordinator never touches the user again until synthesis is complete.

The Delegation Pattern

Here is the coordinator loop in pseudocode:

function coordinator_loop(task, available_workers):
  # Coordinator plans: it does not execute subtasks directly
  plan = await llm.plan(task, available_workers)   # returns list of subtasks with tool and context requirements

  # Dispatch to workers in parallel: each gets isolated context and tools
  worker_tasks = []
  for subtask in plan.subtasks:
    worker = spawn_worker(
      task=subtask,
      context=subtask.required_context,  # only what this worker needs, not the full history
      tools=subtask.required_tools,      # only the tools for this domain
    )
    worker_tasks.append(worker)

  # Gather results (parallel execution: coordinator waits for all)
  worker_results = await gather_all(worker_tasks)

  # Coordinator synthesizes: never relays raw results
  return await llm.synthesize(
    original_task=task,
    worker_results=worker_results,
  )

Three design decisions embedded in this structure are worth making explicit.

Tool partitioning. The coordinator gets only coordination tools: spawn a worker, send a message, stop. It does not get file system access, API access, or search tools. Workers get the domain tools for their specific subtask. This partition prevents the coordinator from bypassing workers and doing the work directly, a failure mode that collapses the delegation model back to a single agent. If the coordinator can search files, it will search files instead of delegating, and you lose all the parallelism and specialization you designed for.

Synthesis, not relay. The coordinator doesn't pass raw worker output to the user. After all workers return, it makes another LLM call to synthesize, combining results, resolving conflicts, filling gaps, and producing an integrated answer. The synthesis step is what distinguishes a coordinator from a router. Skip it and you've built an expensive router that makes the user do the assembly work. Include it and the user gets one coherent answer regardless of how many workers contributed to it.

Parallel execution. Workers run concurrently via async gather. The coordinator dispatches all workers and waits for all to finish (or handles partial failure, as discussed below). The parallelism is the primary cost justification for multi-agent architecture. If your workers run sequentially, you've added coordination overhead without adding throughput.
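Tool partitioning can be expressed as plain data. A minimal sketch in Python, with hypothetical tool and role names (the real tool sets would come from your own system):

```python
# Illustrative tool partition: the coordinator gets only coordination tools;
# workers get the domain tools for their role. All names are hypothetical.
COORDINATOR_TOOLS = {"spawn_worker", "send_message", "stop_worker"}

WORKER_TOOL_SETS = {
    "researcher": {"web_search", "fetch_url"},
    "coder": {"read_file", "write_file", "run_tests"},
}

def tools_for(role: str) -> set[str]:
    """Return the tool set for a role; the coordinator never sees domain tools."""
    if role == "coordinator":
        return COORDINATOR_TOOLS
    return WORKER_TOOL_SETS[role]

# The partition is what prevents coordinator bypass: no overlap with any worker.
assert all(
    tools_for("coordinator").isdisjoint(tools)
    for tools in WORKER_TOOL_SETS.values()
)
```

Keeping the partition as data (rather than scattering tool grants through code) makes it trivial to audit that the coordinator cannot reach a domain tool.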

Context Isolation

Each worker agent starts with a fresh message history, either empty or initialized with only the context it needs for its specific subtask. Workers cannot read each other's message histories. The coordinator sees only what workers explicitly return, not their full internal conversation.

This isolation is a design choice, not a limitation. It has four properties that make multi-agent systems tractable at scale:

Prevents error cascades. A worker that gets confused by a malformed tool result, encounters unexpected data, or goes down a wrong reasoning path cannot infect other workers. Its error is contained to its own context window. The coordinator sees a failed result and can handle it (retry, skip, escalate) without the error spreading.

Enables parallel execution. Workers share no mutable state. There is no shared message history to lock, no coordination overhead between workers, no race condition on who appends to the conversation first. Isolation is what makes the gather_all in the coordinator safe to parallelize without synchronization.

Makes debugging tractable. Each worker's conversation is a self-contained artifact. When something goes wrong, you can read a single worker's message history in isolation and understand exactly what it saw, what it concluded, and why. Without isolation, debugging a failure means untangling one long conversation where multiple agents interleaved their reasoning.

Quarantines untrusted input. A worker that processes untrusted input (user-uploaded documents, external API responses, scraped web content) cannot leak that content into the coordinator's decision-making or into other workers. The worker's context is quarantined. If the untrusted content attempts a prompt injection attack, it affects only that worker's narrow task, not the coordinator's plan or other workers' execution.
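The isolation boundary can be sketched with nothing more than copied message lists. A minimal illustration in Python (hypothetical structures, not a real agent framework):

```python
from dataclasses import dataclass, field

@dataclass
class WorkerContext:
    """Illustrative sketch: each worker owns a fresh, private message history."""
    agent_id: str
    messages: list[dict] = field(default_factory=list)

def spawn_context(agent_id: str, seed_context: list[dict]) -> WorkerContext:
    # Copy the seed so the worker cannot mutate the coordinator's view of it.
    return WorkerContext(agent_id=agent_id, messages=list(seed_context))

seed = [{"role": "user", "content": "Investigate source A"}]
w1 = spawn_context("researcher@my-team", seed)
w2 = spawn_context("tester@my-team", seed)

# w1 goes down a wrong reasoning path...
w1.messages.append({"role": "assistant", "content": "confused by malformed data"})

# ...and the error is contained: w2 and the coordinator's seed are untouched.
assert len(w2.messages) == 1 and len(seed) == 1
```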

Spawning Strategies

How the coordinator dispatches workers depends on task structure and tolerance for partial failure.

Synchronous spawning. The coordinator spawns one worker, waits for it to finish, then spawns the next. Simple and easy to reason about, but loses all parallelism. Use when tasks are sequential (each subtask depends on the previous worker's output) or when you're debugging and want to inspect each worker's result before proceeding.

Async gather. All workers are spawned at once, and the coordinator waits for all to finish before synthesizing. Parallel, and the right default for independent subtasks. The cost: you must decide what to do when one worker fails. Cancel the remaining workers and fail the whole task? Wait for successful workers and synthesize from partial results? That decision should be explicit in your coordinator design, not left as an implicit crash.

Async with progressive synthesis. The coordinator processes results as they arrive, decides when it has enough to synthesize, and optionally cancels remaining workers. Best throughput for tasks where partial results are useful (research tasks where 3 of 5 sources are sufficient), but most complex to implement. Requires the coordinator to make an explicit "do I have enough?" decision at each result.
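The async gather strategy, with partial failure made explicit rather than left as an implicit crash, can be sketched in Python with asyncio. `run_worker` is a hypothetical stand-in for a worker agent loop:

```python
import asyncio

async def run_worker(name: str, fail: bool = False) -> str:
    """Hypothetical stand-in for a worker agent loop."""
    await asyncio.sleep(0.01)
    if fail:
        raise RuntimeError(f"{name} failed")
    return f"result from {name}"

async def coordinate() -> list[str]:
    # return_exceptions=True turns partial failure into a value the
    # coordinator inspects, instead of an exception that cancels everything.
    results = await asyncio.gather(
        run_worker("researcher"),
        run_worker("tester", fail=True),
        run_worker("reviewer"),
        return_exceptions=True,
    )
    successes = [r for r in results if not isinstance(r, BaseException)]
    failures = [r for r in results if isinstance(r, BaseException)]
    # Explicit policy: synthesize from partial results if most workers succeeded.
    if len(successes) >= 2:
        return successes
    raise RuntimeError(f"too many worker failures: {failures}")

partial = asyncio.run(coordinate())
```

The policy threshold ("at least 2 of 3") is the decision the text says must be explicit; the point is that it lives in code you chose, not in default exception propagation.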

Spawning Backends

Workers don't run in just one environment. A production coordination system needs to work whether the agent is running in a script, a desktop terminal with multiplexer support, or a native split-pane terminal. Three runtime environments exist, all sharing the same executor interface so the coordinator doesn't need to know which one it's using.

In-process executor. The worker runs in the same process as the coordinator, isolated via async-local storage rather than process boundaries. The worker shares the coordinator's API client and MCP connections (no startup cost for re-establishing those) but has fully isolated message history. Use when external dependencies are unavailable or when you want minimal spawning overhead.

Multiplexer-based executor. The worker runs in a separate process, launched inside a terminal multiplexer pane (color-coded borders help distinguish agents visually). The new process is fully independent, so crashes don't affect the coordinator process. The cost is the time of a new process startup and MCP reconnection.

Native terminal executor. The worker runs in a separate process, launched in a native terminal split pane. Functionally equivalent to the multiplexer-based executor from a coordination perspective (same process isolation, same communication model), but uses the terminal's native split pane API rather than the multiplexer command interface.

The system selects the executor at runtime without any configuration in the coordinator:

type TeammateExecutor = {
  spawn(config: SpawnConfig) -> SpawnResult
  send_message(agent_id: str, message: Message) -> void
  terminate(agent_id: str, reason?: str) -> bool
  is_active(agent_id: str) -> bool
}

# Backend selection at runtime: same interface regardless of backend
function get_executor() -> TeammateExecutor:
  if inside_native_terminal() and native_pane_available():
    return native_terminal_executor()
  if inside_multiplexer() or multiplexer_available():
    return multiplexer_executor()
  return in_process_executor()   # always available, no external dependencies

The key insight: in-process workers share resources (API client, MCP connections) but have isolated message history. The isolation is at the conversation level, not the resource level. This means in-process workers don't pay re-connection costs, but they also can't contaminate each other's reasoning. For fast, resource-light scenarios the in-process executor is the right default. For long-running, crash-tolerant workers the process-isolated executors are better.
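The "async-local storage" isolation behind the in-process executor can be sketched with Python's contextvars: each asyncio task gets a copy of the context, so a ContextVar set inside one worker task is invisible to every other. This is an illustrative sketch, not the described system's implementation:

```python
import asyncio
import contextvars

# Async-local storage: each in-process worker task sees only its own
# message history, while sharing the process (and, in a real system,
# the API client and MCP connections).
current_history: contextvars.ContextVar[list] = contextvars.ContextVar("history")

async def in_process_worker(name: str) -> int:
    current_history.set([])                      # fresh history, this task only
    current_history.get().append(f"{name}: step 1")
    await asyncio.sleep(0)                       # yield to the other worker
    current_history.get().append(f"{name}: step 2")
    return len(current_history.get())

async def main() -> list[int]:
    # Two workers interleave in one event loop, one process,
    # yet their ContextVar values never collide.
    return await asyncio.gather(in_process_worker("a"), in_process_worker("b"))

lengths = asyncio.run(main())
```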

Mailbox Communication

All three executor backends use the same communication channel: a file-based mailbox. Not shared memory. Not pipes. Not sockets. Files, even for in-process workers.

This design choice might seem surprising. Why add file I/O when in-process workers could use a channel directly? The answer is uniformity: the coordination code that sends a message to a worker doesn't need to know whether that worker is in-process or running in a separate terminal. The same send call works for all three. It also means message history is inspectable on disk at any moment, a trivial but decisive debugging advantage.

The mailbox is a directory of message files. Each file represents one message. Reading is atomic: the reader acquires a lockfile, reads the message file, deletes it, and releases the lock. This is not a queue. There is no ordering guarantee beyond filesystem modification time, and messages are consumed exactly once.

# Agent ID format: agentName@teamName
# Example: researcher@my-team, tester@my-team
# Deterministic, human-readable, and grep-able in logs

function send_message(to: str, message: Message) -> void:
  # to is an agent ID: "researcher@my-team"
  mailbox_dir = get_mailbox_path(to)
  message_file = mailbox_dir / f"{uuid()}.msg"
  message_file.write_atomic(message.serialize())

function receive_messages(agent_id: str) -> list[Message]:
  mailbox_dir = get_mailbox_path(agent_id)
  messages = []
  for msg_file in mailbox_dir.list_files():
    with lockfile(msg_file):
      if msg_file.exists():       # check again under lock
        messages.append(Message.deserialize(msg_file.read()))
        msg_file.delete()
  return messages

The agent ID format (agentName@teamName) is deliberate. It's deterministic and human-readable. researcher@my-team is immediately understandable in a log file. A UUID would be correct but opaque. When debugging a multi-agent system, being able to grep for researcher@my-team and see everything that agent sent and received is the difference between a 5-minute and a 50-minute debugging session.

The trade-off: file I/O introduces latency. A coordinator polling at 1-second intervals will have 0 to 1 second message delay. That's acceptable for multi-step coordination tasks (research, code generation, analysis) but makes the mailbox pattern unsuitable for tight feedback loops where sub-100ms coordination is required.
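The `write_atomic` call in the pseudocode above is conventionally implemented as a temp-file write followed by an atomic rename, so a reader never observes a half-written message. A minimal sketch in Python:

```python
import os
import tempfile
import uuid
from pathlib import Path

def write_atomic(mailbox_dir: Path, payload: bytes) -> Path:
    """Write a message file so readers never see a partial write."""
    mailbox_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=mailbox_dir, suffix=".tmp")
    try:
        os.write(fd, payload)
        os.fsync(fd)                 # flush to disk before the rename makes it visible
    finally:
        os.close(fd)
    final = mailbox_dir / f"{uuid.uuid4()}.msg"
    os.replace(tmp, final)           # atomic within one filesystem on POSIX
    return final

box = Path(tempfile.mkdtemp())
path = write_atomic(box, b'{"from": "researcher@my-team", "body": "done"}')
```

Writing the temp file in the mailbox directory itself (not the system temp dir) keeps the rename on one filesystem, which is what makes `os.replace` atomic.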

One ordering rule matters: register your message callback before sending the message. If you send first and then register, there is a race condition. The sender may respond before the receiver has registered, and the response will be missed.
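The ordering rule can be demonstrated with a minimal registry sketch in Python; `fast_responder` is a hypothetical stand-in for a worker that replies instantly:

```python
import queue
import threading

responses: dict[str, "queue.Queue[str]"] = {}
registry_lock = threading.Lock()

def register_callback(request_id: str) -> "queue.Queue[str]":
    """Register the response slot BEFORE the request is sent."""
    q: "queue.Queue[str]" = queue.Queue()
    with registry_lock:
        responses[request_id] = q
    return q

def fast_responder(request_id: str) -> None:
    """Hypothetical worker that replies the instant it receives a message."""
    with registry_lock:
        responses[request_id].put("pong")

# Correct order: register first, then send. Even an instant reply is caught.
inbox = register_callback("req-1")
fast_responder("req-1")          # stands in for send_message + immediate reply
reply = inbox.get(timeout=1)
```

Reverse the two calls and `fast_responder` would find no registered slot: the reply is lost and the sender waits forever, which is exactly the race the rule prevents.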

Session Reconnection

Workers can crash. Processes get killed. Users close terminals. A robust multi-agent system handles this without losing team membership.

Workers persist their team identity in the session transcript: their agent name, team name, and the ID of the leader agent. On session resume, the system reads the team file on disk, recovers the leader's identity, and re-registers the agent. A crashed worker that resumes from its transcript rejoins the team without any manual intervention and without the coordinator having to reschedule its subtask.

function initialize_worker_from_session(session: Session) -> TeamContext:
  # Worker stores team membership in its own session transcript
  team_entry = session.find_entry(type="team-membership")
  if team_entry is None:
    return None   # not a team worker

  # Recover team state from the team file (coordinator writes this)
  team_file = get_team_file(team_entry.team_name)
  team_state = team_file.read()

  # Re-register with the team
  register_agent(
    agent_id=team_entry.agent_id,     # "researcher@my-team"
    leader_id=team_state.leader_id,   # recovered from team file
  )

  return TeamContext(
    agent_id=team_entry.agent_id,
    leader_id=team_state.leader_id,
    team_name=team_entry.team_name,
  )

The team file is the source of truth for who the leader is. Workers don't store the leader's ID only in memory. They write it to the team file so it survives restarts. The session transcript records team membership so workers know they're a team member when they resume. Together, these two persistence mechanisms make team membership durable across crashes.
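The team-file half of that durability story is a small amount of JSON on disk. A minimal sketch, with hypothetical file layout and field names:

```python
import json
import tempfile
from pathlib import Path

def write_team_file(team_dir: Path, team_name: str, leader_id: str) -> Path:
    """Coordinator persists leadership so it survives worker restarts."""
    team_file = team_dir / f"{team_name}.json"
    team_file.write_text(json.dumps({
        "team_name": team_name,
        "leader_id": leader_id,
    }))
    return team_file

def recover_leader(team_file: Path) -> str:
    """A resumed worker reads the team file, never its own (lost) memory."""
    return json.loads(team_file.read_text())["leader_id"]

team_dir = Path(tempfile.mkdtemp())
team_file = write_team_file(team_dir, "my-team", "lead@my-team")
leader = recover_leader(team_file)
```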

Error Recovery in Multi-Agent Systems

When a worker fails, the coordinator has the same escalation options as any agent facing a tool failure: retry the worker, fall back to a different worker or strategy, degrade by proceeding without that worker's contribution, or fail the entire task. The tiered recovery ladder (retry, fallback, degrade, fail) applies at the coordination level, not just at the individual tool level. A coordinator that swallows worker failures silently will produce confident-sounding but incomplete synthesis. Make the failure handling explicit. See Error Recovery and Resilience for the full escalation ladder pattern.
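The retry, fallback, degrade ladder can be sketched as a small wrapper the coordinator applies to each worker invocation. An illustrative sketch, not the referenced chapter's implementation:

```python
def run_with_recovery(worker, fallback=None, max_retries=2):
    """Tiered recovery: retry, then fallback, then degrade to None (skip)."""
    for _ in range(max_retries + 1):
        try:
            return worker()              # retry tier
        except Exception:
            continue
    if fallback is not None:
        try:
            return fallback()            # fallback tier: a different strategy
        except Exception:
            pass
    return None                          # degrade tier: synthesize without it

calls = {"n": 0}
def flaky():
    """Hypothetical worker that fails once, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient")
    return "ok"

result = run_with_recovery(flaky)
```

Returning an explicit `None` for the degrade tier (rather than swallowing the failure) is what lets the synthesis step say "worker X contributed nothing" instead of producing confident-sounding but incomplete output.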

Streaming Through the Coordination Layer

The event pipeline from Streaming and Events extends naturally through multi-agent systems. A supervisor can observe worker event streams in real-time, subscribing to a worker's event stream the same way a UI subscribes to the agent loop. The coordinator can forward relevant events (tool dispatches, intermediate text) to the user's UI before synthesis is complete, enabling progressive disclosure even in multi-agent workflows. The event model makes this composable: no coupling between the coordinator's forwarding logic and the specific consumers watching the stream.
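Event forwarding through the coordination layer can be sketched as one async generator consuming another. The event shapes here are hypothetical, not the event types of any particular system:

```python
import asyncio
from typing import AsyncIterator

async def worker_events(name: str) -> AsyncIterator[dict]:
    """Hypothetical worker stream: yields typed events like a single agent loop."""
    yield {"type": "tool_dispatch", "agent": name, "tool": "search"}
    await asyncio.sleep(0)
    yield {"type": "text", "agent": name, "text": "partial finding"}

async def forward_to_ui(stream: AsyncIterator[dict], ui: list) -> None:
    # The coordinator forwards selected events to the UI before synthesis
    # is complete, enabling progressive disclosure.
    async for event in stream:
        if event["type"] in {"tool_dispatch", "text"}:
            ui.append(event)

async def main() -> list:
    ui: list = []
    await forward_to_ui(worker_events("researcher@my-team"), ui)
    return ui

forwarded = asyncio.run(main())
```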

Production Considerations

In-process workers share resources but have isolated conversation state. The isolation boundary matters. When a worker is launched in-process, it shares the parent's API client and MCP connections. It does not pay the cost of re-establishing those connections. But it has its own message history. The coordinator cannot read it, and other workers cannot read it. This means in-process workers are fast to start but not fully isolated from resource exhaustion: if one worker drives up API usage, the shared client's rate limits affect all workers. Separate-process workers have independent clients and don't share rate limit state.

File-based mailboxes make debugging trivial but introduce latency. A mailbox is a directory of message files that requires disk I/O to read. A coordinator polling at 1-second intervals will have 0 to 1 second message latency regardless of local execution speed. This is acceptable for multi-step coordination tasks (most coordination decisions aren't time-critical), but it makes the mailbox model unsuitable for tight feedback loops requiring sub-100ms response times.

The leader's permission UI shows a pending indicator while the worker's execution is paused. When a worker needs permission for a tool and forwards the request to the leader via mailbox, the worker's execution is blocked at that point. The leader sees a "pending worker request" indicator in its UI and can approve or deny. If the leader's session ends before responding (the user closes the terminal), the worker's abort signal fires and it resolves with a cancel decision, preventing a hung worker. This graceful abort is the mechanism that keeps workers from blocking indefinitely when the leader disappears.

Agent IDs should be human-readable and deterministic. Using researcher@my-team instead of a UUID makes log correlation trivial. When debugging a multi-agent workflow, being able to grep for a specific agent name across log files is far faster than correlating UUID fragments. The name@team format encodes both the agent's role and its team membership in a single inspectable string.

Register callbacks before sending mailbox messages, not after. If a coordinator sends a message and then registers a handler for the response, there is a window where the response could arrive before the handler is registered. The handler misses it. The coordinator waits indefinitely. Register first, send second. Always.

Synthesis at the coordinator requires sufficient context. When you use async gather with progressive synthesis (process results as they arrive), the coordinator must have enough context from each worker's result to synthesize coherently. If workers return only partial outputs ("here's part of what I found" without the full artifact), synthesis quality degrades. Design worker return contracts explicitly: what minimum information must a worker return for the coordinator to synthesize without that worker's full context?
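A worker return contract can be made explicit as a validated structure. A minimal sketch with hypothetical fields, assuming the minimum the coordinator needs is a self-contained summary plus provenance:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkerResult:
    """Illustrative minimum return contract for synthesis."""
    agent_id: str        # "researcher@my-team"
    summary: str         # self-contained answer, never "see my transcript"
    confidence: float    # lets the coordinator weigh conflicting results
    sources: tuple       # provenance the synthesis step can cite

    def __post_init__(self):
        if not self.summary.strip():
            raise ValueError("summary must be self-contained, never empty")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")

r = WorkerResult(
    agent_id="researcher@my-team",
    summary="Source A confirms the claim; source B is inconclusive.",
    confidence=0.8,
    sources=("source-a", "source-b"),
)
```

Validating at construction means a worker that tries to return an empty or malformed result fails loudly in its own context, instead of silently degrading the coordinator's synthesis.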

Executor backend selection is transparent to the coordinator, but failure modes differ. An in-process worker that panics can corrupt shared process state. A separate-process worker that crashes leaves the coordinator's process entirely unaffected. When you're spawning workers that run untrusted or unpredictable code, the process boundary offered by multiplexer-based or native terminal executors is a real safety property, not just a UI feature. For trusted, predictable subtasks, the in-process executor's lower latency is the right choice.

Best Practices

Do: use tool partitioning to prevent coordinator bypass. Give the coordinator only coordination tools (spawn, send, stop). Don't give it the domain tools it delegates. A coordinator with file access will use file access directly instead of delegating.

Don't: let the coordinator relay raw worker results to the user. The coordinator synthesizes. It makes an LLM call to combine, resolve conflicts, and produce an integrated answer. If you pass worker outputs directly to the user, you've built an expensive router that makes the user do the assembly work.

Do: use file-based mailboxes for inter-agent communication. File-based mailboxes work across all executor types (in-process, multiplexer, native terminal) with the same interface. They make message history inspectable on disk. Don't: use shared memory, queues, or pipes. They fail silently across process boundaries.

Do: use human-readable agent IDs in name@team format. Don't: use UUIDs or auto-generated identifiers. Debugging a multi-agent system with opaque agent IDs is significantly harder than with human-readable ones.

Do: register response callbacks before sending mailbox messages. Don't: send first, then register. That's a race condition waiting for production load to trigger it.

Do: make partial failure handling explicit. Decide before deployment whether a coordinator should cancel all workers on one failure, synthesize from partial results, or wait and retry. Don't: let partial failure handling be an implicit crash that produces a silent empty synthesis.

Do: persist team membership (agent ID, team name, leader ID) in the session transcript. Don't: store team identity only in memory. A restarted worker needs to rejoin without manual intervention, and the transcript is the durable record.

Do: prefer the in-process executor for fast, trusted subtasks. Prefer process-isolated executors for long-running or untrusted workers. The in-process executor's shared API client and zero startup latency make it ideal when spawning overhead matters. Process isolation's crash containment and independent resource limits make it the right choice when workers run unpredictable or long-running operations. Choose deliberately. Don't let the default decide for you.

  • Agent Loop Architecture: The single-agent loop that each worker runs. Every worker in a multi-agent system is itself an agent loop, the same two-state machine, with its own message history and tool dispatch cycle. Understanding the loop makes worker execution predictable: a worker terminates when its model response contains no tool calls, just like any single agent.

  • Streaming and Events: Event propagation through the coordination layer. Worker agents yield typed events the same way a single agent loop does. A coordinator or supervisor can subscribe to those event streams to observe worker progress in real-time, forward events to a UI, or detect failures before synthesis begins. The event model makes multi-agent observability composable.

  • Safety and Permissions: Worker permission forwarding and context isolation as a safety property. Workers running in isolated contexts cannot show user dialogs directly. They forward permission requests to the leader via the mailbox protocol. The permissions chapter covers how to configure per-worker trust levels and how the mailbox-based forwarding protocol protects against unattended workers prompting users from the wrong context.

  • Hooks and Extensions: Team lifecycle hooks (SessionStart, SessionStop, and the 27+ hook events organized by lifecycle phase) give coordinators and workers extensibility without modifying the core coordination protocol. Hook-based audit and observability compose naturally with the mailbox communication pattern.

  • Pattern Index: All patterns from this page in one searchable list, with context tags and links back to the originating section.

  • Glossary: Definitions for all domain terms used on this page, from agent loop primitives to memory system concepts.