
Build a Multi-Agent System

How to split a single agent into a coordinator and specialized workers. Covers the delegation pattern, context isolation, tool partitioning, and result synthesis.

Some tasks are too complex for a single agent. A research task might need five sources investigated simultaneously. A coding task might need file search, code generation, and test execution happening in parallel. The instinct is to build agents as equal peers that split the work, a flat team where each agent takes a slice. That model fails in practice.

The failure mode is predictable: without a coordinator, agents duplicate work, produce conflicting outputs, and leave no one responsible for assembling a coherent answer. You end up with five partial answers that the user has to reconcile themselves.

Production multi-agent systems use delegation, not distribution. One coordinator decides WHAT needs doing. Specialized workers decide HOW to do their assigned piece. The coordinator synthesizes the results into a coherent final answer. This guide walks through building that pattern with a concrete example: a coordinator that delegates to a research agent and a writing agent.

When to Split

Not every task needs multiple agents. A single agent with the right tools handles most workflows. Split only when you can name distinct responsibilities that require different tool sets.

The heuristic: if two subtasks need different tools and can run independently, they are candidates for separate agents. A research subtask needs search and web browsing tools. A writing subtask needs file write and formatting tools. Giving both tool sets to a single agent works, but the agent's tool selection degrades as the tool count grows. More tools means more competition for the model's attention in the schema.

Do not split when:

  • The subtasks are sequential (each depends on the previous result). A single agent with a plan handles this better than a coordinator waiting on one worker at a time.
  • The subtasks share state heavily. If worker A needs to read what worker B just wrote, you need synchronization, and synchronization between agents is expensive and error-prone.
  • You are optimizing prematurely. Start with one agent. Measure where it struggles. Split the specific capability that is causing problems.

Tip: Premature splitting adds coordination overhead without adding capability. Start with one agent, and split only once you can name the distinct responsibility and tool set a second agent would own.

The Coordinator Pattern

The coordinator receives the user's task, plans the subtasks, delegates to workers, waits for results, and synthesizes a final answer. It does not execute subtasks directly. It only plans and synthesizes.

Here is the coordinator loop:

async def coordinator(task: str) -> str:
  # Step 1: Plan - break the task into subtasks
  plan = await llm.call(
    messages=[
      system_message("You are a coordinator. Break this task into independent subtasks."),
      user_message(task)
    ],
    tools=[plan_subtasks_tool]
  )

  # Step 2: Delegate - spawn workers in parallel
  worker_tasks = []
  for subtask in plan.subtasks:
    worker = spawn_worker(
      task=subtask.description,
      context=subtask.required_context,
      tools=subtask.required_tools
    )
    worker_tasks.append(worker)

  # Step 3: Gather - wait for all workers
  results = await gather_all(worker_tasks)

  # Step 4: Synthesize - combine results into a coherent answer
  synthesis = await llm.call(
    messages=[
      system_message("Synthesize these worker results into one coherent answer."),
      user_message(f"Original task: {task}"),
      *[assistant_message(f"Worker '{r.name}': {r.output}") for r in results]
    ]
  )

  return synthesis.text

Three design decisions in this structure are critical:

The coordinator does not have domain tools. It has only coordination tools: plan, spawn, and synthesize. If the coordinator had file search or web browsing tools, it would use them directly instead of delegating, and you would lose all the parallelism and specialization you designed for. Tool partitioning is what enforces the delegation model.

Workers run in parallel. The gather_all call dispatches all workers concurrently and waits for all to finish. This parallelism is the primary cost justification for multi-agent architecture. If your workers run sequentially, you have added coordination overhead without adding throughput.

Synthesis is an LLM call, not concatenation. The coordinator does not pass raw worker output to the user. It makes another LLM call that combines results, resolves conflicts, fills gaps, and produces an integrated answer. Skip the synthesis step and you have built an expensive router that makes the user do the assembly work.
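The parallel dispatch in step 3 can be sketched with asyncio. This is a minimal, runnable illustration, not the real implementation: `gather_all` is the hypothetical name from the pseudocode above, and the workers here are stubs standing in for full agent loops.

```python
import asyncio

async def fake_worker(name: str, delay: float) -> str:
    # Stand-in for a real worker agent loop.
    await asyncio.sleep(delay)
    return f"{name}: done"

async def gather_all(worker_tasks: list) -> list:
    # Dispatch all workers concurrently; wall time tracks the slowest
    # worker, not the sum of all workers.
    return await asyncio.gather(*worker_tasks)

async def main() -> list:
    coros = [fake_worker("research", 0.02), fake_worker("writing", 0.02)]
    return await gather_all(coros)

results = asyncio.run(main())
```

`asyncio.gather` preserves input order, so the coordinator can match results back to subtasks by position.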

Context Isolation

Each worker starts with a fresh message history. Workers cannot read each other's conversations. The coordinator sees only what workers explicitly return, not their full internal reasoning.

The following shows how a worker is initialized with isolated context:

async def spawn_worker(task: str, context: str, tools: list) -> WorkerResult:
  # Fresh message history, no parent conversation leaking in
  messages = [
    system_message(f"You are a specialist. Complete this task:\n{task}"),
    user_message(context)   # only the context this worker needs
  ]

  # Standard agent loop: each worker is a self-contained agent
  for turn in range(max_turns):
    response = await llm.call(messages, tools=tools)

    if not response.tool_calls:
      return WorkerResult(name=task, output=response.text)

    messages.append(response)
    for call in response.tool_calls:
      result = await dispatch_tool(call.name, call.args)
      messages.append(tool_result(call.id, result))

  return WorkerResult(name=task, output="Worker exceeded max turns")

Isolation is not a limitation. It is a design choice with four properties that make multi-agent systems tractable:

Prevents error cascades. A worker that encounters bad data or goes down a wrong reasoning path cannot infect other workers. Its error is contained within its own context. The coordinator sees a failed result and handles it without the error spreading.

Enables parallel execution. Workers share no mutable state. There is no shared message history to lock, no race condition on who appends first. Isolation is what makes gather_all safe to parallelize without synchronization.

Makes debugging tractable. Each worker's conversation is a self-contained artifact. When something goes wrong, you read one worker's message history in isolation and understand exactly what it saw and concluded.

Provides security isolation. A worker processing untrusted input (user-uploaded documents, scraped web content) cannot leak that content into the coordinator's context or into other workers. If the content contains a prompt injection, it affects only that worker's narrow task.
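These properties can be exercised directly. The following is a runnable sketch with the LLM stubbed out; the helper names mirror the pseudocode above and are illustrative assumptions, not a real implementation.

```python
import asyncio

def system_message(text: str) -> dict:
    return {"role": "system", "content": text}

def user_message(text: str) -> dict:
    return {"role": "user", "content": text}

async def stub_llm(messages: list) -> str:
    # Stand-in for llm.call: reports how much context this worker saw.
    return f"answered from {len(messages)} messages"

async def spawn_worker(task: str, context: str) -> dict:
    # Fresh history per worker: nothing from the parent conversation or
    # from sibling workers can leak in.
    messages = [
        system_message(f"You are a specialist. Complete this task:\n{task}"),
        user_message(context),
    ]
    output = await stub_llm(messages)
    return {"name": task, "output": output, "messages": messages}

async def main() -> tuple:
    # Safe to parallelize: the workers share no mutable state.
    return await asyncio.gather(
        spawn_worker("research", "Focus on service discovery."),
        spawn_worker("writing", "Draft the summary."),
    )

research, writing = asyncio.run(main())
```

Each worker's `messages` list is a self-contained artifact: the research context never appears in the writing worker's history, which is what makes per-worker debugging and injection containment possible.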

Tool Partitioning

The coordinator and workers have different tools. This partition is what makes delegation work. If the coordinator has all the tools, it will use them directly instead of delegating.

The following shows how tools are assigned:

# Coordinator tools: only coordination capabilities
coordinator_tools = [
  plan_subtasks_tool,    # break a task into subtasks
  spawn_worker_tool,     # create a worker agent
  send_message_tool,     # communicate with an active worker
  terminate_worker_tool  # stop a worker that is stuck
]

# Research worker tools: only research capabilities
research_tools = [
  web_search_tool,
  fetch_url_tool,
  extract_text_tool
]

# Writing worker tools: only writing capabilities
writing_tools = [
  write_file_tool,
  read_file_tool,
  format_document_tool
]

The partition follows a simple rule: each agent gets only the tools it needs for its specific responsibility. The coordinator needs to plan and delegate, so it gets coordination tools. The research worker needs to find information, so it gets search tools. The writing worker needs to produce documents, so it gets file tools.

If you find that a worker needs a tool from another worker's set, that is a signal that either the subtask boundaries are wrong or you need a third worker. Do not share tool sets across workers. It defeats the purpose of specialization.
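One way to make the partition binding rather than advisory is to check it at dispatch time. A minimal sketch, assuming a registry keyed by agent role; the tool names echo the lists above, and `ToolPartitionError` is a hypothetical exception.

```python
TOOL_SETS = {
    "coordinator": {"plan_subtasks", "spawn_worker", "send_message", "terminate_worker"},
    "research": {"web_search", "fetch_url", "extract_text"},
    "writing": {"write_file", "read_file", "format_document"},
}

class ToolPartitionError(Exception):
    pass

def dispatch_tool(role: str, tool_name: str, args: dict) -> str:
    # Reject any call outside the role's partition before the tool runs.
    if tool_name not in TOOL_SETS[role]:
        raise ToolPartitionError(f"{role!r} may not call {tool_name!r}")
    return f"ran {tool_name} with {args}"

result = dispatch_tool("research", "web_search", {"query": "circuit breakers"})
```

With the check at the dispatch layer, a model that hallucinates a tool outside its set gets an explicit error it can recover from, instead of silently crossing the partition.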

Handle Partial Failure

Not every worker will succeed. A research worker might fail to find a source. A writing worker might exceed its turn budget. The coordinator must decide what to do with partial results.

The following shows a coordinator that handles worker failures:

async def gather_with_fallback(worker_tasks: list) -> list:
  results = []
  for task in worker_tasks:
    try:
      result = await task
      results.append(result)
    except WorkerError as error:
      results.append(WorkerResult(
        name=task.name,
        output=None,
        error=str(error)
      ))

  # Separate successes from failures
  successes = [r for r in results if r.output is not None]
  failures = [r for r in results if r.output is None]

  if len(successes) == 0:
    raise AllWorkersFailedError(failures)

  if failures:
    log_warning(f"{len(failures)} workers failed: {[f.name for f in failures]}")

  return successes

The decision of whether to proceed with partial results or fail entirely depends on the task. For research tasks where three of five sources are sufficient, partial success is fine. For tasks where every subtask is critical (all tests must pass), any failure should fail the whole operation. Make this decision explicit in your coordinator, not implicit in the error handling.
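That explicit decision can be expressed as a named policy rather than a side effect of the error handling. A sketch, assuming each worker result carries `output=None` on failure; `min_successes` and `require_all` are hypothetical knobs, not part of the pattern above.

```python
class InsufficientResultsError(Exception):
    pass

def apply_failure_policy(results: list, min_successes: int = 1,
                         require_all: bool = False) -> list:
    successes = [r for r in results if r["output"] is not None]
    failures = [r for r in results if r["output"] is None]
    if require_all and failures:
        # "All tests must pass" style tasks: any failure fails the run.
        raise InsufficientResultsError(f"{len(failures)} workers failed")
    if len(successes) < min_successes:
        raise InsufficientResultsError(
            f"only {len(successes)} of {len(results)} workers succeeded"
        )
    return successes

results = [
    {"name": "source-1", "output": "finding A"},
    {"name": "source-2", "output": None},
    {"name": "source-3", "output": "finding C"},
]

# Research task: two of three sources is sufficient.
kept = apply_failure_policy(results, min_successes=2)
```

The coordinator then chooses the policy per task type, and the threshold is visible in code review instead of implied by which exceptions happen to propagate.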

A Complete Example

Here is a coordinator that delegates a documentation task to a research agent and a writing agent:

# The coordinator receives "Write a technical summary of microservices patterns"
async def documentation_coordinator(task: str) -> str:
  # Plan: one research subtask, one writing subtask
  subtasks = [
    Subtask(
      name="research",
      description="Find key patterns, trade-offs, and production considerations for microservices.",
      tools=research_tools,
      context="Focus on service discovery, circuit breakers, and data consistency."
    ),
    Subtask(
      name="writing",
      description="Write a structured technical summary from the research findings.",
      tools=writing_tools,
      context=""   # replaced below by the research worker's output
    )
  ]

  # Phase 1: Research runs independently
  research_result = await spawn_worker(
    task=subtasks[0].description,
    context=subtasks[0].context,
    tools=subtasks[0].tools
  )

  # Phase 2: Writing depends on research (sequential, not parallel)
  writing_result = await spawn_worker(
    task=subtasks[1].description,
    context=research_result.output,   # research output becomes writing input
    tools=subtasks[1].tools
  )

  # Synthesize
  return await llm.synthesize(
    original_task=task,
    worker_results=[research_result, writing_result]
  )

This example shows a mixed pattern: the research worker runs first (independently), and the writing worker runs second (depends on research output). Not every multi-agent system is fully parallel. The key structural property is still present: each worker has isolated context, specialized tools, and a defined responsibility.

Related

  • Multi-Agent Coordination. The full multi-agent architecture: spawning backends, file-based mailbox communication, session reconnection, progressive synthesis, and production trade-offs.
  • Tool System. How tool partitioning works at the dispatch level, and why giving the coordinator domain tools defeats the delegation model.
  • Agent Loop. Each worker runs its own agent loop. Understanding the loop is prerequisite to understanding how workers execute.