A chatbot responds once. An agent loops until the task is done. That one difference, a loop, is what makes an LLM capable of reading files, calling APIs, running computations, and delivering answers grounded in real information rather than training knowledge alone.
The core idea: each turn, the model receives the full conversation history and decides what to do next. If it needs more information, it calls a tool. The result comes back, gets appended to the history, and the model decides again. When the model has enough to answer, it responds without calling any tool, and that's the signal to stop.
We'll build this in two steps: first the bare loop, then one tool wired in. By the end, you'll have a clear mental model of how every agent system works, and a foundation for the depth that follows.
Step 1: The Loop
The agent loop is a function that takes a question and returns an answer. Here's the complete structure:
```
function agent_loop(question, max_turns = 20):
    messages = [
        system_message("You are a helpful assistant."),
        user_message(question)
    ]
    for turn in range(max_turns):
        response = await llm.call(messages)
        if response.tool_calls is empty:
            return response.text
        messages.append(response)
        for call in response.tool_calls:
            result = await dispatch_tool(call.name, call.args)
            messages.append(tool_result(call.id, result))
    raise RuntimeError("agent exceeded max_turns without answering")
```

Three things make this work:
The messages list is the agent's memory. The model sees the entire list on every turn: the original question, all previous responses, all tool results. This is how it maintains context across multiple steps. The list grows as the loop runs. A simple task might finish in two turns, while a complex one might take a dozen.
Termination is the absence of tool calls. When the model decides it has enough information to answer, it produces a response with no tool_calls. That empty field is the signal: return the answer and stop. The agent doesn't announce "I'm done." It simply stops asking for tools.
max_turns is a safety net, not a budget. A well-designed agent should terminate naturally long before hitting 20 turns. Exceeding max_turns is an error condition. The model didn't converge, and you should surface that failure explicitly rather than returning a partial or empty answer. Without this guard, a confused model runs forever.
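The pseudocode above maps almost line-for-line onto Python. Below is a minimal runnable sketch with a stubbed model; `FakeLLM`, `ToolCall`, `Response`, and `dispatch_tool` are hypothetical stand-ins for a real provider SDK, and `await` is dropped to keep the sketch synchronous. The stub asks for one file, then answers, so you can see both a tool turn and the natural termination:

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    id: str
    name: str
    args: dict

@dataclass
class Response:
    text: str = ""
    tool_calls: list = field(default_factory=list)

class FakeLLM:
    """Stand-in model: requests one file on turn 1, answers on turn 2.
    A real agent would call a provider SDK here instead."""
    def __init__(self):
        self.turn = 0

    def call(self, messages):
        self.turn += 1
        if self.turn == 1:
            return Response(tool_calls=[ToolCall("c1", "read_file", {"path": "/etc/motd"})])
        return Response(text="done")  # no tool calls -> loop terminates

def dispatch_tool(name, args):
    if name == "read_file":
        return f"<contents of {args['path']}>"
    raise ValueError(f"unknown tool: {name}")

def agent_loop(llm, question, max_turns=20):
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": question},
    ]
    for _ in range(max_turns):
        response = llm.call(messages)
        if not response.tool_calls:      # termination: absence of tool calls
            return response.text
        messages.append(response)        # keep the model's request in history
        for call in response.tool_calls:
            result = dispatch_tool(call.name, call.args)
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    raise RuntimeError("agent exceeded max_turns without answering")
```

Running `agent_loop(FakeLLM(), "what's in motd?")` makes two model calls and returns the final text; if the stub never stopped requesting tools, the loop would raise after 20 turns.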
Step 2: Add a Tool
Tools are how the agent acts on the world. A tool is a function plus metadata: a name, a description, and a parameter schema. The LLM never sees your function code. It only sees the schema, which tells it what arguments to pass.
Here's a read_file tool:
```
tools = [
    {
        name: "read_file",
        description: "Read the contents of a file at the given path",
        parameters: {
            path: { type: "string", description: "Absolute path to the file" }
        }
    }
]

# The implementation is separate. The LLM never sees this.
function read_file_impl(path):
    return filesystem.read(path)
```

The metadata (the name, description, and parameter types) is what the model reasons about when deciding whether to call this tool and what arguments to pass. Your function body is invisible. This is the key insight: the schema is the interface, the implementation is the plumbing.
Wiring the tool into the loop takes two changes:
```
response = await llm.call(messages, tools=tools)  # pass tools here
```

And a dispatch function to route calls to implementations:
```
function dispatch_tool(name, args):
    if name == "read_file":
        return read_file_impl(args.path)
    raise Error(f"unknown tool: {name}")
```

That's it. The loop from Step 1 is unchanged. The model now knows read_file exists, calls it when it needs file contents, and the dispatcher runs the real function. In a production system, the dispatcher becomes a registry that maps names to implementations automatically. See Tool System for the full pattern.
Note: As the agent reads more files and accumulates tool results, the messages list grows. For long-running tasks or large files, this becomes a context management problem. See Memory and Context for compaction strategies.
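As a taste of what compaction can look like, here is a hedged sketch that truncates the bodies of old tool results once total history size exceeds a budget. The dict-based message shape, the character-count budget, and the thresholds are all simplifying assumptions; real systems count tokens and use smarter strategies:

```python
def compact(messages, budget=4000, keep_recent=4):
    """Truncate old tool results when total size exceeds the budget.
    Leaves the system prompt, user messages, and recent turns untouched."""
    total = sum(len(m.get("content", "")) for m in messages)
    if total <= budget:
        return messages
    cutoff = len(messages) - keep_recent  # everything before this may be trimmed
    compacted = []
    for i, m in enumerate(messages):
        if i < cutoff and m.get("role") == "tool" and len(m.get("content", "")) > 200:
            m = {**m, "content": m["content"][:200] + " [truncated]"}
        compacted.append(m)
    return compacted
```

Called before each llm.call, this keeps early file dumps from crowding out the context the model actually needs for the current step.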
Where to Go From Here
Every concept this page introduces has a deeper Core Systems page with production-grade detail, failure modes, and non-obvious insights.
- Agent Loop Architecture. The full lifecycle: async generator patterns for streaming, three distinct abort paths, token budget termination, and why loop cleanup order matters under cancellation.
- Tool System. Registration, dispatch, concurrency classes (so read-only tools run in parallel), behavioral flags, schema flattening for LLM APIs, and fail-closed defaults.
- Memory and Context. What happens when the messages list grows too long: compaction strategies, LLM-driven fact extraction, session memory budgets, and context pruning triggers.
- Prompt Architecture. How to structure the system message for cache efficiency, multi-section composition, volatile section registration, and behavioral calibration.
- Error Recovery. When the LLM call fails, when a tool crashes, and the four-rung escalation ladder: retry, fallback, partial result, escalate.
- Safety and Permissions. Controlling what tools can do: the six-source permission cascade, graduated trust levels, and bypass-immune safety checks that hold even in auto mode.
- Multi-Agent Coordination. When one loop isn't enough: spawning backends, file-based mailbox communication, tool partitioning between coordinator and workers, and session reconnection for resumed agents.
- Streaming and Events. Delivering results as they happen: typed event streams, priority-based dispatch, capture/bubble phases, and screen-diffing output models.
- Pattern Index. All patterns mentioned in this quickstart in one searchable list, with links to the deeper Core Systems pages where each pattern is explained.
- Glossary. Definitions for every term introduced on this page, from agent loop to tool dispatch.