Running a frontier model on every step of an agentic task is expensive. Running a fast model on every step is cheap but leaves hard decisions underserved. The advisor strategy threads this needle: a capable executor model handles the task end-to-end, and a frontier advisor model enters only at the moments that require it.
The pattern inverts the usual hierarchical instinct. In a traditional multi-agent setup, a coordinator delegates to workers. Here, the executor handles everything directly — tool calls, result processing, iteration — and escalates to the advisor only when it hits a decision it cannot confidently resolve on its own. The advisor provides guidance. It never calls tools. It never generates user-facing output. When the consultation is complete, the executor continues with the advisor's guidance incorporated.
The key insight: frontier intelligence is most valuable at decision forks, not on routine steps. Most of what an agent does — reading files, running searches, formatting outputs — doesn't require the best model available. By targeting the advisor only at the moments that warrant it, you get frontier-level accuracy at a fraction of the cost of running a frontier model on every turn.
Architecture
The entire flow happens within a single API call. No extra round-trips, no orchestration layer, no additional infrastructure. The handoff to the advisor is internal to the model invocation.
Declaring the Advisor Tool
The advisor is declared as a tool in the API request body using the identifier advisor_20260301:
```python
response = llm.create_message(
    model=EXECUTOR_MODEL,  # fast, cost-efficient executor (e.g. Sonnet or Haiku)
    max_tokens=8096,
    tools=[
        {
            "type": "advisor_20260301",  # declares the advisor capability
        },
        # ... your other task tools
    ],
    messages=[
        {"role": "user", "content": task}
    ]
)
```

The executor now has access to an advisor tool alongside its regular tools. When it encounters a decision it cannot resolve — an ambiguous error, a multi-path trade-off, a high-stakes choice — it can invoke the advisor tool to consult the frontier model before proceeding.
The max_uses Cap
A max_uses parameter limits how many times the advisor can be invoked per task:
```python
response = llm.create_message(
    model=EXECUTOR_MODEL,
    max_tokens=8096,
    tools=[
        {
            "type": "advisor_20260301",
            "max_uses": 3,  # advisor can be consulted at most 3 times per task
        },
        # ... other tools
    ],
    messages=[{"role": "user", "content": task}]
)
```

The max_uses cap serves two purposes:
Cost control. Advisor turns bill at frontier rates. Without a cap, a confused executor could consult the advisor on every step, eliminating the cost benefit. A cap forces the executor to reserve advisor invocations for genuine decision points.
Behavioral framing. Knowing the budget is limited, the executor treats the advisor as a scarce resource. It attempts to resolve ambiguities on its own before escalating. This produces more capable executor behavior than a setup in which the executor can freely offload any uncertainty.
Typical advisor responses are short — 400–700 tokens per consultation — keeping advisor costs predictable even when the cap is reached.
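Those figures make worst-case advisor output spend easy to bound before a task runs. A minimal sketch, assuming an illustrative placeholder rate (the real frontier rate is whatever your provider charges):

```python
# Upper bound on advisor output cost for one task, given the max_uses cap.
# The rate below is an illustrative placeholder, not a real price.
FRONTIER_OUTPUT_RATE = 75.00 / 1_000_000  # assumed $ per output token

def max_advisor_output_cost(max_uses: int, max_tokens_per_consult: int = 700) -> float:
    """Worst case: every consultation runs to the high end of the typical range."""
    return max_uses * max_tokens_per_consult * FRONTIER_OUTPUT_RATE

# With a cap of 3 and ~700 tokens per consultation:
print(f"${max_advisor_output_cost(3):.4f}")  # → $0.1575
```

Advisor input tokens (the context the executor passes along) also bill at frontier rates, so treat this as a floor on the worst case, not the full picture.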
Token Billing
The API reports executor and advisor tokens separately. This transparency is deliberate:
```python
# Token usage is reported separately per model tier
usage = response.usage
log(f"Executor ({EXECUTOR_MODEL}) tokens: {usage.executor_input_tokens} in / {usage.executor_output_tokens} out")
log(f"Advisor ({ADVISOR_MODEL}) tokens: {usage.advisor_input_tokens} in / {usage.advisor_output_tokens} out")

# Executor turns bill at executor rates; advisor turns bill at frontier rates
# Track both independently to validate cost assumptions per task type
```

Executor turns bill at executor rates. Advisor turns bill at frontier rates. There is no blended rate. You can track exactly what the advisor cost contributed to a task and optimize the max_uses cap accordingly.
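A per-task cost breakdown falls directly out of those separately reported fields. A sketch with placeholder rates and a stand-in usage object (the field names follow the logging example above; the rates are assumptions, not real prices):

```python
from dataclasses import dataclass

# Placeholder $ per token rates for illustration; substitute your provider's pricing.
RATES = {
    "executor": {"in": 3.00 / 1e6, "out": 15.00 / 1e6},
    "advisor":  {"in": 15.00 / 1e6, "out": 75.00 / 1e6},
}

@dataclass
class Usage:
    """Stand-in mirroring the separately reported token fields."""
    executor_input_tokens: int
    executor_output_tokens: int
    advisor_input_tokens: int
    advisor_output_tokens: int

def task_cost(u: Usage) -> dict:
    executor = (u.executor_input_tokens * RATES["executor"]["in"]
                + u.executor_output_tokens * RATES["executor"]["out"])
    advisor = (u.advisor_input_tokens * RATES["advisor"]["in"]
               + u.advisor_output_tokens * RATES["advisor"]["out"])
    return {"executor": executor, "advisor": advisor, "total": executor + advisor}

cost = task_cost(Usage(50_000, 8_000, 12_000, 1_400))
print(cost)  # advisor share is visible per task, not blended away
```

Logging this breakdown per task type is what makes the max_uses tuning in the next sections an empirical exercise rather than a guess.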
Performance Benchmarks
The advisor strategy has been benchmarked on three task categories:
SWE-bench Multilingual (software engineering)
| Configuration | Score | Cost per task |
|---|---|---|
| Executor alone | baseline | baseline |
| Executor + frontier advisor | +2.7 pp | -11.9% |
The executor with a frontier advisor outperforms the executor alone while reducing cost per task. The advisor's targeted interventions resolve the ambiguous decisions that cause the executor to make suboptimal choices, without the overhead of running the frontier model on routine steps.
BrowseComp (web research)
| Configuration | Score | Cost vs mid-tier alone |
|---|---|---|
| Fast model alone | 19.7% | ~15% |
| Fast model + frontier advisor | 41.2% | ~15% |
| Mid-tier model alone | — | 100% (baseline) |
A fast model with a frontier advisor more than doubles the fast model's standalone score while costing 85% less than running a mid-tier model alone on the same tasks. This is the clearest demonstration of the strategy's cost-efficiency profile: the advisor elevates a fast, cheap model to well above what a mid-tier model can do on its own.
Terminal-Bench
Improvements are also observed on Terminal-Bench (terminal-based task completion), though the exact delta has not been publicly reported.
When the Executor Consults the Advisor
The executor decides when to invoke the advisor. Common patterns:
Ambiguous error diagnosis. The test output suggests multiple possible root causes. The executor has tried one fix and it didn't work. Instead of trying all combinations, it consults the advisor: "I see a null pointer exception in line 42, but the stack trace also suggests a race condition. Which should I investigate first?"
Multi-path architectural decisions. The task could be solved by refactoring the existing module or by introducing a new abstraction. The executor can complete either path but doesn't know which the user prefers or which is more consistent with the codebase's conventions.
High-stakes irreversible actions. Before deleting files, dropping a database table, or making a network request to an external service, the executor escalates to confirm the decision. This is especially valuable when the executor has been given broad tool permissions and needs a second opinion before acting destructively.
Novel problem domains. The executor encounters a pattern it has low confidence reasoning about — a less common programming language, an unusual API error code, a domain-specific constraint. The advisor, with higher overall capability, can reason more reliably about novel inputs.
The advisor does not decide when to be consulted. The executor decides. This is the critical structural difference from a hierarchical multi-agent setup where the coordinator assigns tasks.
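Because the executor decides, the escalation criteria live in its system prompt. One way to encode the patterns above as prompt text; the wording is purely illustrative, not an official prompt:

```python
# Hypothetical system-prompt excerpt encoding the escalation patterns above.
ESCALATION_GUIDANCE = """\
Consult the advisor only when one of these holds:
1. An error has multiple plausible root causes and your first fix failed.
2. Two viable implementation paths exist and the choice depends on
   conventions or preferences you cannot verify from the codebase.
3. You are about to take an irreversible action (delete, drop, external write).
4. You are reasoning in a domain where your confidence is low.
Otherwise, resolve the decision yourself. Advisor uses are limited.
"""
```

Stating that advisor uses are limited reinforces the behavioral framing the max_uses cap already creates: the executor tries to resolve uncertainty itself first.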
Advisor Behavior
The advisor is constrained by design:
- No tool calls. The advisor cannot call tools. It can only read what the executor has already gathered and return guidance in text.
- No user-facing output. The advisor's response goes to the executor, not to the user. The user sees only the executor's final answer.
- Short responses. Advisor responses are guidance, not complete solutions. They steer the executor without taking over execution.
This constraint is what keeps the advisor economical. A 500-token advisor response that resolves a decision fork costs far less than running a frontier model for an entire multi-turn agentic task.
Comparison with Other Multi-Model Patterns
vs. Full frontier model execution. Running a frontier model end-to-end gives peak intelligence on every turn but at peak cost. The advisor strategy achieves near-frontier accuracy on hard tasks at substantially lower cost by targeting the frontier model where it matters.
vs. Coordinator/worker multi-agent. In the coordinator pattern, a top-level agent orchestrates multiple specialized workers. The advisor pattern has only one executor. There is no task decomposition, no parallel workers, and no synthesis step. The advisor supplements the executor's reasoning; it doesn't replace its execution role.
vs. Model routing. A router selects a model per request at the task level. The advisor strategy selects per decision point within a single task execution. Routing is coarse-grained (whole-task); the advisor is fine-grained (per-decision).
vs. Chain-of-thought prompting. CoT prompting asks the model to reason through steps before answering. The advisor strategy invokes a different, more capable model for specific reasoning steps. CoT improves the executor's own reasoning; the advisor introduces external, higher-quality reasoning.
Production Considerations
1. Set max_uses based on task complexity, not conservatism.
A max_uses of 1 may be sufficient for moderately complex tasks (the executor solves most steps independently and escalates once on the hardest decision). Tasks with multiple high-stakes branch points may need 3–5 uses. Measure advisor usage across real tasks before setting a hard limit — if the executor frequently hits the cap before completing, the cap is too low for the task profile.
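One way to turn those measurements into a cap: collect per-task advisor invocation counts from a run without a tight limit, then set max_uses at a high percentile so the cap rarely truncates a real task. A sketch with made-up data:

```python
# Made-up per-task advisor invocation counts from a measurement run.
observed_uses = [0, 0, 1, 0, 2, 1, 0, 3, 1, 0, 1, 2, 0, 1, 4, 0, 1, 0, 2, 1]

def cap_at_percentile(counts: list[int], pct: float = 0.95) -> int:
    """Pick a max_uses that covers pct of observed tasks without truncation."""
    ordered = sorted(counts)
    idx = min(len(ordered) - 1, int(pct * len(ordered)))
    return max(1, ordered[idx])

print(cap_at_percentile(observed_uses))  # → 4
```

If the resulting cap is surprisingly high, that is itself a signal: the task profile may have more genuine decision forks than expected, or the executor may be over-escalating.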
2. The advisor cannot recover from bad tool results.
The advisor sees only what the executor passes to it. If the executor has received malformed tool results or accumulated bad state in its context, the advisor can only reason about what it's given. Design your executor's escalation logic to pass sufficient context: the question being asked, the relevant prior steps, and the specific decision fork.
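A consultation is only as useful as the context the executor includes. One way to structure that context; the fields here are an illustrative convention, not a required schema:

```python
def build_consultation(question: str, prior_steps: list[str],
                       decision_fork: list[str]) -> str:
    """Assemble the escalation message: question, relevant history, and options."""
    steps = "\n".join(f"- {s}" for s in prior_steps)
    options = "\n".join(f"{i}. {o}" for i, o in enumerate(decision_fork, 1))
    return (f"Question: {question}\n\n"
            f"Relevant prior steps:\n{steps}\n\n"
            f"Options under consideration:\n{options}")

msg = build_consultation(
    "Which root cause should I investigate first?",
    ["Ran test suite: 3 failures", "Applied null-check fix: still failing"],
    ["Null pointer at line 42", "Race condition suggested by stack trace"],
)
print(msg)
```

Enumerating the decision fork explicitly keeps the advisor's short response actionable: it can answer "option 2, because..." rather than restating the problem.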
3. Advisor latency adds to end-to-end task duration.
Each advisor consultation adds an additional model call's worth of latency. For latency-sensitive tasks, profile the advisor invocation delay and consider whether max_uses=1 (reserving the advisor for the single most critical decision) is a better trade-off than max_uses=5.
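The latency trade-off can be budgeted before choosing a cap. A sketch with assumed per-call latencies; measure your own, these numbers are placeholders:

```python
# Assumed average latencies in seconds; replace with profiled values.
EXECUTOR_TURN_S = 2.0
ADVISOR_CONSULT_S = 8.0  # frontier calls are typically slower

def expected_duration(executor_turns: int, advisor_uses: int) -> float:
    """End-to-end wall time if every advisor consultation is actually used."""
    return executor_turns * EXECUTOR_TURN_S + advisor_uses * ADVISOR_CONSULT_S

# Same 12-turn task under two caps:
print(expected_duration(12, 1))  # → 32.0 seconds
print(expected_duration(12, 5))  # → 64.0 seconds
```

Note this models the worst case where the cap is fully consumed; tasks that escalate less often land closer to the executor-only duration.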
4. The executor's model choice matters.
A fast model with a frontier advisor is most cost-efficient for research and browsing tasks. A mid-tier model with a frontier advisor is most accurate for complex software engineering tasks. The right executor depends on task type. Don't default to mid-tier + advisor for everything — measure fast model + advisor on your task category first.
5. Track advisor invocation count per task in production.
If a task type rarely invokes the advisor (< 10% of tasks), the max_uses cap is effectively unused — consider whether you need the advisor for that task type at all. If a task type frequently hits the cap, the executor is over-reliant on escalation and you should examine whether the executor model, system prompt, or task framing needs improvement.
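In practice this reduces to two metrics per task type: the fraction of tasks that consult the advisor at all, and the fraction that exhaust the cap. A sketch over hypothetical per-task logs:

```python
def advisor_health(uses_per_task: list[int], max_uses: int) -> dict:
    """Summarize advisor reliance for one task type."""
    n = len(uses_per_task)
    any_use = sum(1 for u in uses_per_task if u > 0) / n
    hit_cap = sum(1 for u in uses_per_task if u >= max_uses) / n
    return {
        "consult_rate": any_use,   # below ~0.10: advisor may be unnecessary here
        "cap_hit_rate": hit_cap,   # high: executor is over-reliant on escalation
    }

stats = advisor_health([0, 0, 1, 3, 0, 0, 2, 3, 0, 0], max_uses=3)
print(stats)  # → {'consult_rate': 0.4, 'cap_hit_rate': 0.2}
```

Thresholds like the 10% figure above come from the guidance in this section; tune them against your own cost and accuracy measurements.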
Best Practices
- Do use the advisor strategy when tasks have clear decision forks — ambiguous choices, high-stakes actions, or domains where the executor has low confidence.
- Do set max_uses deliberately based on task complexity measurements, not arbitrarily.
- Do monitor executor vs. advisor token spend per task to validate that the cost profile matches expectations.
- Don't use the advisor strategy for tasks that are uniformly complex — if every step requires frontier-level reasoning, run the frontier model end-to-end.
- Don't over-escalate: if the executor invokes the advisor for routine decisions, tighten the escalation criteria in the system prompt.
- Don't treat the advisor as a fallback for a poorly-tuned executor. The advisor amplifies a capable executor. It doesn't compensate for fundamental executor weaknesses.
Related
- Multi-Agent Coordination: The broader family of patterns for orchestrating multiple agents. The advisor strategy is a lightweight alternative for tasks that don't need full coordinator/worker decomposition.
- Tool System: The advisor is declared as a tool via advisor_20260301. Understanding the tool declaration and dispatch model clarifies how the executor-advisor handoff works mechanically.
- Safety and Permissions: The advisor pattern is particularly useful before high-stakes or destructive tool calls, where a second opinion from a frontier model is valuable before executing irreversible actions.
- Pattern Index: All patterns from this page in one searchable list, with context tags and links back to the originating section.
- Glossary: Definitions for domain terms used on this page.