Agents act autonomously, and that means they can cause damage autonomously. A file-writing agent that overwrites the wrong directory, an API-calling agent that sends unauthorized requests, a search agent that leaks data it shouldn't have accessed: these failures aren't hypothetical. They happen when the permission model is too simple.
The naive model is a single yes/no check before each action: does this agent have permission? But "this agent" isn't a single thing. It might be operating on behalf of a user, inside a specific project, under a policy set by an organization, with permissions explicitly granted during the current session. The permission to act comes from multiple sources at once, and they don't always agree. The model that makes this tractable is a cascade: a prioritized sequence of six policy sources, where the first matching rule wins and no match means deny. Every decision is logged with its source and reason type, so the cascade is auditable, not just mechanical.
Beyond the cascade sits a second layer: permission modes that provide a global semantic override (planning mode auto-denies all writes, bypass mode auto-approves everything, silent deny mode refuses all unlisted actions). And a third layer: resolution handlers that route the decision differently depending on whether you're in an interactive session, a coordinator, or a headless worker. These three layers (cascade, mode, handler) work together to make agent permission behavior both predictable and auditable.
The Permission Cascade
The cascade evaluates six ordered sources. The first source that has an opinion on an action wins. If nothing matches, the default is DENY (fail-closed).
The six sources in priority order:
- policySettings: enterprise-managed rules pushed to all users. Highest priority. Cannot be overridden by any lower source.
- projectSettings: committed to version control and shared with the whole team. Narrower than policy but broader than personal settings.
- localSettings: gitignored per-project personal overrides. Set by the developer for their own use on this project only.
- userSettings: global personal settings (e.g., ~/.agentconfig). Apply across all projects for this user.
- cliArg: --allow/--deny flags passed at launch time. Convenient for one-off sessions without modifying config files.
- session: permissions granted interactively during this conversation. Lowest priority. Expire when the session ends.
The six-source cascade evaluation in pseudocode:
```
# Every matched rule carries source, behavior, and decision reason for audit
RULE_SOURCES = [
    "policySettings",   # enterprise-managed, pushed to all users
    "projectSettings",  # committed to git, shared with team
    "localSettings",    # gitignored per-project, personal
    "userSettings",     # global ~/.agent settings, personal
    "cliArg",           # --allow / --deny at launch time
    "session",          # granted during this conversation, expires on exit
]

function evaluate_permission(action, context) -> Decision:
    # Bypass-immune checks first: cannot be overridden by any source
    if not passes_scope_bounds(action, context.scope):
        return DENY(source="scope_check", reason="scope_bounds_violation")

    # Build rule lists from all sources, in priority order
    all_deny_rules  = flatten(context[src].deny_rules  for src in RULE_SOURCES)
    all_ask_rules   = flatten(context[src].ask_rules   for src in RULE_SOURCES)
    all_allow_rules = flatten(context[src].allow_rules for src in RULE_SOURCES)

    # Deny checked first: most restrictive wins within source priority
    for rule in all_deny_rules:
        if rule.matches(action):
            log(source=rule.source, behavior="deny", reason="deny_rule")
            return DENY(f"deny rule from {rule.source}")

    # Ask next: user will be prompted
    for rule in all_ask_rules:
        if rule.matches(action):
            log(source=rule.source, behavior="ask", reason="ask_rule")
            return ASK(reason=rule)

    # Allow last
    for rule in all_allow_rules:
        if rule.matches(action):
            log(source=rule.source, behavior="allow", reason="allow_rule")
            return ALLOW

    # No match: apply mode-based default (fail-closed if mode == "default")
    return mode_default_decision(context.mode, action)
```

Three design choices embedded here are worth naming explicitly:
First-match wins. Each source returns ALLOW, DENY, or ABSTAIN (no opinion). The cascade stops at the first non-ABSTAIN decision. Given any action and context, you can trace exactly which source made the decision. There's no ambiguity and no averaging.
Deny beats allow within a source. When building the rule lists, deny rules are checked before ask rules, which are checked before allow rules. Within any single source, the most restrictive opinion wins.
Bypass-immune checks are not policy. Scope bounds checking (is the agent trying to act outside its designated directory?) runs before the cascade and cannot be overridden by any source. If scope checks were inside the cascade, a sufficiently privileged policy source could override them, making "scope" meaningless. Bypass-immune checks run unconditionally.
Every decision is logged. The audit trail carries the matched source, the behavior (allow/deny/ask), and the reason type (rule/hook/classifier/mode/safety_check). This is what makes the cascade inspectable when debugging unexpected denials.
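A minimal runnable sketch of these choices together (fail-closed default, first-match-wins across ordered sources, deny-before-allow within a source), assuming glob-style rule patterns; the config shape and pattern syntax are illustrative, not the real rule format:

```python
from enum import Enum
from fnmatch import fnmatch

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ABSTAIN = "abstain"

# Order = priority; each source maps to (deny_patterns, allow_patterns).
SOURCES = ["policySettings", "projectSettings", "localSettings",
           "userSettings", "cliArg", "session"]

def source_opinion(rules, action):
    """Deny beats allow within a single source."""
    deny, allow = rules
    if any(fnmatch(action, p) for p in deny):
        return Decision.DENY
    if any(fnmatch(action, p) for p in allow):
        return Decision.ALLOW
    return Decision.ABSTAIN

def evaluate(config, action):
    """First non-ABSTAIN source wins; no match means fail-closed DENY."""
    for src in SOURCES:
        opinion = source_opinion(config.get(src, ([], [])), action)
        if opinion is not Decision.ABSTAIN:
            return opinion, src          # auditable: decision plus its source
    return Decision.DENY, "default"      # fail-closed

config = {
    "projectSettings": (["bash(rm:*)"], ["bash(ls:*)"]),
    "userSettings": ([], ["bash(*)"]),
}
print(evaluate(config, "bash(ls:-la)"))  # project allow wins
print(evaluate(config, "bash(rm:-rf)"))  # project deny beats the broader user allow
print(evaluate(config, "write(/etc/x)")) # no match anywhere: fail-closed deny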
Permission Modes
Permission modes sit on top of the cascade and provide a global semantic override. When a mode's rule fires, it bypasses normal cascade evaluation for that action type. Five modes are externally addressable:
```
# The five externally-addressable permission modes
type PermissionMode =
    | "default"            # ask for all unlisted tools (standard interactive behavior)
    | "plan"               # read-only: all tools with write effects auto-denied
    | "acceptEdits"        # file edit tools auto-approved, everything else still asks
    | "bypassPermissions"  # all tools auto-approved without prompting (requires explicit availability)
    | "dontAsk"            # all unlisted tools auto-denied silently, no prompt

# Mode to default decision when no cascade rule matches
function get_mode_default(mode: PermissionMode, tool: Tool) -> Decision:
    match mode:
        "plan":
            if tool.has_write_effect: return DENY("plan mode: write tools blocked")
            else: return ASK  # reads still ask: plan mode is not auto-approve for reads
        "acceptEdits":
            if tool.is_file_edit: return ALLOW
            else: return ASK
        "bypassPermissions":
            return ALLOW  # all tools, no questions
        "dontAsk":
            return DENY   # all unlisted tools, silently
        "default":
            return ASK    # standard prompt
```

Mode cycling is deterministic: default → acceptEdits → plan → bypassPermissions → default. The cycling order doesn't represent a severity gradient. It's a UI cycle. bypassPermissions requires explicit availability (guarded by a feature gate checked at startup), so agents in restricted environments skip it in the cycle.
dontAsk and bypassPermissions are both "no prompt" modes, but with opposite defaults. dontAsk silently denies all unlisted tools: the agent looks stuck but won't bother the user. bypassPermissions silently approves everything, providing maximum automation. Confusing them has real consequences.
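The mode-default table above can be rendered as a small runnable function; tool capability is simplified to two boolean flags here:

```python
def mode_default(mode, has_write_effect=False, is_file_edit=False):
    """Default decision when no cascade rule matched."""
    if mode == "plan":
        return "deny" if has_write_effect else "ask"
    if mode == "acceptEdits":
        return "allow" if is_file_edit else "ask"
    if mode == "bypassPermissions":
        return "allow"   # silently approves everything
    if mode == "dontAsk":
        return "deny"    # silently refuses everything unlisted
    return "ask"         # "default" mode: standard prompt

# The two "no prompt" modes give opposite answers for the same action:
print(mode_default("bypassPermissions"))  # allow
print(mode_default("dontAsk"))            # deny
```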
Three-Handler Architecture
Permission resolution follows different paths depending on execution context. There are three handler types, and the handler is determined by context, not by rule configuration.
```
# Handler selection based on execution context
function resolve_permission(tool, input, context) -> Decision:
    if context.is_swarm_worker:
        return swarm_worker_handler(tool, input, context)
    if context.is_coordinator_mode:
        return coordinator_handler(tool, input, context)
    return interactive_handler(tool, input, context)

# Interactive handler: races multiple resolution paths concurrently
function interactive_handler(tool, input, context) -> Decision:
    resolution = create_resolution_promise()
    # All four resolution paths run concurrently: first to resolve wins
    spawn: permission_hooks(tool, input, context)  # fast, local rule evaluation
    spawn: classifier(tool, input, context)        # LLM-based classification (slower)
    spawn: bridge_response(tool, input, context)   # remote UI (e.g. web interface)
    spawn: channel_relay(tool, input, context)     # external channels (e.g. Telegram)
    show_dialog(tool, input, context)              # user interaction: always present as floor
    return await first_to_resolve(resolution)

# Coordinator handler: sequential, no racing
function coordinator_handler(tool, input, context) -> Decision:
    if decision = run_permission_hooks(tool, input, context): return decision
    if decision = run_classifier(tool, input, context): return decision
    return show_dialog_and_wait(tool, input, context)

# Swarm worker handler: forward to leader if classifier can't decide
function swarm_worker_handler(tool, input, context) -> Decision:
    if decision = try_classifier(tool, input, context):
        return decision  # classifier auto-approved or auto-denied
    # Can't show UI in headless worker: forward to leader
    request = create_permission_request(tool, input)
    register_callback(request.id)  # register BEFORE sending (race prevention)
    send_to_leader_mailbox(request)
    show_pending_indicator(tool)
    return await leader_response(request.id, context.abort_signal)
```

The interactive handler races four resolution paths simultaneously. This means a fast local rule (hooks) can approve an action before the user even sees a dialog. The 200ms grace period: if a keypress arrives within 200ms of the permission dialog appearing, it's treated as a pre-existing keystroke from a previous command, not as user interaction. The classifier is still allowed to auto-approve during that window, preventing accidental keypresses from canceling the classifier check prematurely.
The coordinator handler runs sequentially: hooks, then classifier, then dialog. No racing, because coordinator sessions are designed for predictable sequential tool use.
Why does the interactive handler race its paths? Speed. In a typical interactive session, the user is watching and fast hooks can approve a safe action (like reading a file that's already in the project) before the dialog even renders. The user sees the tool result without ever seeing a prompt. Racing the four paths achieves this without sacrificing the safety floor. The user dialog is always "in the race" and will show if no other path resolves first. The classifier is the expensive path (LLM inference), and the hooks are cheap (local rule matching). Racing them means fast paths pre-empt slow ones without needing to explicitly time-box the classifier.
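The first-to-resolve race can be sketched with asyncio; the three paths below are stand-ins with invented latencies, not the real hook/classifier/dialog implementations:

```python
import asyncio

async def hooks():        # fast local rule evaluation (simulated latency)
    await asyncio.sleep(0.01)
    return ("allow", "hooks")

async def classifier():   # slower LLM-based path (simulated latency)
    await asyncio.sleep(0.5)
    return ("allow", "classifier")

async def user_dialog():  # safety floor: always in the race
    await asyncio.sleep(5.0)
    return ("ask", "dialog")

async def first_to_resolve():
    tasks = [asyncio.ensure_future(path()) for path in (hooks, classifier, user_dialog)]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()                 # losing paths are cancelled, not awaited
    return done.pop().result()

decision, source = asyncio.run(first_to_resolve())
print(decision, source)  # the fast hooks path wins; the dialog never renders
```

The dialog is just another task in the race, which is exactly the "safety floor" property: it is always eligible to win if nothing faster resolves.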
The swarm worker handler is covered fully in Multi-Agent Permission Forwarding.
Bypass-Immune Checks and Mode Guards
Some safety checks run before the cascade and cannot be overridden by any policy source, any mode, or any rule.
Scope bounds checking is the canonical example. If the agent is constrained to a specific working directory, writing outside that directory is denied unconditionally, not as a policy decision but as an architectural constraint. If scope checking were inside the cascade, bypassPermissions mode could override it. Bypass-immune checks survive mode transitions by design.
bypassPermissions mode is itself guarded. The mode exists for automated pipelines, but it's controlled by a feature gate checked at startup. If the gate is closed (default in most deployments), the mode is unavailable and removed from the cycle. Entering bypass mode without explicit authorization is blocked at the infrastructure level.
Dangerous pattern stripping applies in certain permissive modes. Rules matching patterns like interpreters (python, node, ruby), shells (bash, zsh), code runners (npx, eval, exec, sudo), and similar are stripped before evaluation. This prevents a broad rule like allow python:* from acting as a blanket code execution bypass in modes that auto-approve unlisted tools.
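A sketch of the stripping step; the blocklist and the set of modes it applies to are illustrative assumptions here, not the actual lists:

```python
import re

# Illustrative blocklist: rules whose subject is a code-execution vector
DANGEROUS = re.compile(r"^(python\d*|node|ruby|bash|zsh|sh|npx|eval|exec|sudo)\b")

def strip_dangerous_allows(allow_rules, mode):
    """In modes that auto-approve tools, drop allow rules that would act
    as a blanket code-execution bypass (mode list is an assumption)."""
    if mode not in ("acceptEdits", "bypassPermissions"):
        return allow_rules
    return [r for r in allow_rules if not DANGEROUS.match(r)]

rules = ["python:*", "ls:*", "sudo:*", "grep:*"]
print(strip_dangerous_allows(rules, "acceptEdits"))  # ['ls:*', 'grep:*']
print(strip_dangerous_allows(rules, "default"))      # unchanged
```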
The invariant: bypass-immune checks protect properties that no policy source may waive. If you find yourself thinking "but with bypass mode, we could..." that's exactly the scenario bypass-immune checks exist to prevent.
Graduated Trust
Not all instructions carry the same authority. When an agent receives instructions from multiple sources (the system prompt, the user's message, a tool result, a sub-agent's output) those sources have different trust levels, and the system must enforce that hierarchy.
The standard ordering:
- System prompt: highest authority. Set by the developer who deployed the agent. Can establish and expand agent permissions.
- User turn: medium authority. The human operator. Can use permissions within what the system prompt allows, but cannot exceed them.
- Tool results: lower authority. External data returned by tool calls. Cannot expand permissions and can only inform decisions.
- Sub-agent output: lowest authority in a multi-agent system. A worker agent's output should be treated as data, not as instruction.
The critical invariant: an agent cannot grant itself elevated permissions. Trust flows downward. A system prompt can grant broad permission, a user can use that permission, but neither a tool result nor a sub-agent claiming to have elevated access can override the hierarchy above them.
This matters especially in multi-agent systems. A coordinator that receives a message from a worker claiming "the user approved X" must not act on that claim. The worker doesn't have the authority to relay user approvals. Only instructions that arrive through the original user turn or system prompt carry that trust level.
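One way to encode the hierarchy is a numeric trust ordering checked before any permission grant is honored; the level names and numbers below are illustrative:

```python
# Trust levels: higher number = more authority (illustrative ordering)
TRUST = {"system_prompt": 3, "user_turn": 2, "tool_result": 1, "subagent": 0}

def can_grant_permission(origin: str) -> bool:
    """Only the system prompt tier may establish or expand permissions."""
    return TRUST[origin] >= TRUST["system_prompt"]

def can_use_permission(origin: str) -> bool:
    """User turns may exercise permissions the system prompt allows."""
    return TRUST[origin] >= TRUST["user_turn"]

# A worker claiming "the user approved X" carries subagent trust, not user trust:
print(can_grant_permission("subagent"))   # False
print(can_use_permission("tool_result"))  # False
print(can_use_permission("user_turn"))    # True
```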
Multi-Agent Permission Forwarding
Workers run in isolated execution contexts and can't show UI. When a worker needs a permission decision that can't be resolved locally (classifier can't auto-approve, no matching rule), it delegates to the leader via a mailbox protocol.
```
# Worker side: create request, register callback, then send
function request_permission_from_leader(tool, input, context) -> Decision:
    request = PermissionRequest {
        id: generate_id(),
        tool: tool.name,
        input: input,
        worker_id: context.agent_id,
    }

    # CRITICAL: Register callback before sending to leader.
    # If we send first and the leader responds before we register, we lose the response.
    callback = register_callback(request.id, context.abort_signal)

    # Write to leader's mailbox
    write_to_mailbox(context.leader_mailbox_path, request)
    show_pending_indicator(tool)

    # Wait for leader response (or abort if session ends)
    response = await callback
    if response.type == "cancel":
        return DENY("leader did not respond before session ended")

    # Leader may have modified the tool input (e.g. sanitized a path)
    if response.updated_input:
        input = response.updated_input
    return response.decision

# Leader side: poll mailbox, show UI, send response
function poll_and_respond_to_workers():
    for request in read_mailbox(leader_mailbox_path):
        show_worker_permission_dialog(request)  # blocks until user decides
        response = PermissionResponse {
            id: request.id,
            decision: user_decision,
            updated_input: maybe_sanitized_input,  # leader can modify input
        }
        write_to_worker_mailbox(request.worker_id, response)
```

The race condition guard is the callback registration order. The worker registers its callback before writing to the mailbox, not after. If the sequence were reversed (write first, then register), the leader could respond in the window between the write and the registration, and the response would be permanently lost. The worker waits forever. Always register callbacks before any operation that could trigger the response.
The updated_input field lets the leader modify what the tool actually runs. If a worker requests write /tmp/sensitive-file, the leader can respond with an approved write to a sanitized path. The worker executes the leader's modified input, not its original request.
AbortSignal fires on session end. If the leader's session ends while a worker is waiting, the worker's abort signal fires and it resolves with a cancel decision, preventing a hung worker with no recourse.
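The cancel-on-abort behavior can be sketched with asyncio: the worker awaits whichever fires first, the leader's response or the session-end event. The names below are illustrative:

```python
import asyncio

async def await_leader_or_abort(response_future, abort_event):
    """Resolve with the leader's decision, or a cancel if the session ends first."""
    abort_task = asyncio.ensure_future(abort_event.wait())
    done, pending = await asyncio.wait(
        [response_future, abort_task], return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    if abort_task in done:
        return ("deny", "leader did not respond before session ended")
    return response_future.result()

async def demo():
    loop = asyncio.get_running_loop()
    response = loop.create_future()   # leader never answers in this demo
    abort = asyncio.Event()
    loop.call_later(0.01, abort.set)  # session ends while the worker waits
    return await await_leader_or_abort(response, abort)

result = asyncio.run(demo())
print(result)
```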
Shadow Rule Detection
A shadowed rule is a rule that can never be reached. Shadow rules are a silent correctness problem: the developer believes a specific permission is granted, but it never fires.
Two shadow types:
Deny-shadowing: a broad deny rule blocks a specific allow rule. Example: you add bash(ls:*) to allow specific directory listings, but also have bash in the deny list. The deny list is checked before the allow list, so the specific allow never matches. The tool behaves as if always denied.
Ask-shadowing: a tool-wide ask rule means the user is always prompted before the specific allow can be checked. If bash is in the ask list and bash(ls:*) is in the allow list, the ask rule fires first. The specific allow is never reached because the user is prompted every time regardless.
```
# Shadow rule detection at write time (not at evaluation time)
function detect_shadowed_rules(new_rules: RuleSet) -> list[Warning]:
    warnings = []
    for allow_rule in new_rules.allow_rules:
        # Check for deny-shadowing: any deny rule that would match this allow's pattern
        for deny_rule in new_rules.deny_rules:
            if deny_rule.pattern.subsumes(allow_rule.pattern):
                warnings.append(
                    ShadowWarning(
                        shadowed=allow_rule,
                        shadower=deny_rule,
                        type="deny_shadow",
                        message=f"'{deny_rule.pattern}' will always deny before '{allow_rule.pattern}' is checked"
                    )
                )
        # Check for ask-shadowing: any ask rule that would always prompt before this allow
        for ask_rule in new_rules.ask_rules:
            if ask_rule.pattern.subsumes(allow_rule.pattern):
                if not sandbox_auto_allow_enabled(new_rules):
                    warnings.append(
                        ShadowWarning(
                            shadowed=allow_rule,
                            shadower=ask_rule,
                            type="ask_shadow",
                            message=f"'{ask_rule.pattern}' will always prompt before '{allow_rule.pattern}' is checked"
                        )
                    )
    return warnings
```

Shadow rule detection runs when rules are written, not at evaluation time. This is the right place. Discovering a shadow at evaluation time means the developer has already made a decision based on a false belief about the permission state. Detecting it at write time lets you warn before the belief takes hold.
The sandbox exception: when sandbox auto-allow is enabled, personal ask-rules don't shadow bash allow-rules. The sandbox's auto-approval mechanism bypasses the normal ask rule check, so the allow rule can fire.
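The subsumes() check the detector relies on can be approximated for glob-style rules. This is a conservative sketch under assumed rule syntax; real glob subsumption is subtler:

```python
from fnmatch import fnmatch

def tool_of(pattern: str) -> str:
    """'bash(ls:*)' -> 'bash'; bare 'bash' -> 'bash'."""
    return pattern.split("(", 1)[0]

def subsumes(broad: str, narrow: str) -> bool:
    """Approximation: does every action matching `narrow` also match `broad`?"""
    if tool_of(broad) != tool_of(narrow):
        return False
    if "(" not in broad:           # bare tool name covers every invocation
        return True
    return fnmatch(narrow, broad)  # e.g. 'bash(ls:*)' falls inside 'bash(*)'

# Write-time check: which allow rules can never fire?
deny_rules = ["bash"]
allow_rules = ["bash(ls:*)", "grep(*)"]
shadowed = [a for a in allow_rules for d in deny_rules if subsumes(d, a)]
print(shadowed)  # the grep rule is untouched
```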
Denial Tracking and Auto-Fallback
Classifier-based permission systems have a failure mode: silent infinite rejection loops. A classifier that fails closed (always denies on error, or always denies a particular pattern) can spin forever with no user feedback. The agent looks stuck. No dialog appears. The user doesn't know why.
Denial tracking solves this by counting consecutive and total denials, then escalating to user dialog when thresholds are crossed.
```
type DenialState = { consecutive: int, total: int }
LIMITS = { max_consecutive: 3, max_total: 20 }

# Check before running classifier: should we skip to dialog?
function should_escalate_to_dialog(state: DenialState) -> bool:
    return (
        state.consecutive >= LIMITS.max_consecutive or
        state.total >= LIMITS.max_total
    )

# Update denial state after a classifier decision
function update_denial_state(decision: Decision, state: DenialState) -> DenialState:
    match decision:
        DENY:  return { consecutive: state.consecutive + 1, total: state.total + 1 }
        ALLOW: return { consecutive: 0, total: state.total }  # reset consecutive on success
        ASK:   return { consecutive: 0, total: state.total }  # ask = escalation, also resets

# Full permission resolution with denial tracking
function resolve_with_tracking(tool, input, context) -> Decision:
    if should_escalate_to_dialog(context.denial_state):
        return show_dialog_and_wait(tool, input, context)  # skip classifier
    decision = run_classifier(tool, input, context)
    context.denial_state = update_denial_state(decision, context.denial_state)
    if decision == ASK or decision == DENY_TO_DIALOG:
        return show_dialog_and_wait(tool, input, context)
    return decision
```

The thresholds (3 consecutive, 20 total) represent observed frustration thresholds, not theoretical values. After 3 consecutive denials, something is wrong. After 20 total, even with intermittent successes, the classifier is denying too much. Either threshold triggers an escalation to user dialog so the human can take over.
The consecutive counter resets on any successful approval. A single approved action indicates the classifier isn't stuck. The total counter never resets within a session, which is intentional: it catches diffuse denial patterns that the consecutive counter would miss.
Why two counters? The consecutive threshold catches obvious stuck-classifier situations: 3 denials in a row with no approvals in between is the classifier looping on one action. The total threshold catches a more subtle pattern: the classifier that approves some things but denies a high fraction of requests overall. In a long session, 20 total denials might be spread across 5 different tools, each denied 4 times. Consecutive never hits 3, but the aggregate friction is clearly wrong. Both conditions should escalate independently.
What happens after escalation? The user dialog appears and the user decides. If the user approves, the consecutive counter resets (approval reset). If the user denies (actively, via the dialog), that denial does NOT increment the denial tracker. Human denials are expected and correct. The tracker only counts classifier denials. After user interaction, the classifier gets another chance on subsequent requests.
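The two-counter behavior is easy to exercise with a small simulation (a runnable rendering of the pseudocode above):

```python
MAX_CONSECUTIVE, MAX_TOTAL = 3, 20

def update(state, decision):
    """Deny increments both counters; allow/ask resets consecutive only."""
    consecutive, total = state
    if decision == "deny":
        return (consecutive + 1, total + 1)
    return (0, total)

def should_escalate(state):
    consecutive, total = state
    return consecutive >= MAX_CONSECUTIVE or total >= MAX_TOTAL

# Stuck classifier: three denials in a row trips the consecutive threshold
stuck = (0, 0)
for decision in ["deny", "deny", "allow", "deny", "deny", "deny"]:
    stuck = update(stuck, decision)
print(stuck, should_escalate(stuck))      # (3, 5) True

# Diffuse friction: consecutive never reaches 3, but total creeps up to 20
diffuse = (0, 0)
for _ in range(10):
    for decision in ["deny", "allow", "deny"]:
        diffuse = update(diffuse, decision)
print(diffuse, should_escalate(diffuse))  # (1, 20) True
```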
Production Considerations
Six sources, not four: the team/enterprise distinction is real. policySettings (enterprise push) and projectSettings (git-committed) can conflict when both define rules for the same tool. Because policy has higher priority than project, a team cannot override enterprise policy via their committed settings. Developers who find "project settings not working" often don't realize a higher-priority source is shadowing them. Always inspect the full six-source cascade when debugging unexpected denials, not just the sources you configured.
Classifier-based modes need denial tracking, not just thresholds. A classifier that fails closed will spin forever with no user feedback if denial tracking isn't implemented. The 3-consecutive / 20-total thresholds aren't arbitrary. They're calibrated to the difference between "the classifier is working but cautious" and "the classifier is stuck." Implementing a classifier permission system without denial tracking is a recipe for invisible agent hangs.
Bypass-immune checks must survive mode transitions. If scope bounds checking is gated behind "is in bypassPermissions mode?", bypass mode breaks the invariant it's supposed to protect. The correct structure is: bypass-immune checks run before cascade evaluation, unconditionally, regardless of mode. The check itself is immune, not the result of the check.
Shadow rule detection is a security issue, not just UX. A shadowed allow rule that silently fails is a correctness problem with security implications: the developer believes a narrowly-scoped permission was granted, but in practice the broad deny rule is in force. Without detection, the response to "my allow rule isn't working" is to widen the allow rule, which often means granting more access than intended.
The interactive handler's 200ms grace window prevents a specific class of accidental approvals. In interactive sessions, a classifier might auto-approve an action before the dialog appears. If the user's previous keypress arrives within 200ms of the dialog, it could look like "user dismissed the dialog immediately," which would then be treated as user approval of the auto-approval. The grace window prevents this by deferring user-interaction detection briefly. Without it, fast typers trigger false approvals.
The register-before-send ordering in permission forwarding is a race condition, not style. If a worker writes to the leader mailbox before registering its callback, the leader can respond in the window between the write and the registration, and the response is permanently lost. The worker waits forever. Always register callbacks before any operation that could trigger the response.
Best Practices
Do: fail closed on no match. The cascade's default is DENY when nothing matches. Fail-open ("if uncertain, allow") looks convenient but inverts the cost asymmetry. Wrongly allowing an action can be irreversible. Wrongly blocking one costs a prompt.
Don't: gate bypass-immune checks behind mode. Mode controls the cascade. Bypass-immune checks run before the cascade. If you put scope checking inside the mode logic, bypass mode can defeat it. The check and the mode are at different architectural layers on purpose.
Do: implement denial tracking with escalation. Any permission system that uses a classifier needs denial tracking. A classifier that rejects an action will keep rejecting it without escalation. Set thresholds: 3 consecutive denials or 20 total should escalate to user dialog regardless of what the classifier says.
Don't: let workers show UI directly. Workers run in headless execution contexts. Code that calls a user-facing dialog from a worker context will either hang (no UI to render into) or throw. Route worker permission requests to the leader via mailbox.
Do: detect shadowed rules at write time. Check new rules against existing deny and ask rules before persisting them. Warn on any allow rule that would be shadowed by a broader deny or ask. The developer making the write-time decision is in the best position to resolve the conflict.
Don't: confuse dontAsk and bypassPermissions. Both modes skip the user prompt, but they're opposites: dontAsk denies everything silently, while bypassPermissions approves everything silently. The word "bypass" means bypassing the prompt, not bypassing safety. But in the dontAsk case, "bypassing the prompt" means the request is silently refused.
Do: log the full decision path. Every permission decision should record the source that matched (which of the six), the behavior (allow/deny/ask), and the reason type (rule/classifier/mode/hook/safety_check). When something goes wrong in production, "permission denied" is not enough. You need to know which source made the decision and why.
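A minimal shape for such a log record, assuming JSON serialization; the field names follow this section, not a fixed schema:

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class PermissionDecisionLog:
    tool: str
    behavior: str                  # "allow" | "deny" | "ask"
    source: str                    # one of the six cascade sources, or "mode"/"safety_check"
    reason_type: str               # "rule" | "hook" | "classifier" | "mode" | "safety_check"
    rule_pattern: Optional[str] = None

entry = PermissionDecisionLog(
    tool="bash",
    behavior="deny",
    source="projectSettings",
    reason_type="rule",
    rule_pattern="bash(rm:*)",
)
print(json.dumps(asdict(entry)))  # one structured line per decision
```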
Related
- Tool System Design: The fail-closed principle governing the permission cascade originated in tool metadata design: when a tool's safety flags are missing, default to the most restrictive interpretation. The same asymmetric cost argument applies to both systems. Tool system covers how tools declare their own permission requirements. This page covers how those requirements are evaluated.
- Multi-Agent Coordination: Covers the full worker spawning model, mailbox communication, and session reconnection. The permission forwarding protocol on this page (workers delegate to leader) is the safety-specific view. Multi-agent coordination covers the broader communication architecture and backend abstraction that makes mailbox forwarding possible.
- Streaming and Events: The terminal input event system handles raw user keystrokes including permission grant/deny responses. The capture/bubble dispatch model routes permission dialog input without affecting underlying components.
- MCP Integration: MCP tool annotations (destructive_hint, read_only_hint) feed directly into the permission cascade. External tools from MCP servers go through the same permission checks as built-in tools.
- Pattern Index: All patterns from this page in one searchable list, with context tags and links back to the originating section.
- Glossary: Definitions for all domain terms used on this page, from agent loop primitives to memory system concepts.