
Streaming and Events

How event-driven streaming delivers agent results as they happen, through typed events, producer-consumer pipelines, priority-based dispatch, capture/bubble phases, and a screen-diffing output model.

A user watching an agent work doesn't want to stare at a blank screen for 30 seconds. They want to see text appearing, tools being called, progress being made. Streaming turns a black-box wait into a transparent process. But naive streaming (just printing tokens as they arrive) misses the architectural opportunity. A well-designed event system makes agent output composable, observable, and safe under load. A poorly-designed one buries you in tight coupling between the agent loop and every consumer that wants to watch it.

The right framing is not "how do we print faster?" It is "what is the contract between the agent and its observers?" Get that contract right, and streaming a UI, logging a supervisor, and wiring a network serializer all become the same operation: subscribe and handle the types you care about.

There are two distinct event systems in a complete agent implementation. The agent event stream carries LLM and tool lifecycle events: TextDelta, ToolDispatch, Complete. The terminal input event system carries user interaction events: keystrokes, resize, scroll. They are separate buses. Confusing one for the other is a common architectural mistake that produces subtle, hard-to-trace bugs.

The Event Model

The event type system is the contract between the agent loop and its observers. Here is the full model, including type definition, producer, and consumer:

# Event type system: the universal interface
type AgentEvent =
  | RequestStart   { turn_id: string }
  | TextDelta      { turn_id: string, text: string }
  | ToolDispatch   { turn_id: string, tool: string, args: object }
  | ToolResult     { turn_id: string, tool: string, result: object }
  | Complete       { turn_id: string, final_text: string }
  | ErrorEvent     { turn_id: string, error: string }

# Producer: agent loop yields events as they happen
async function* agent_loop(messages, tools) -> AgentEvent:
  response = await llm.call(messages, tools)
  yield RequestStart(turn_id=response.id)

  for delta in response.stream():
    yield TextDelta(turn_id=response.id, text=delta)

  if response.tool_calls:
    for call in response.tool_calls:
      yield ToolDispatch(turn_id=response.id, tool=call.name, args=call.args)
      result = await tools[call.name].run(call.args)
      yield ToolResult(turn_id=response.id, tool=call.name, result=result)
  else:
    yield Complete(turn_id=response.id, final_text=response.text)

# Consumer: handles the event types it cares about, ignores the rest
async function render_to_ui(event_stream):
  async for event in event_stream:
    match event:
      TextDelta:    ui.append_text(event.text)
      ToolDispatch: ui.show_spinner(event.tool)
      ToolResult:   ui.show_result(event.tool, event.result)
      Complete:     ui.finalize()

The event model makes streaming composable. A supervisor can observe an agent's event stream without the agent knowing it's being watched. A logging system can subscribe to the same stream as the UI. Any new consumer type just subscribes and handles the events it cares about, without touching the producer. Add a new consumer: zero changes to the agent loop. Remove a consumer: same. The producer and consumer are fully decoupled through the typed event contract.
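
The fan-out described above can be sketched in a few lines of asyncio: one task copies the producer's stream into a per-consumer queue, and each consumer drains its own queue independently. The names (`fan_out`, `agent_events`) are illustrative, not part of any particular library.

```python
import asyncio

async def fan_out(event_stream, queues):
    """Copy every event from one async stream into each consumer's queue."""
    async for event in event_stream:
        for q in queues:
            await q.put(event)
    for q in queues:
        await q.put(None)            # sentinel: stream finished

async def agent_events():
    # Stand-in for the agent loop's event stream.
    for event in ["TextDelta", "ToolDispatch", "Complete"]:
        yield event

async def consume(name, q, seen):
    # Each consumer drains its own queue at its own pace.
    while (event := await q.get()) is not None:
        seen.append((name, event))

async def main():
    ui_q, log_q = asyncio.Queue(), asyncio.Queue()
    seen = []
    await asyncio.gather(
        fan_out(agent_events(), [ui_q, log_q]),
        consume("ui", ui_q, seen),
        consume("log", log_q, seen),
    )
    return seen

events = asyncio.run(main())
```

Adding a third consumer is one more queue in the list; the producer is untouched, which is the decoupling the typed contract buys.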

Two complementary angles build on this foundation, and a third, backpressure, gets its own section:

Progressive disclosure. Users see text as it's generated, tool calls as they're dispatched, partial results before completion. The pattern is: yield partial state when it's useful to the consumer, not just at completion. An agent that shows its work is an agent users can trust.

Producer-consumer pipeline. The agent loop is the producer, yielding events as they happen. Consumers process them at their own pace. The pipeline can have multiple stages: agent loop to event filter to serializer to network socket to client renderer. The key systems concern is backpressure: what happens when a consumer is slower than the producer.

Backpressure and Buffering

The producer-consumer pipeline has a fundamental tension: the producer can generate events faster than the consumer can process them. This is backpressure, the pressure from a slow consumer pushing back against a fast producer. How you handle it is a design decision with three options on a spectrum:

No buffer (blocking producer). The generator suspends at each yield until the consumer calls next. Maximum safety (the producer never runs ahead of the consumer) but minimum throughput if the consumer is slow. This is the natural behavior of async generators. It's the right default when consumer slowness is acceptable (batch processing, logging to disk).

Bounded buffer. The producer runs ahead up to N events, then blocks. Balances throughput and memory. The buffer absorbs consumer jitter: a consumer that processes events in bursts rather than steadily. The buffer size is the explicit trade-off. Larger means more throughput smoothing, more memory use, and more events potentially lost if the consumer crashes.

Unbounded buffer. The producer never blocks. All events are queued immediately. Maximum throughput, but unbounded memory use. Safe only when the consumer is reliably faster than the producer over time (local in-memory consumers, fast file writes). Risky for network consumers or anything that can fall behind indefinitely.

For UI streaming, the standard choice is a bounded buffer with jitter tolerance: a buffer of 10 to 50 events handles the bursty render patterns of a UI without risking memory exhaustion if the user's connection is slow. For supervisor agents observing a worker, blocking is usually fine because the supervisor processes every event anyway, and falling behind would mean missing critical events.

# Bounded buffer wrapper for a slow consumer
async function consume_with_buffer(event_stream, buffer_size):
  buffer = AsyncQueue(maxsize=buffer_size)

  async function fill_buffer():
    async for event in event_stream:
      await buffer.put(event)    # blocks if buffer is full
    await buffer.put(DONE)

  spawn_background(fill_buffer)

  while True:
    event = await buffer.get()
    if event is DONE:
      return
    yield event
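
The pseudocode above maps almost directly onto asyncio. This is a minimal runnable sketch, assuming an `asyncio.Queue(maxsize=...)` as the bounded buffer and a private sentinel object in place of `DONE`:

```python
import asyncio

DONE = object()   # sentinel marking end of stream

async def consume_with_buffer(event_stream, buffer_size):
    """Yield events from event_stream, letting the producer run
    up to buffer_size events ahead of this consumer."""
    buffer = asyncio.Queue(maxsize=buffer_size)

    async def fill_buffer():
        async for event in event_stream:
            await buffer.put(event)   # suspends here while the buffer is full
        await buffer.put(DONE)

    filler = asyncio.create_task(fill_buffer())
    try:
        while (event := await buffer.get()) is not DONE:
            yield event
    finally:
        filler.cancel()               # clean up if the consumer stops early

async def fast_producer():
    for i in range(100):
        yield i

async def main():
    out = []
    async for event in consume_with_buffer(fast_producer(), buffer_size=10):
        await asyncio.sleep(0)        # simulate one consumer processing step
        out.append(event)
    return out

received = asyncio.run(main())
```

The producer suspends at `buffer.put` once it is 10 events ahead, which is exactly the "runs ahead up to N, then blocks" contract described above.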

Event Priority and Scheduling

Not all events are equal. In a terminal UI, a keystroke must feel instant. It maps to a discrete user action where any perceptible delay feels broken. A window resize can tolerate a frame of delay because the user doesn't feel a 16ms lag in layout reflow. If the system treats all events with equal urgency, resize events compete with keystrokes for the same dispatch slot, and the result is visible input lag under load.

The solution is a priority model with three classes:

Discrete events (keydown, keyup, click, focus, blur, paste): dispatched synchronously at the highest priority. These cannot be batched. The user expects an immediate response: a character appears, a button activates, focus shifts. Any delay above ~50ms is perceptible.

Continuous events (resize, scroll, mousemove): batched at lower priority. These fire at high frequency during user interaction (dozens per second on a resize drag), and the final state is what matters, not each intermediate value. Batching them absorbs the burst and reduces rendering load.

Default events: everything else. Normal scheduling, no special batching or urgency.

type EventPriority = "discrete" | "continuous" | "default"

function get_event_priority(event_type: str) -> EventPriority:
  match event_type:
    "keydown" | "keyup" | "click" | "focus" | "blur" | "paste":
      return "discrete"    # sync, cannot be batched: user expects instant response
    "resize" | "scroll" | "mousemove":
      return "continuous"  # batched: high frequency, tolerate slight delay
    _:
      return "default"     # normal scheduling

function schedule_event(event: TerminalEvent) -> void:
  priority = get_event_priority(event.type)
  match priority:
    "discrete":   dispatch_sync(event)        # runs immediately, no queuing
    "continuous": enqueue_for_batch(event)    # coalesced with other continuous events
    "default":    enqueue_normal(event)       # standard scheduler queue

This priority mapping is what makes keystrokes feel instant while resize events are batched. It is not a performance optimization you can defer to later. Without it, a burst of resize events during active typing creates visible input lag that users notice immediately. The priority model is a correctness requirement for interactive terminal applications, not a nicety.
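
Coalescing is the half of the model the pseudocode above leaves abstract (`enqueue_for_batch`). One way to sketch it, with illustrative event tuples rather than a real dispatcher API: collapse each run of continuous events so only the final state per type survives, while discrete events pass through untouched.

```python
def coalesce_continuous(events):
    """Keep every discrete event in order, but collapse runs of
    continuous events so only the last value per type survives."""
    CONTINUOUS = {"resize", "scroll", "mousemove"}
    latest = {}   # type -> last continuous event seen in this batch
    out = []
    for etype, payload in events:
        if etype in CONTINUOUS:
            latest[etype] = payload       # overwrite: final state wins
        else:
            out.extend(latest.items())    # flush pending continuous state
            latest.clear()
            out.append((etype, payload))  # discrete events pass through
    out.extend(latest.items())
    return out

# A burst from a resize drag with a keystroke in the middle.
burst = [
    ("resize", (80, 24)), ("resize", (100, 24)), ("resize", (120, 40)),
    ("keydown", "a"),
    ("scroll", 3), ("scroll", 7),
]
batched = coalesce_continuous(burst)
# Three resizes collapse to one, the keystroke survives untouched.
```

Five continuous events become two, and the keystroke is never delayed behind them.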

Note: This is the terminal input event system (keyboard, mouse, resize). It is completely separate from the agent event stream (TextDelta, ToolDispatch, Complete). The two systems serve different purposes and must not be conflated.

Capture and Bubble Phases

The terminal event system implements the same two-phase dispatch model that web developers know from the browser DOM. Understanding it is important for any agent UI that uses component trees: modal dialogs, overlapping panels, nested input widgets.

When an event is dispatched to a target node, it travels in two phases:

  1. Capture phase: the event walks down from the root of the component tree toward the target node. Each ancestor has the opportunity to intercept the event before it reaches the target.
  2. Bubble phase: after the target handles the event, it walks back up toward the root. Each ancestor has the opportunity to react after the target has processed it.

This two-phase model enables event delegation: a parent component can intercept events intended for its children. A modal dialog can capture all keyboard events during capture phase and prevent them from reaching underlying components. A keyboard shortcut handler at the root can intercept ctrl+c before any child sees it.

# Dispatch an event to a target through the component tree
function dispatch(target: Node, event: Event) -> void:
  # Build ordered listener list: capture listeners root to target, then bubble listeners target to root
  listeners = collect_listeners_capture_to_bubble(root, target, event.type)

  for listener in listeners:
    if event.is_propagation_stopped:
      break
    listener.handle(event)

# A listener claiming an event stops ALL remaining listeners, not just bubbling
function handle_keyboard_shortcut(event: KeyEvent) -> void:
  if event.key == "ctrl+c":
    abort_current_operation()
    event.stop_immediate_propagation()   # no other handler sees this event

stopImmediatePropagation() is stronger than stopPropagation(). Standard DOM has both: stopPropagation() prevents further bubbling but still calls remaining listeners at the current node. stopImmediatePropagation() halts ALL listeners for this event, including others registered at the same node. The terminal event system exposes only stopImmediatePropagation(). This is a stronger guarantee: once a handler claims an event, no other handler sees it at all. Design handlers with this in mind. Claiming an event is an exclusive action.

The web analogy is intentional and makes the pattern portable. A developer who has used addEventListener with the capture flag understands this model immediately. The same mental model applies in terminal UI component trees.
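
The whole dispatch model fits in a short runnable sketch. The `Node`, `Event`, and listener-list names below are illustrative, not a real terminal library's API; the point is the ordering (capture listeners root to target, then bubble listeners target to root) and the exclusivity of a stopped event.

```python
class Event:
    def __init__(self, etype):
        self.type = etype
        self.stopped = False

    def stop_immediate_propagation(self):
        self.stopped = True           # no further handler sees this event

class Node:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.capture_listeners = []   # run during root -> target walk
        self.bubble_listeners = []    # run during target -> root walk

def dispatch(target, event):
    # Build the path from the root down to the target node.
    path, node = [], target
    while node:
        path.append(node)
        node = node.parent
    path.reverse()
    # Capture phase (root -> target), then bubble phase (target -> root).
    ordered = [(n, h) for n in path for h in n.capture_listeners]
    ordered += [(n, h) for n in reversed(path) for h in n.bubble_listeners]
    for node, handler in ordered:
        if event.stopped:
            break                     # a claimed event reaches no one else
        handler(node, event)

root = Node("root")
modal = Node("modal", parent=root)
button = Node("button", parent=modal)

log = []
# The modal claims keyboard events during capture, shielding its children.
modal.capture_listeners.append(
    lambda n, e: (log.append("modal-capture"), e.stop_immediate_propagation()))
button.bubble_listeners.append(lambda n, e: log.append("button"))

dispatch(button, Event("keydown"))
```

The button's handler never fires: the modal claimed the event during capture, which is exactly the modal-shielding behavior described above.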

The Output Rendering Model

There is a common misconception about how terminal output works in agent UIs: that the agent streams text directly to the terminal, that each TextDelta event causes a write() call and characters appear immediately. This is not how production terminal UIs work.

The actual model is closer to React's virtual DOM than to a streaming write. Here is the sequence:

  1. TextDelta events arrive from the agent loop and update an in-memory document model.
  2. When a render tick occurs (frame rate controlled, not token arrival rate), the document renders to a screen buffer: a two-dimensional array of character cells, each with a precomputed style, width, and color.
  3. The screen buffer is diffed against the previous frame. Only cells that changed are emitted as terminal escape codes. Cells that haven't changed produce no output.
  4. The diff result is written to the terminal as a compact sequence of escape codes: cursor moves, color changes, character writes, and nothing more.

# Screen buffer and diff model
type ScreenCell = {
  character: str
  width: int         # precomputed: 1 for ASCII, 2 for wide chars (CJK, emoji)
  style_id: int      # index into style table, avoids repeating full style spec
  hyperlink: str?    # optional hyperlink URL
}

type ScreenBuffer = list[list[ScreenCell]]   # rows x columns

function render_frame(document: Document, previous: ScreenBuffer) -> str:
  current = render_to_buffer(document)       # full layout pass
  diff = compute_diff(previous, current)     # only changed cells
  return emit_escape_codes(diff)             # cursor moves + character writes
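
The diff step itself is simple to demonstrate. This sketch models each frame as a list of row strings rather than full `ScreenCell` records, and reports changes as `(row, col, new_char)` tuples instead of escape codes; a real renderer would group adjacent changes into runs and emit cursor moves per run.

```python
def compute_diff(previous, current):
    """Return only the cells that changed between two frames."""
    changes = []
    for row, (old_line, new_line) in enumerate(zip(previous, current)):
        for col, (old, new) in enumerate(zip(old_line, new_line)):
            if old != new:
                changes.append((row, col, new))
    return changes

prev_frame = ["hello   ", "world   "]
curr_frame = ["hello!  ", "world   "]
diff = compute_diff(prev_frame, curr_frame)
# Only one cell changed, so only one cell is emitted; the rest of the
# screen produces no output at all.
```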

The consequences of this model are non-obvious:

Output latency is frame-rate-limited, not token-rate-limited. Even if the LLM produces 80 tokens per second, the terminal renders at N frames per second (often 30-60). Characters that arrive between frames are batched into one render. Users see smooth, controlled output, not one character per token write. The frame rate is the throttle, not the LLM speed.

Character widths must be precomputed per cell. 'hello'.length === 5 is correct, but 'こんにちは'.length === 5 while the visual width is 10 columns. CJK characters and emoji occupy 2 columns. Layout algorithms that use string character count for column positions will corrupt the display for any non-ASCII content. The correct approach is grapheme segmentation: count grapheme clusters, not Unicode code points, and look up display width per cluster. Width is cached in the cell (ScreenCell.width) so it's computed once per unique character, not per render frame.
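
A minimal width lookup can lean on the Unicode East Asian Width property, which Python exposes via `unicodedata.east_asian_width`. This sketch works per code point; a production implementation would first segment into grapheme clusters (so combining marks and emoji sequences count as one cell) and then look up the cluster's width.

```python
import unicodedata

def display_width(text):
    """Terminal column width of a string: wide ('W') and fullwidth ('F')
    characters occupy 2 columns, everything else 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

ascii_len = len("hello")                 # 5 code points
ascii_width = display_width("hello")     # 5 columns: length matches width
cjk_len = len("こんにちは")               # 5 code points
cjk_width = display_width("こんにちは")   # 10 columns: length does not
```

Using `ascii_len` for layout happens to work; using `cjk_len` would place the next column 5 cells too early.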

Screen diffing prevents partial-render corruption during resize. If the terminal is resized mid-render and text was being written directly, the lines written before the resize use the old column width and the lines after use the new width. The display is torn. With screen buffering, the resize triggers a full re-render of the buffer at the new dimensions, and the diff produces the complete correct frame. No tearing, no partial-render artifacts.

The Generator Connection

The agent loop's generator pattern (yield the response, check for tool calls, loop) is the source of events. Agent Loop Architecture covers why the loop uses an async generator for streaming intermediate turns: generators compose naturally, suspend at each yield, and let callers observe without coupling. This page covers the other half: what consumers do with those events, and how the pipeline handles load when consumers can't keep up. The two pages are complementary. The agent loop is the producer side, and this page is the consumer side and the pipeline between them.

Production Considerations

Screen diffing means terminal output is frame-rate-limited, not token-rate-limited. An LLM producing 80 tokens per second does not mean 80 writes per second to the terminal. The render loop fires at a controlled frame rate. Tokens arriving between frames are batched into one render. Users experience smooth, uniform output regardless of the LLM's per-token generation speed. This is a feature, not a limitation. Direct per-token writes at 80/second would produce visible flicker.

Disabling the listener limit is load-bearing, not a hack. Most event emitter implementations warn when more than 10 listeners attach to a single event. This heuristic is designed to catch memory leaks. In agent UIs with component trees, many independent components legitimately subscribe to the same keyboard event source. The default limit triggers false warnings that pollute the terminal. The correct response is to remove the limit, but this removes the safety net for real memory leaks. To compensate, component cleanup (unlisten on unmount, teardown on component removal) must be rigorous. The listener limit removal is correct for this use case, but it trades one safety mechanism for a discipline requirement.

Precomputing character widths is not optional for international users. Layout that uses string.length for column calculations will produce corrupted displays for any content containing CJK characters, emoji, or other wide Unicode. The correct approach (grapheme segmentation with per-cluster width lookup) must be applied at the cell level, not at the string level. Do it once per unique character (cache the result in the cell), not per render frame. This is a correctness requirement, not a performance optimization: without it, the display is wrong for a significant fraction of users.

Event priority is a correctness requirement, not a performance optimization. Without priority dispatch, a burst of resize events (dozens per second during a window drag) competes directly with keystrokes for dispatch time. Keystrokes that arrive during a resize burst experience visible input lag. The priority model (synchronous dispatch for discrete input events, batched dispatch for continuous events) is the mechanism that keeps the UI responsive under realistic load. This is not something you can add later once you notice the lag. The architectural separation needs to be there from the start.

stopImmediatePropagation() is an exclusive claim. Design for it. When a handler calls stopImmediatePropagation(), no other handler (including handlers at the same component) sees that event. This is the only propagation control the system exposes (there is no stopPropagation() for partial halt). Handlers that claim events must be designed with this exclusivity in mind: if two handlers at different components both want to handle the same key, the one that captures it first wins completely. Priority and registration order determine who gets the event. Document this in your component contract.

The two event systems must be kept architecturally separate. The terminal input event system (keyboard, resize, scroll) and the agent event stream (TextDelta, ToolDispatch, Complete) are separate buses with different semantics. The terminal input system is synchronous, priority-dispatched, capture/bubble-routed. The agent event stream is asynchronous, yielded from a generator, consumed via async iteration. Mixing them (subscribing to keyboard events expecting TextDelta, or routing ToolDispatch through the terminal dispatcher) produces subtle bugs where events reach the wrong handlers or are dispatched at the wrong priority. Name them explicitly in your codebase.

Best Practices

Do: use discriminated unions for all event types. Don't: use string event names or untyped callbacks. Discriminated unions give you exhaustive pattern matching. A consumer that handles TextDelta and Complete but not ToolDispatch is a compile-time warning, not a silent miss at runtime.

Do: use bounded buffers for UI streaming. Don't: use unbounded buffers for any consumer that can fall behind (network consumers, slow renderers). A bounded buffer is an explicit commitment about how far the producer can run ahead.

Do: precompute character widths via grapheme segmentation. Don't: use string.length for layout calculations. Width is a display property of the rendered character, not a count of Unicode code points.

Do: implement frame-based rendering with screen diffing. Don't: stream individual characters to the terminal on each token arrival. Screen diffing prevents partial-render corruption, handles resize correctly, and gives you control over output latency.

Do: separate the terminal input event system from the agent event stream. Don't: conflate the two systems. They have different dispatch models, different scheduling semantics, and different consumers. Keep them named, typed, and routed separately.

Do: use event priorities (discrete for input, continuous for resize/scroll). Don't: treat all terminal events with equal urgency. The priority model is the mechanism that keeps input responsive under load.

Do: register response handlers before sending messages that expect a response. Don't: send then register. That's a race condition where the response arrives before the handler is ready.
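
The race is easiest to see with a responder that answers synchronously. In this sketch (all names illustrative), the handler is registered before `send`, so even an instant response finds a listener; swap the two calls and the response is silently dropped.

```python
class Channel:
    def __init__(self):
        self.handlers = {}

    def on_response(self, request_id, handler):
        self.handlers[request_id] = handler

    def send(self, request_id, respond):
        respond(request_id, self)     # may fire the response synchronously

    def deliver(self, request_id, payload):
        handler = self.handlers.get(request_id)
        if handler:                   # no handler yet -> response dropped
            handler(payload)

def instant_responder(request_id, channel):
    # Responds before send() even returns: the worst-case timing.
    channel.deliver(request_id, "pong")

received = []
chan = Channel()
chan.on_response("req-1", received.append)   # register FIRST
chan.send("req-1", instant_responder)        # then send
```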

  • Agent Loop Architecture: The generator pattern that produces agent events. The agent loop is the producer side of the streaming pipeline. It yields events as the LLM responds and tools execute. This page and agent-loop.md cover complementary halves: why the generator exists versus what consumers do with what it yields.

  • Multi-Agent Coordination: Event propagation through the coordination layer. When a supervisor observes worker agents, it subscribes to their event streams the same way a UI subscribes to the agent loop. Understanding the event model here makes multi-agent observability composable rather than bespoke per-system.

  • Safety and Permissions: Event isolation as a security boundary. The terminal input event system handles raw user keystrokes, including permission grant/deny responses. Understanding how capture/bubble dispatch routes events through component trees is relevant to how permission dialogs intercept input without affecting underlying components.

  • Tool System Design: Tools generate events as they execute. Tool dispatch events, tool result events, and abort events flow through the streaming pipeline. The concurrent dispatch of read-only tools creates interleaved event streams that the consumer must handle correctly.

  • Observability and Debugging: The streaming event pipeline is the primary data source for observability. Event logging, cost tracking, and session tracing all consume the same typed event stream. Understanding the event model here makes the observability layer's span hierarchy and cost attribution directly interpretable.

  • Pattern Index: All patterns from this page in one searchable list, with context tags and links back to the originating section.

  • Glossary: Definitions for all domain terms used on this page, from agent loop primitives to memory system concepts.