Command and Plugin Systems

An agent starts with a handful of commands. In production, it grows to 100+. If each command wires itself into the dispatch logic at registration time, you get a maintenance nightmare. Adding a command means touching the dispatcher, the help system, the permission layer, and the availability checks. Every command knows too much about the system it lives in.

The solution is to separate declaration from implementation. Every command is a metadata object first. The metadata tells the system what the command is, when it's available, and how to load it, but never what it does. Implementation is deferred to invocation time via dynamic imports. This is the metadata-first registration pattern, and it enables everything else: lazy loading, multi-source merging, and command discovery without code execution.

The big idea: the registry can hold 100+ commands while loading essentially none of them at startup. A command that is never invoked never loads its module. A command that is disabled never even reaches the dispatcher. And because every source (builtins, plugins, skills, external providers) contributes the same metadata shape, the registry merges them all with a simple array concatenation. No collision resolution logic. No priority scores. The ordering of the arrays IS the priority.

The Three Command Types

The command type is a discriminated union on the type field. The dispatcher needs to know how to run each type without knowing what any specific command does:

type Command = BaseMetadata & (
  | { type: "local", load: () -> Promise<{ call: LocalCommandFn }> }
  | { type: "interactive", load: () -> Promise<{ call: InteractiveCommandFn }> }
  | { type: "prompt", get_prompt: (args, context) -> Promise<ContentBlock[]> }
)

The three types cover all behaviors:

Type	What It Does	Example
`local`	Runs a function, returns a result (text, structured data, or skip). No UI.	`/summarize`, `/compact`
`interactive`	Renders a UI component, receives a completion callback. For commands that need user interaction.	`/settings`, `/agents`
`prompt`	Expands into the conversation as content. The command produces messages, not side effects.	Skills, workflow commands, plugin-provided commands

Why three types instead of one? Because the dispatcher needs to route correctly. A local command is called and its return value is processed. An interactive command is rendered and the system waits for the on_done callback. A prompt command injects content blocks into the conversation and the agent loop continues. Each dispatch path is different, but none of the dispatch logic touches any specific command's implementation. The type field is the only coupling.

Metadata-First Registration

Every command is described by a base metadata object before it is ever loaded. The following shows a complete registration. Notice that the module containing the implementation is not imported here:

# Each command is a metadata object: behavior is declared, not wired
command = {
  type: "local",            # one of: local, interactive, prompt
  name: "summarize",
  description: "Summarize the current conversation",
  is_enabled: () -> config.get("summarize_enabled", default=True),
  argument_hint: "<optional focus area>",
  load: () -> import_module("commands.summarize")  # lazy, not called at registration
}

# Registry just holds the metadata objects
registry.add(command)

Each metadata field has a specific purpose in the dispatch lifecycle:

is_enabled(): evaluated fresh every time the command list is requested. NOT memoized. Why: auth state can change mid-session (for example, after a login command grants new permissions). Memoizing this check means users don't see commands they've just unlocked, a silent failure with no error message.
availability: a separate, static gate: who can ever use this command (auth requirements, provider restrictions). Evaluated independently from is_enabled to keep "who can use it" apart from "is it turned on right now." Conflating them into one check breaks post-login auth refreshes.
load(): the only connection to the implementation. Returns a module with a call() function. This is the lazy loading hook. The module is never imported at registration time.
argument_hint: what the help system shows. The help system works entirely from metadata. It never loads the command implementation to describe the command.
description: what users see in the command picker. The LLM uses this field for command suggestions. Neither the user interface nor the LLM ever sees the implementation code.

The key insight: "The metadata describes the command completely enough for the registry, help system, permission checks, and availability filtering to work, without ever loading the command's code."

Lazy Loading and Startup Cost

Every command implementation is loaded only when the command is invoked. The pattern is a two-phase call:

# At registry time: just metadata, no code loaded
load: () -> import_module("commands.summarize")

# At invocation time: module loaded, function called
module = await command.load()
return module.call(args, context)

This matters enormously at scale. With 100+ commands, eager loading would import every command module at startup. Some commands pull in heavy dependencies. Consider a single command whose module is 113KB and 3,200 lines because it includes diff rendering and HTML generation. Loading that at startup costs 113KB of parse time, even for users who never invoke that command. With lazy loading, startup cost is O(1) regardless of how many commands exist: a 100-command registry loads 100 metadata objects, not 100 modules.

The heavy-command shim pattern goes one step further. For especially heavy commands, even the import wrapper can be deferred. The registry entry is a thin metadata object that constructs the dynamic import call only when invoked. The import path itself is not evaluated until dispatch:

# Thin shim: the import path is not evaluated until load() is called
heavy_command = {
  type: "local",
  name: "analyze",
  description: "Analyze the codebase",
  is_enabled: () -> feature_enabled("INSIGHTS"),
  load: () -> import_module("commands.analyze")  # path string only, not executed
}

This avoids even the overhead of evaluating the import path at registry build time, which is useful when registry construction is on the critical path to first response.

Multi-Source Registry Merge

The registry is not a static list. In a production agent, commands come from multiple sources: built-in commands shipped with the system, skills discovered from project directories, plugin commands installed by users, and workflow commands from project configuration files. All four sources must coexist with identical interfaces and identical dispatch paths.

The multi-source merge loads all sources concurrently and concatenates them:

async function get_commands(working_dir: str) -> list[Command]:
  # All sources load concurrently
  [builtin_skills, plugin_skills, plugin_commands, workflow_commands] = await parallel([
    load_skills(working_dir),
    load_plugin_skills(),
    load_plugin_commands(),
    load_workflow_commands(working_dir),
  ])

  all_commands = [
    ...builtin_skills,
    ...plugin_skills,
    ...plugin_commands,
    ...workflow_commands,
    ...BUILTIN_COMMANDS,   # static list, always present
  ]

  # Run availability + enabled checks fresh: never memoize these
  return [c for c in all_commands if meets_availability(c) and c.is_enabled()]

The five sources and their ordering:

Built-in skills (shipped with the system): always present
Plugin skills (user-installed skill directories): project-local
Plugin commands (code-backed, from user-installed plugins): user-global
Workflow commands (file-backed, from project config): project-local
Built-in commands (the static core list): always present

Ordering IS the priority. A plugin skill with the same name as a built-in command shadows it because it appears earlier in the array. There is no collision resolution logic, no priority score calculation. The array merge is intentionally simple. Simplicity here prevents a class of bugs where two sources both claim a command name and the winner is non-obvious.

Memoization by working directory. The expensive parts (skill discovery, plugin loading) are memoized by working_dir because skills are project-local. Two projects can't share a command cache. A command discovered in /project-a/.claude/skills/ is not available in project B. But the availability and is_enabled checks run fresh on every get_commands() call because those checks are cheap and auth state can change mid-session.

Feature Flags and Dead-Code Elimination

Some commands only exist when a feature flag is active. The temptation is to use dynamic imports for these, but dynamic imports cannot be tree-shaken by bundlers:

# Wrong: dynamic import, bundler cannot tree-shake this
# The module is included in every build regardless of flag state
if feature_enabled("EXPERIMENTAL_INSIGHTS"):
  module = await import_module("commands/insights")
  commands.append(module.default)

# Correct: conditional require, bundler can tree-shake when flag is off
# When the flag is off at build time, the module is excluded entirely
if feature_enabled("EXPERIMENTAL_INSIGHTS"):
  commands.append(require("commands/insights"))

# The command list uses filter to handle null entries
commands = [c for c in raw_commands if c is not None]

The reason this matters: a dynamic import path is a string that the bundler cannot evaluate at build time. The bundler must include the module in every build because the import might execute. A static require() call (or a static import at the top of the file) is analyzable. If the condition is false at build time, the bundler eliminates the module entirely.

This is a subtle but important production decision. Using dynamic imports for feature-flagged commands means every production build carries the dead code of every disabled experimental feature.

Production Considerations

1. Never memoize availability checks.

The is_enabled() function runs fresh on every command list request because auth state changes mid-session. Consider a login command: after it succeeds, the system must show new commands that are gated on authenticated status. If is_enabled() is memoized, users don't see those commands until the next process restart. This is counterintuitive. The natural instinct is to cache for performance. The right split: memoize the expensive discovery (skill scanning, plugin loading) and keep the cheap checks (availability, enabled state) live.

2. Separate "who can use it" from "is it turned on."

Availability (a static auth/provider gate) and enabled state (a dynamic feature flag or environment check) are different concerns. A command can be available to a user but currently disabled (feature flag off). A command can be enabled but not available (requires auth the user hasn't completed). Conflating these into a single is_available() check makes it impossible for the system to distinguish these cases, and post-login auth refreshes will fail to surface newly-unlocked commands.

3. Memoize by working directory, not globally.

Skills are project-local. Different projects have different skill directories. A global command cache serves stale skills when switching projects. The memoization key must include the working directory. If your agent supports multiple concurrent projects, each project needs its own command cache entry.

4. Heavy commands need lazy shims, not just lazy loading.

For commands with very large implementations (tens of thousands of lines, multiple heavy dependencies), even the dynamic import wrapper should be deferred. Define a thin metadata object that constructs the import call only at invocation time. This avoids evaluating import paths at registry build time, which is relevant when registry construction is on a hot path.

5. Static imports for feature-flagged commands.

Using dynamic imports for feature-gated commands defeats dead-code elimination. The bundler can only tree-shake what it can statically analyze. Feature flags that are known at build time should use conditional require() or top-level static imports, not dynamic import() inside conditionals.

Best Practices

Do separate command metadata from command implementation. The registry should never import command code at registration time.
Do run availability checks fresh on every request. Auth state changes mid-session.
Do use discriminated unions for command types. The dispatcher needs to know how to run each type without knowing what it does.
Do memoize expensive discovery (skill scanning, plugin loading) by working directory.
Don't use a single boolean for "is this command available." Separate static availability from dynamic enabled state.
Don't memoize the full command list including enabled checks. Only memoize the expensive parts.
Don't use dynamic imports for feature-flagged commands. Use static imports that bundlers can tree-shake.
Don't rely on ordering for correctness beyond precedence. If two commands with the same name should NOT shadow each other, that's a naming problem, not a registry problem.

Tool System: Commands and tools are complementary: tools are what the LLM invokes autonomously, commands are what the user invokes explicitly. Both use registry patterns and metadata-first design.
MCP Integration: MCP servers can contribute commands to the registry through the prompt command type, extending the command system across process boundaries.
Hooks and Extensions: Another extension mechanism that modifies agent behavior at defined extension points without touching core code. Hooks are event-driven while commands are user-invoked.
Pattern Index: All patterns from this page in one searchable list, with context tags and links back to the originating section.
Glossary: Definitions for domain terms used on this page.