Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Sub-Agent Orchestration

Sub-agents let you delegate tasks to specialized helpers that work in the background while you continue chatting with Zeph. Each sub-agent has its own system prompt, tools, and skills — but cannot access anything you haven’t explicitly allowed.

Quick Start

  1. Create a definition file:
---
name: code-reviewer
description: Reviews code for correctness and style
---

You are a code reviewer. Analyze the provided code for bugs, performance issues, and idiomatic style.
  1. Save it to .zeph/agents/code-reviewer.md in your project (or ~/.config/zeph/agents/ for global use).

  2. Spawn the sub-agent:

> /agent spawn code-reviewer Review the authentication module
Sub-agent 'code-reviewer' started (id: a1b2c3d4)

Or use the shorthand @mention syntax:

> @code-reviewer Review the authentication module
Sub-agent 'code-reviewer' started (id: a1b2c3d4)

That’s it. The sub-agent works in the background and reports results when done.

Managing Sub-Agents

CommandDescription
/agent listShow available sub-agent definitions
/agent spawn <name> <prompt>Start a sub-agent with a task
/agent bg <name> <prompt>Alias for spawn
/agent statusShow active sub-agents with state and progress
/agent cancel <id>Cancel a running sub-agent (accepts ID prefix)
/agent resume <id> <prompt>Resume a completed sub-agent with its conversation history
/agent approve <id>Approve a pending secret request
/agent deny <id>Deny a pending secret request
@name <prompt>Shorthand for /agent spawn

Checking Status

> /agent status
Active sub-agents:
  [a1b2c3d4] working  turns=3  elapsed=42s  Analyzing auth flow...

Cancelling

The cancel command accepts a UUID prefix. If the prefix is ambiguous (matches multiple agents), you’ll be asked for a longer prefix:

> /agent cancel a1b2
Cancelled sub-agent a1b2c3d4-...

Resuming

Resume a previously completed sub-agent session with /agent resume. The agent is re-spawned with its full conversation history loaded from the transcript, so it picks up where it left off:

> /agent resume a1b2 Fix the remaining two warnings
Resuming sub-agent a1b2c3d4-... (code-reviewer) with 12 messages

The <id> argument accepts a UUID prefix, just like cancel. The <prompt> is appended as a new user message after the restored history.

Resume requires transcript storage to be enabled (it is by default). If the transcript file for the given ID does not exist, the command returns an error.

Transcript Storage

Every sub-agent session is recorded as a JSONL transcript file in .zeph/subagents/ (configurable). Each line is a JSON object containing a sequence number, ISO 8601 timestamp, and the full message:

.zeph/subagents/
  a1b2c3d4-...-...-....jsonl        # conversation transcript
  a1b2c3d4-...-...-....meta.json    # sidecar metadata

The meta sidecar (<agent_id>.meta.json) stores structured metadata about the session:

{
  "agent_id": "a1b2c3d4-...",
  "agent_name": "code-reviewer",
  "def_name": "code-reviewer",
  "status": "Completed",
  "started_at": "2026-03-05T10:00:00Z",
  "finished_at": "2026-03-05T10:01:38Z",
  "resumed_from": null,
  "turns_used": 5
}

When a session is resumed, the new meta sidecar records the original agent ID in resumed_from, creating a traceable chain.

Old transcript files are automatically cleaned up. When the file count exceeds transcript_max_files, the oldest transcripts (and their sidecars) are deleted on each spawn or resume.

Transcript Configuration

Configure transcript behavior in the [agents] section of config.toml:

[agents]
# Enable or disable transcript recording (default: true).
# When false, no transcript files are written and /agent resume is unavailable.
transcript_enabled = true

# Directory for transcript files (default: .zeph/subagents).
# transcript_dir = ".zeph/subagents"

# Maximum number of .jsonl files to keep (default: 50).
# Oldest files are deleted when the count exceeds this limit.
# Set to 0 for unlimited (no cleanup).
transcript_max_files = 50

Writing Definitions

A definition is a markdown file with YAML frontmatter between --- delimiters. The body after the closing --- becomes the sub-agent’s system prompt.

Note: Prior to v0.13, definitions used TOML frontmatter (+++). That format is still accepted but deprecated and will be removed in v1.0.0. Migrate by replacing +++ delimiters with --- and converting the body to YAML syntax.

Minimal Definition

Only name and description are required. Everything else has sensible defaults:

---
name: helper
description: General-purpose helper
---

You are a helpful assistant. Complete the given task concisely.

Full Definition

---
name: code-reviewer
description: Reviews code changes for correctness and style
model: claude-sonnet-4-20250514
background: false
max_turns: 10
memory: project
tools:
  allow:
    - shell
    - web_scrape
  except:
    - shell_sudo
permissions:
  permission_mode: accept_edits
  secrets:
    - github-token
  timeout_secs: 300
  ttl_secs: 120
skills:
  include:
    - "git-*"
    - "rust-*"
  exclude:
    - "deploy-*"
hooks:
  PreToolUse:
    - matcher: "Bash"
      hooks:
        - type: command
          command: "./scripts/validate.sh"
  PostToolUse:
    - matcher: "Edit|Write"
      hooks:
        - type: command
          command: "./scripts/lint.sh"
---

You are a code reviewer. Analyze the provided code for:
- Correctness bugs
- Performance issues
- Idiomatic Rust style

Report findings as a structured list with severity (critical/warning/info).

Field Reference

FieldTypeDefaultDescription
namestringrequiredUnique identifier
descriptionstringrequiredHuman-readable description
modelstringinheritedLLM model override
backgroundboolfalseRun as a background task; secret requests are auto-denied inline
max_turnsu3220Maximum LLM turns before the agent is stopped
memorystringPersistent memory scope: user, project, or local (see Persistent Memory)
tools.allowstring[]Only these tools are available (mutually exclusive with deny)
tools.denystring[]All tools except these (mutually exclusive with allow)
tools.exceptstring[][]Additional denylist applied on top of allow/deny; deny always wins over allow; exact match on tool ID
permissions.permission_modeenumdefaultTool call approval policy (see below)
permissions.secretsstring[][]Vault keys the agent MAY request
permissions.timeout_secsu64600Hard kill deadline
permissions.ttl_secsu64300TTL for granted permissions
skills.includestring[]allGlob patterns to include (* wildcard)
skills.excludestring[][]Glob patterns to exclude (takes precedence)
hooks.PreToolUseHookMatcher[][]Hooks fired before tool execution (see Hooks)
hooks.PostToolUseHookMatcher[][]Hooks fired after tool execution (see Hooks)

If neither tools.allow nor tools.deny is specified, the sub-agent inherits all tools from the main agent.

permission_mode Values

ValueDescription
defaultStandard interactive prompts — the user is asked before each sensitive tool call
accept_editsFile edit and write operations are auto-accepted without prompting
dont_askAll tool calls are auto-approved without any prompt
bypass_permissionsSame as dont_ask but emits a warning at definition load time
planThe agent can see the tool catalog but cannot execute any tools; produces text-only output

Caution

bypass_permissions skips all tool-call approval prompts. Only use it in fully trusted, sandboxed environments.

Tip

Use plan mode when you only need a structured action plan from the agent and want to review it before any tools are executed.

tools.except — Additional Denylist

tools.except lets you block specific tool IDs regardless of what allow or deny says. Deny always wins over allow, so a tool listed in both allow and except is blocked.

tools:
  allow:
    - shell
    - web_scrape
  except:
    - shell_sudo    # blocked even though shell is in allow

Use except to tighten an existing allow list without rewriting it.

background — Fire-and-Forget Execution

When background: true, the agent runs without blocking the conversation. Secret requests that would normally open an interactive prompt are auto-denied inline instead, so the main session is never paused waiting for user input.

---
name: nightly-linter
description: Runs cargo clippy on the workspace nightly
background: true
max_turns: 5
tools:
  allow:
    - shell
---

Run `cargo clippy --workspace -- -D warnings` and report any new warnings introduced since the last run.

Results appear in /agent status and the TUI panel when the task completes.

max_turns — Turn Limit

max_turns caps the number of LLM turns the agent may take. The agent is stopped automatically when the limit is reached, preventing runaway inference loops.

---
name: summarizer
description: Summarizes long documents
max_turns: 3
---

Summarize the provided content in three bullet points.

The default is 20. Set a lower value for narrow, well-defined tasks.

Definition Locations

PathScopePriority
.zeph/agents/ProjectHigher (wins on name conflict)
~/.config/zeph/agents/User (global)Lower

Managing Definitions

Use the zeph agents subcommand to list, inspect, create, edit, and delete sub-agent definitions from the command line.

List

$ zeph agents list
NAME             SCOPE                    DESCRIPTION                       MODEL
code-reviewer    project/code-reviewer…   Reviews code for correctness      claude-sonnet-4-20250514
test-writer      user/test-writer.md      Generates unit tests              -

Show

$ zeph agents show code-reviewer
Name:        code-reviewer
Description: Reviews code for correctness
Source:      project/code-reviewer.md
Model:       claude-sonnet-4-20250514
Mode:        Default
Max turns:   10
Background:  false
Tools:       allow ["shell", "web_scrape"]

System prompt:
You are a code reviewer...

Create

$ zeph agents create reviewer --description "Code review helper"
Created .zeph/agents/reviewer.md

$ zeph agents create reviewer --description "Code review helper" --model claude-sonnet-4-20250514
Created .zeph/agents/reviewer.md

$ zeph agents create reviewer --description "Global helper" --dir ~/.config/zeph/agents/
Created /Users/you/.config/zeph/agents/reviewer.md

Options:

  • --description / -d — short description (required)
  • --model — model override (optional)
  • --dir — target directory (default: .zeph/agents/)

Edit

Opens the definition file in $VISUAL or $EDITOR (falls back to vi). After the editor closes, Zeph re-parses the file to validate it:

$ zeph agents edit reviewer
# $EDITOR opens .zeph/agents/reviewer.md
Updated /path/to/.zeph/agents/reviewer.md

Delete

$ zeph agents delete reviewer
Delete /path/to/.zeph/agents/reviewer.md? [y/N] y
Deleted reviewer

Use --yes / -y to skip the confirmation prompt.

TUI Panel

The TUI command palette (/) includes agents:* entries. Select one to open the agent manager overlay or populate the input bar with the corresponding /agent command. Open the overlay directly by typing /agents in the command palette and selecting agents:list.

The agent manager overlay provides keyboard navigation over all loaded definitions:

KeyAction
j / k or arrowsNavigate list
EnterOpen detail view
cCreate new definition (wizard form)
e (in detail view)Edit via form
d (in detail view)Delete with confirmation
EscGo back / close panel

Note: The TUI wizard edits name, description, model, and max_turns fields only. To edit hooks, memory, skills, or the system prompt, use zeph agents edit with $EDITOR.

Saving via the TUI form rewrites the file and removes YAML comments. Use the CLI edit command to preserve hand-written formatting.

Persistent Memory

Sub-agents can maintain persistent state across sessions via a MEMORY.md file and topic-specific files in a dedicated memory directory. This lets agents build knowledge over time without starting from scratch on every spawn.

Enabling Memory

Add the memory field to a definition’s YAML frontmatter:

---
name: code-reviewer
description: Reviews code for correctness and style
memory: project
---

Or set a global default in config.toml (applies to all agents without an explicit memory field):

[agents]
default_memory_scope = "project"

Memory Scopes

ScopeDirectoryUse Case
user~/.zeph/agent-memory/<name>/Cross-project memory shared between same-named agents. Do not store project-specific secrets here.
project.zeph/agent-memory/<name>/Project-scoped memory, suitable for version control.
local.zeph/agent-memory-local/<name>/Project-scoped but not committed. Add .zeph/agent-memory-local/ to .gitignore.

The memory directory is created automatically on first spawn. If the directory already exists, its contents are preserved.

How It Works

  1. Directory creation — At spawn time, Zeph creates the memory directory if it does not exist.
  2. MEMORY.md injection — The first 200 lines of MEMORY.md are loaded and injected into the system prompt after the behavioral prompt, wrapped in <agent-memory> tags. Lines beyond 200 are truncated with a pointer to the full file.
  3. File tool access — The agent uses Read, Write, and Edit tools to maintain MEMORY.md and create topic-specific files (e.g., patterns.md, debugging.md).
  4. Prompt ordering — The behavioral system prompt (from the definition body) always takes precedence over memory content.

Auto-Enabled File Tools

When an agent uses tools.allow (allowlist mode) and has memory enabled, Zeph automatically adds Read, Write, and Edit to the allowed tool list. A warning is logged so you know the tools were implicitly added:

WARN auto-enabled file tools for memory access — add ["Read", "Write", "Edit"]
     to tools.allow to suppress this warning

To silence the warning, explicitly include the file tools in your allowlist:

tools:
  allow:
    - shell
    - Read
    - Write
    - Edit

If all three file tools are blocked (via tools.except or tools.deny), memory is silently disabled — the directory is not created and no content is injected.

Security

  • Agent name validation — Names must match ^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$. Path traversal attempts (e.g., ../etc/passwd) are rejected.
  • Symlink boundary checkMEMORY.md is canonicalized before reading. If the resolved path escapes the memory directory (e.g., via a symlink), the file is silently skipped.
  • Size cap — Files larger than 256 KiB are rejected.
  • Null byte guard — Files containing null bytes are rejected.
  • Tag escaping<agent-memory> tags in memory content are escaped to prevent prompt injection. Since MEMORY.md is agent-written (not user-written), this stricter escaping is applied by default.
  • Local scope .gitignore check — When using local scope, Zeph warns if .zeph/agent-memory-local/ is not in .gitignore.

Tool and Skill Access

Tool Filtering

Control which tools a sub-agent can use:

  • Allow list — only listed tools are available:
    tools:
      allow:
        - shell
        - web_scrape
    
  • Deny list — all tools except listed:
    tools:
      deny:
        - shell
    
  • Except list — additional block on top of allow or deny (deny always wins):
    tools:
      allow:
        - shell
        - web_scrape
      except:
        - shell_sudo
    
  • Inherit all — omit both allow and deny

Filtering is enforced at the executor level. The sub-agent’s LLM only sees tool definitions it can actually call. Blocked tool calls return an error.

Skill Filtering

Skills are filtered by glob patterns with * wildcard:

skills:
  include:
    - "git-*"
    - "rust-*"
  exclude:
    - "deploy-*"
  • Empty include = all skills pass (unless excluded)
  • exclude always takes precedence over include

Security Model

Sub-agents follow a zero-trust principle: they start with zero permissions and can only access what you explicitly grant.

How It Works

  1. Definitions declare capabilities, not permissions. Writing secrets: [github-token] means the agent may request that secret — it doesn’t get it automatically.

  2. Secrets require your approval. When a sub-agent needs a secret, Zeph prompts you:

    Sub-agent ‘code-reviewer’ requests ‘github-token’ (TTL: 120s). Allow? [y/n]

  3. Everything expires. Granted permissions and secrets are automatically revoked after ttl_secs or when the sub-agent finishes — whichever comes first.

  4. Secrets stay in memory only. They are never written to disk, message history, or logs.

Permission Lifecycle

stateDiagram-v2
    [*] --> Request
    Request --> UserApproval
    UserApproval --> Denied
    UserApproval --> Grant: approved (with TTL)
    Grant --> Active
    Active --> Expired
    Active --> Revoked
    Expired --> [*]: cleared from memory
    Revoked --> [*]: cleared from memory
    Denied --> [*]

Safety Guarantees

  • Concurrency limit prevents resource exhaustion
  • permissions.timeout_secs provides a hard kill deadline
  • max_turns prevents runaway LLM loops
  • Background agents auto-deny secret requests so the main session is never blocked
  • All grants are revoked on completion, cancellation, or crash
  • Secret key names are redacted in logs

Hooks

Hooks let you run shell commands at specific points in a sub-agent’s lifecycle. Use them to validate tool inputs, run linters after file edits, set up resources on agent start, or clean up on agent stop.

There are two hook scopes:

  • Per-agent hooks — defined in the agent’s YAML frontmatter, scoped to tool use events (PreToolUse, PostToolUse)
  • Config-level hooks — defined in config.toml, scoped to agent lifecycle events (SubagentStart, SubagentStop)

Per-Agent Hooks (PreToolUse / PostToolUse)

Add a hooks section to the agent’s YAML frontmatter. Each event contains a list of matchers, and each matcher specifies which tools it applies to and what commands to run:

---
name: code-reviewer
description: Reviews code for correctness and style
hooks:
  PreToolUse:
    - matcher: "Bash"
      hooks:
        - type: command
          command: "./scripts/validate.sh"
          timeout_secs: 10
          fail_closed: true
  PostToolUse:
    - matcher: "Edit|Write"
      hooks:
        - type: command
          command: "./scripts/lint.sh"
---

PreToolUse fires before a tool is executed. Set fail_closed: true to block execution if the hook exits non-zero.

PostToolUse fires after a tool finishes. Useful for linting, formatting, or auditing changes.

Matcher Syntax

The matcher field is a pipe-separated list of tokens. A tool matches when its name contains any of the listed tokens (case-sensitive substring match):

MatcherMatchesDoes not match
"Bash"BashEdit, Write
"Edit|Write"Edit, WriteFileBash, Read
"Shell"Shell, ShellExecBash

Hook Definition Fields

FieldTypeDefaultDescription
typestringrequiredHook type — currently only "command" is supported
commandstringrequiredShell command to execute (passed to sh -c)
timeout_secsu6430Maximum execution time before the hook is killed
fail_closedboolfalseWhen true, a non-zero exit or timeout causes the calling operation to fail; when false, errors are logged and execution continues

Config-Level Hooks (SubagentStart / SubagentStop)

Define lifecycle hooks in config.toml under [agents.hooks]. These run for every sub-agent:

[agents.hooks]

[[agents.hooks.start]]
type = "command"
command = "echo agent started"
timeout_secs = 10

[[agents.hooks.stop]]
type = "command"
command = "./scripts/cleanup.sh"

start hooks fire after a sub-agent is spawned. stop hooks fire after a sub-agent finishes or is cancelled. Both are fire-and-forget — errors are logged but do not affect the agent’s operation.

Common use cases:

HookUse case
startSend a Slack/webhook notification that a sub-agent started; initialize a working directory; write a lock file
stopPost results to a dashboard; remove temp files; log task duration

Each hook definition accepts the same fields as per-agent hooks:

FieldTypeDefaultDescription
typestringrequiredCurrently only "command" is supported
commandstringrequiredShell command executed via sh -c
timeout_secsu6430Hook is killed after this many seconds
fail_closedboolfalseWhen true, a non-zero exit blocks the operation; when false, errors are logged and execution continues

Multiple hooks per event are supported — they run sequentially in definition order:

[agents.hooks]

[[agents.hooks.start]]
type = "command"
command = "curl -s -X POST https://hooks.example.com/agent-start -d agent=$ZEPH_AGENT_NAME"
timeout_secs = 5

[[agents.hooks.start]]
type = "command"
command = "mkdir -p /tmp/zeph-work/$ZEPH_AGENT_ID"
timeout_secs = 5

[[agents.hooks.stop]]
type = "command"
command = "rm -rf /tmp/zeph-work/$ZEPH_AGENT_ID"
timeout_secs = 10

Environment Variables

Hook processes receive a clean environment with only the PATH variable preserved from the parent process. The following Zeph-specific variables are set:

VariableDescription
ZEPH_AGENT_IDUUID of the sub-agent instance
ZEPH_AGENT_NAMEName from the agent definition
ZEPH_TOOL_NAMETool name (only for PreToolUse / PostToolUse)

Security

Hooks follow a trust-boundary model:

  • Project-level definitions (.zeph/agents/) may contain hooks — they are trusted because they live in the project repository.
  • User-level definitions (~/.config/zeph/agents/) have all hooks stripped on load. This prevents untrusted global definitions from running arbitrary commands in any project.
  • Hook processes run with a cleared environment (env_clear()). Only PATH is preserved from the parent to prevent accidental secret leakage.
  • Child processes are explicitly killed on timeout to prevent orphan processes.

Note: If you need hooks on a globally shared agent, move the definition into the project’s .zeph/agents/ directory instead.

Global Agent Defaults

The [agents] section in config.toml sets defaults that apply to all sub-agents unless overridden by the individual definition:

[agents]
# Default permission mode for sub-agents that do not set one explicitly.
# "default" and omitting this field are equivalent — both result in standard
# interactive prompts.
# Valid values: "default", "accept_edits", "dont_ask"
# (bypass_permissions and plan are not useful as global defaults)
default_permission_mode = "default"

# Tool IDs blocked for all sub-agents, regardless of what their definition allows.
# Appended on top of any per-definition tool filtering.
default_disallowed_tools = []

# Must be true to allow any sub-agent definition to use bypass_permissions mode.
# When false (the default), spawning a definition with permission_mode: bypass_permissions
# is rejected at load time with an error.
allow_bypass_permissions = false

# Enable JSONL transcript recording for sub-agent sessions (default: true).
# When false, /agent resume is unavailable.
transcript_enabled = true

# Directory for transcript files (default: .zeph/subagents).
# transcript_dir = ".zeph/subagents"

# Maximum number of transcript files to keep (default: 50).
# Set to 0 for unlimited.
transcript_max_files = 50

# Default memory scope for agents that do not set `memory` in their frontmatter.
# Valid values: "user", "project", "local"
# Omit or set to null to disable memory by default.
# default_memory_scope = "project"

# Lifecycle hooks — run for every sub-agent start/stop.
# See the Hooks section above for the full schema.
# [agents.hooks]
# [[agents.hooks.start]]
# type = "command"
# command = "echo started"
# [[agents.hooks.stop]]
# type = "command"
# command = "./scripts/cleanup.sh"

Note: default_permission_mode = "default" and omitting the field are equivalent — both leave per-agent prompting behavior unchanged.

Caution: Set allow_bypass_permissions = true only in fully trusted, sandboxed environments. Without this flag, any definition requesting bypass_permissions mode is rejected at load time.

Context Propagation

Sub-agents inherit context from the parent agent to reduce cold-start overhead:

  • Conversation history: the parent’s recent conversation history is forwarded to the sub-agent’s initial context, giving it awareness of what has been discussed
  • Cancellation: the parent’s cancellation token is propagated so that cancelling the parent also cancels active sub-agents
  • Model inheritance: sub-agents inherit the parent’s active model unless overridden in the definition’s model field

Sub-agents no longer exit after a single text-only LLM response — they continue the conversation loop until the task is complete or max_turns is reached.

Sub-Agent Context Injection

context_injection_mode controls exactly how parent conversation history is injected into the sub-agent’s task prompt. Configure it globally under [agents]:

[agents]
context_window_turns   = 10   # recent parent turns forwarded to the sub-agent
context_injection_mode = "last_assistant_turn"  # default
ModeBehavior
noneNo parent context injected. The sub-agent starts with only its system prompt and the task string. Use for fully isolated workers where parent history would be noise.
last_assistant_turnThe last assistant turn from the parent history is prepended to the task prompt as a preamble (default). Gives the sub-agent single-turn awareness — the most recent state — at zero extra LLM cost.
summaryA compact LLM-generated summary of the recent parent turns is injected. Suitable for long multi-turn sessions where full history injection would consume too many tokens. Requires a provider to generate the summary.

context_window_turns limits how many parent turns are forwarded regardless of mode. Set to 0 to disable history propagation entirely (equivalent to none but affects all modes uniformly).

Model inheritance: sub-agents use the parent’s active provider unless the definition’s model field specifies an override. This means a sub-agent spawned during a gpt-5.4 session will use gpt-5.4 unless pinned to a different model in the definition.

MCP Tool Awareness

Sub-agent system prompts are automatically annotated with the names of available MCP tools from connected servers. This helps the sub-agent’s LLM understand what external capabilities are available without injecting full tool schemas.

Interactive TUI Sidebar

When the tui feature is enabled, pressing Tab in Normal mode cycles to the sub-agent sidebar. The sidebar provides:

  • Live status for all active sub-agents with color-coded indicators
  • A transcript viewer that shows the full conversation history of a selected sub-agent
  • Keyboard navigation: j/k to select agents, Enter to open the transcript, Esc to close

TUI Dashboard Panel

When the tui feature is enabled, a Sub-Agents panel appears in the sidebar showing active agents with color-coded status:

┌ Sub-Agents (2) ─────────────────────────┐
│  code-reviewer [plan]  WORKING  3/20  42s │
│  test-writer [bg] [bypass!]  COMPLETED 10/20  100s │
└─────────────────────────────────────────┘

Colors: yellow = working, green = completed, red = failed, cyan = input required.

Permission mode badges: [plan], [accept_edits], [dont_ask], [bypass!]. The default mode shows no badge.

Architecture

Sub-agents run as in-process tokio tasks — not separate processes. The main agent communicates with them via lightweight primitives:

sequenceDiagram
    participant M as SubAgentManager
    participant S as Sub-Agent (tokio task)
    M->>S: tokio::spawn(run_agent_loop)
    S-->>M: watch::send(Working)
    S-->>M: watch::send(Working, msg)
    M->>S: CancellationToken::cancel()
    S-->>M: watch::send(Completed)
    S-->>M: JoinHandle.await → Result
PrimitiveDirectionPurpose
watch::channelAgent → ManagerReal-time status updates
JoinHandleAgent → ManagerFinal result collection
CancellationTokenManager → AgentGraceful cancellation

@mention vs File References

The TUI uses @ for both sub-agent mentions and file references. Zeph resolves ambiguity by checking the token after @ against known agent names:

@code-reviewer review src/main.rs   → sub-agent mention
@src/main.rs                        → file reference

API Reference

For programmatic use, SubAgentManager provides the full lifecycle API:

#![allow(unused)]
fn main() {
let mut manager = SubAgentManager::new(/* max_concurrent */ 4);

manager.load_definitions(&[
    project_dir.join(".zeph/agents"),
    dirs::config_dir().unwrap().join("zeph/agents"),
])?;

let task_id = manager.spawn("code-reviewer", "Review src/main.rs", provider, executor, None)?;
let statuses = manager.statuses();
manager.cancel(&task_id)?;
let result = manager.collect(&task_id).await?;
}
MethodDescription
load_definitions(&[PathBuf])Load .md definitions (first-wins deduplication)
spawn(name, prompt, provider, executor, skills)Spawn a sub-agent, returns task ID
cancel(task_id)Cancel and revoke all grants
collect(task_id)Await result and remove from active set
statuses()Snapshot of all active sub-agent states
approve_secret(task_id, key, ttl)Grant a vault secret after user approval
shutdown_all()Cancel all active sub-agents (used on exit)

Error Types

VariantWhen
ParseInvalid frontmatter or YAML/TOML
InvalidValidation failure (empty name, mutual exclusion)
NotFoundUnknown definition name or task ID
SpawnConcurrency limit reached or task panic
CancelledSub-agent was cancelled

Background Lifecycle (Phase 5 — Planned)

Planned — The features in this section are part of Phase 5 (#1145) and not yet available.

Phase 5 closes the gap between fire-and-forget background agents and a full lifecycle model with timeout enforcement, result persistence, completion notifications, and new CLI commands for inspecting agent output.

Timeout Enforcement

Planned — This feature is part of Phase 5 (#1145) and not yet available.

The permissions.timeout_secs field is currently parsed from agent definitions but not enforced at runtime. A runaway background agent can consume resources indefinitely.

Phase 5 wraps the agent loop in tokio::time::timeout so agents are killed when the deadline expires:

#![allow(unused)]
fn main() {
let timeout_dur = Duration::from_secs(def.permissions.timeout_secs);
let join_handle = tokio::spawn(async move {
    match tokio::time::timeout(timeout_dur, run_agent_loop(args)).await {
        Ok(result) => result,
        Err(_elapsed) => {
            tracing::warn!("sub-agent timed out after {timeout_dur:?}");
            Err(anyhow::anyhow!("sub-agent timed out after {}s", timeout_dur.as_secs()))
        }
    }
});
}

The default timeout is 600 seconds (10 minutes). Override it per agent:

---
name: long-running-task
description: Agent with a custom timeout
permissions:
  timeout_secs: 1800  # 30 minutes
---

Timeout is wall-clock time, independent of max_turns. Both limits are enforced simultaneously — whichever fires first stops the agent.

Completion Notifications

Planned — This feature is part of Phase 5 (#1145) and not yet available.

Currently the parent agent must poll /agent status to discover when a background agent finishes. Phase 5 introduces a CompletionEvent that fires when any agent reaches a terminal state (completed, failed, cancelled, or timed out):

#![allow(unused)]
fn main() {
pub struct CompletionEvent {
    pub task_id: String,
    pub agent_name: String,
    pub state: SubAgentState,
    pub elapsed: Duration,
}
}

The event carries only metadata — no result summary. Consumers read the full output from the persisted output file or SQLite table.

Delivery uses a cooperative sweep-on-access model rather than a background task. The manager’s reap_completed() method is called from the agent loop, collects all finished handles, persists results, and returns completion events. This avoids shared-ownership complexity since SubAgentManager is not behind Arc<Mutex>.

Result Persistence

Planned — This feature is part of Phase 5 (#1145) and not yet available.

Background agent results are currently ephemeral — stored as in-memory strings, lost if not explicitly collected or on process exit. Phase 5 adds dual persistence:

Output files — The final result is written to .zeph/agent-output/<task_id>.txt with a 1 MiB cap and 24-hour retention. Files are cleaned up by the reaper on the next sweep.

SQLite table — A background_results table stores structured metadata:

CREATE TABLE IF NOT EXISTS background_results (
    task_id     TEXT PRIMARY KEY,
    agent_name  TEXT NOT NULL,
    success     INTEGER NOT NULL,
    result_text TEXT NOT NULL,
    turns_used  INTEGER NOT NULL,
    elapsed_ms  INTEGER NOT NULL,
    created_at  TEXT NOT NULL DEFAULT (datetime('now'))
);

Configure persistence in config.toml:

[agents]
output_dir = ".zeph/agent-output"       # default
output_retention_secs = 86400           # 24h, default
output_max_bytes = 1048576              # 1 MiB, default

New CLI Commands

Planned — This feature is part of Phase 5 (#1145) and not yet available.

CommandDescription
/agent output <id>Print the persisted output file for a completed agent
/agent collect <id>Collect a specific agent’s result
/agent collectCollect all completed agents at once

/agent collect without arguments collects all agents in a terminal state (completed, failed, timed out). Active agents are skipped — the command never blocks waiting for a running agent to finish. /agent collect <id> collects a specific agent by ID prefix.

Example workflow:

> /agent bg code-reviewer Review the auth module
Sub-agent 'code-reviewer' started (id: a1b2c3d4)

> /agent status
Active sub-agents:
  [a1b2c3d4] completed  turns=5  elapsed=38s

> /agent output a1b2
--- Output for a1b2c3d4 (code-reviewer) ---
Found 2 issues in the auth module:
1. [critical] Token expiry check missing in refresh_token()
2. [warning] Redundant clone on line 42
---

> /agent collect
Collected 1 completed agent(s).

Structured Result Type

Planned — This feature is part of Phase 5 (#1145) and not yet available.

The current run_agent_loop returns a raw String. Phase 5 replaces it with a structured AgentResult:

#![allow(unused)]
fn main() {
pub struct AgentResult {
    pub final_response: String,
    pub conversation: Vec<Message>,  // full message history
    pub turns_used: u32,
    pub elapsed: Duration,
    pub timed_out: bool,
}
}

This enables /agent output to show the full result, and collect() to return structured data for programmatic use. The JoinHandle type changes from Result<String> to Result<AgentResult>.

Progress Streaming

Planned — This feature is part of Phase 5 (#1145) and not yet available.

The last_message field in SubAgentStatus is currently truncated to 120 characters, providing minimal visibility into agent progress. Phase 5 makes two improvements:

  1. Increased truncation limitlast_message truncation increases from 120 to 500 characters for immediate benefit without breaking changes.

  2. Dedicated progress channel — A separate mpsc::Sender<ProgressUpdate> channel carries full per-turn output alongside the existing watch channel:

#![allow(unused)]
fn main() {
pub struct ProgressUpdate {
    pub turn: u32,
    pub content: String,            // full LLM response for this turn
    pub tool_output: Option<String>, // tool result if applicable
}
}

The watch channel remains for lightweight status polling (no breaking change to SubAgentStatus). The progress channel has a capacity of 32 messages — unread messages are dropped when the buffer is full to prevent OOM.

Access progress updates via SubAgentManager::drain_progress(task_id) -> Vec<ProgressUpdate>.

Hook Improvements

Planned — This feature is part of Phase 5 (#1145) and not yet available.

Phase 5 adds a new environment variable to SubagentStop hooks:

VariableDescription
ZEPH_AGENT_EXIT_REASONExit reason: completed, failed, canceled, or timed_out

This allows stop hooks to take different actions based on how the agent ended — for example, sending a notification only on failure or cleaning up resources only on timeout.

Phase 5 also fixes a bug where SubagentStop hooks fire twice when a running agent is cancelled and then collected. The fix ensures the hook fires exactly once at the first terminal state transition.