
Configuration Reference

Complete reference for the Zeph configuration file and environment variables. For the interactive setup wizard, see Configuration Wizard.

Config File Resolution

Zeph resolves its configuration file at startup (config/default.toml by default) and then applies environment variable overrides on top of the loaded values.

# CLI argument (highest priority)
zeph --config /path/to/custom.toml

# Environment variable
ZEPH_CONFIG=/path/to/custom.toml zeph

# Default (fallback)
# config/default.toml

Priority: --config > ZEPH_CONFIG > config/default.toml.

Validation

Config::validate() runs at startup and rejects out-of-range values:

| Field | Constraint |
|---|---|
| `memory.history_limit` | <= 10,000 |
| `memory.context_budget_tokens` | <= 1,000,000 (when > 0) |
| `memory.soft_compaction_threshold` | 0.0–1.0, must be < hard_compaction_threshold |
| `memory.hard_compaction_threshold` | 0.0–1.0, must be > soft_compaction_threshold |
| `memory.graph.temporal_decay_rate` | finite, in [0.0, 10.0]; NaN and Inf rejected at deserialization |
| `memory.compression.threshold_tokens` | >= 1,000 (proactive only) |
| `memory.compression.max_summary_tokens` | >= 128 (proactive only) |
| `memory.compression.probe.threshold` | (0.0, 1.0], must be > hard_fail_threshold |
| `memory.compression.probe.hard_fail_threshold` | [0.0, 1.0), must be < threshold |
| `memory.compression.probe.max_questions` | >= 1 |
| `memory.compression.probe.timeout_secs` | >= 1 |
| `memory.semantic.importance_weight` | finite, in [0.0, 1.0] |
| `memory.graph.spreading_activation.decay_lambda` | in (0.0, 1.0] |
| `memory.graph.spreading_activation.max_hops` | >= 1 |
| `memory.graph.spreading_activation.activation_threshold` | < inhibition_threshold |
| `memory.graph.spreading_activation.inhibition_threshold` | > activation_threshold |
| `memory.graph.spreading_activation.seed_structural_weight` | in [0.0, 1.0] |
| `memory.graph.note_linking.link_weight_decay_lambda` | finite, in (0.0, 1.0] |
| `llm.semantic_cache_threshold` | finite, in [0.0, 1.0] |
| `orchestration.plan_cache.similarity_threshold` | in [0.5, 1.0] |
| `orchestration.plan_cache.max_templates` | in [1, 10000] |
| `orchestration.plan_cache.ttl_days` | in [1, 365] |
| `memory.token_safety_margin` | > 0.0 |
| `agent.max_tool_iterations` | <= 100 |
| `a2a.rate_limit` | > 0 |
| `acp.max_sessions` | > 0 |
| `acp.session_idle_timeout_secs` | > 0 |
| `acp.permission_file` | valid file path (optional) |
| `acp.lsp.request_timeout_secs` | > 0 |
| `gateway.rate_limit` | > 0 |
| `gateway.max_body_size` | <= 10,485,760 (10 MiB) |
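
For illustration, a fragment like the following (hypothetical values) would be rejected at startup, since the soft compaction threshold must stay below the hard one:

```toml
[memory]
soft_compaction_threshold = 0.95  # rejected: must be < hard_compaction_threshold
hard_compaction_threshold = 0.90
```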

Hot-Reload

Zeph watches the active config file for changes and applies updates to runtime-safe fields without a restart (500ms debounce).

Reloadable fields:

| Section | Fields |
|---|---|
| `[security]` | `redact_secrets` |
| `[timeouts]` | `llm_seconds`, `embedding_seconds`, `a2a_seconds` |
| `[memory]` | `history_limit`, `summarization_threshold`, `context_budget_tokens`, `soft_compaction_threshold`, `hard_compaction_threshold`, `compaction_preserve_tail`, `prune_protect_tokens`, `cross_session_score_threshold` |
| `[memory.semantic]` | `recall_limit` |
| `[index]` | `repo_map_ttl_secs`, `watch` |
| `[agent]` | `max_tool_iterations` |
| `[skills]` | `max_active_skills` |

Not reloadable (require restart): LLM provider/model, SQLite path, Qdrant URL, vector backend, Telegram token, MCP servers, A2A config, ACP config (including [acp.lsp]), agents config, skill paths, LSP context injection config ([agent.lsp]), compaction probe config ([memory.compression.probe]).
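
As a quick sketch (values are illustrative), editing a reloadable field takes effect within the debounce window, while anything outside the reloadable list still needs a restart:

```toml
[timeouts]
llm_seconds = 180    # reloadable: picked up live after the 500ms debounce

[memory]
history_limit = 100  # reloadable: listed in the [memory] row above
```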

Breaking change (v0.17.0): The old [llm.cloud], [llm.orchestrator], and [llm.router] config sections have been removed. Run zeph --migrate-config to automatically convert your config file.

Configuration File

[agent]
name = "Zeph"
max_tool_iterations = 10  # Max tool loop iterations per response (default: 10)
auto_update_check = true  # Query GitHub Releases API for newer versions (default: true)

[agent.instructions]
auto_detect    = true    # Auto-detect provider-specific files: CLAUDE.md, AGENTS.md, GEMINI.md (default: true)
extra_files    = []      # Additional instruction files (absolute or relative to cwd)
max_size_bytes = 262144  # Per-file size cap in bytes (default: 256 KiB)
# zeph.md and .zeph/zeph.md are always loaded regardless of auto_detect.
# Use --instruction-file <path> at the CLI to supply extra files at startup.

# LSP context injection — requires lsp-context feature and mcpls MCP server.
# Enable with --lsp-context CLI flag or by setting enabled = true here.
# [agent.lsp]
# enabled = false                # Enable LSP context injection hooks (default: false)
# mcp_server_id = "mcpls"       # MCP server ID providing LSP tools (default: "mcpls")
# token_budget = 2000            # Max tokens to spend on injected LSP context per turn (default: 2000)
#
# [agent.lsp.diagnostics]
# enabled = true                 # Inject diagnostics after write_file (default: true when agent.lsp is enabled)
# max_per_file = 20              # Max diagnostics per file (default: 20)
# max_files = 5                  # Max files per injection batch (default: 5)
# min_severity = "error"         # Minimum severity: "error", "warning", "info", or "hint" (default: "error")
#
# [agent.lsp.hover]
# enabled = false                # Pre-fetch hover info after read_file (default: false)
# max_symbols = 10               # Max symbols to fetch hover for per file (default: 10)
#
# [agent.lsp.references]
# enabled = true                 # Inject reference list before rename_symbol (default: true)
# max_refs = 50                  # Max references to show per symbol (default: 50)

[agent.learning]
correction_detection = true           # Enable implicit correction detection (default: true)
correction_confidence_threshold = 0.7 # Jaccard token overlap threshold for correction candidates (default: 0.7)
correction_recall_limit = 3           # Max corrections injected into system prompt (default: 3)
correction_min_similarity = 0.75      # Min cosine similarity for correction recall from Qdrant (default: 0.75)

[llm]
# routing = "none"      # none (default), ema, thompson, cascade, task, triage
# router_ema_enabled = false         # EMA-based provider latency routing (default: false)
# router_ema_alpha = 0.1             # EMA smoothing factor, 0.0–1.0 (default: 0.1)
# router_reorder_interval = 10       # Re-order providers every N requests (default: 10)
# thompson_state_path = "~/.zeph/router_thompson_state.json"  # Thompson state persistence path
# response_cache_enabled = false     # SQLite-backed LLM response cache (default: false)
# response_cache_ttl_secs = 3600     # Cache TTL in seconds (default: 3600)
# semantic_cache_enabled = false     # Embedding-based similarity cache (default: false)
# semantic_cache_threshold = 0.95    # Cosine similarity for cache hit (default: 0.95)
# semantic_cache_max_candidates = 10 # Max entries to examine per lookup (default: 10)

# Dedicated provider for tool-pair summarization and context compaction (optional).
# String shorthand — pick one format, or use [llm.summary_provider] below.
# summary_model = "ollama/qwen3:1.7b"              # ollama/<model>
# summary_model = "claude"                         # Claude, model from the claude provider entry
# summary_model = "claude/claude-haiku-4-5-20251001"
# summary_model = "openai/gpt-4o-mini"
# summary_model = "compatible/<name>"              # [[llm.providers]] entry name for compatible type
# summary_model = "candle"

# Structured summary provider. Takes precedence over summary_model when both are set.
# [llm.summary_provider]
# type = "claude"                        # claude, openai, compatible, ollama, candle
# model = "claude-haiku-4-5-20251001"   # model override
# base_url = "..."                       # endpoint override (ollama / openai only)
# embedding_model = "..."               # embedding model override (ollama / openai only)
# device = "cpu"                         # cpu, cuda, metal (candle only)

# Cascade routing options (when routing = "cascade").
# [llm.cascade]
# quality_threshold = 0.5             # Score below which response is degenerate (default: 0.5)
# max_escalations = 2                 # Max escalation steps per request (default: 2)
# classifier_mode = "heuristic"       # "heuristic" (default) or "judge" (LLM-backed)
# max_cascade_tokens = 0              # Cumulative token cap across escalation levels; 0 = unlimited
# cost_tiers = ["ollama", "claude"]   # Explicit cost ordering (cheapest first)

# Quality gate for Thompson/EMA routing — post-selection embedding similarity check.
# quality_gate = 0.0    # Cosine threshold; 0.0 = disabled (default: 0.0). Applies to thompson/ema only.

# ASI coherence tracking — penalizes providers with low response coherence.
# [llm.routing.asi]
# enabled             = false
# window_size         = 10      # Sliding window of response embeddings per provider (default: 10)
# coherence_threshold = 0.5     # Warn when rolling mean drops below this (default: 0.5)
# penalty_weight      = 0.3     # Multiplier applied to Thompson/EMA scores (default: 0.3)
# embedding_provider  = ""      # Provider name for response embeddings; empty = primary

# Complexity triage routing options (when routing = "triage").
# [llm.complexity_routing]
# triage_provider = "fast"            # Provider name used for classification (required)
# bypass_single_provider = true       # Skip triage when all tiers map to the same provider (default: true)
# triage_timeout_secs = 5             # Triage call timeout; falls back to simple tier on expiry (default: 5)
# max_triage_tokens = 50              # Max tokens in triage response (default: 50)
# fallback_strategy = "cascade"       # Optional hybrid mode: triage + quality escalation ("cascade" only)
#
# [llm.complexity_routing.tiers]
# simple  = "fast"                    # Provider name for trivial requests; also used as triage fallback
# medium  = "default"                 # Provider name for moderate requests
# complex = "smart"                   # Provider name for multi-step / code-heavy requests
# expert  = "expert"                  # Provider name for research-grade requests
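
# Worked example (hypothetical provider names): the tier values must match
# `name` fields on [[llm.providers]] entries, with routing enabled in [llm]:
#
# [llm]
# routing = "triage"
#
# [llm.complexity_routing]
# triage_provider = "fast"
#
# [llm.complexity_routing.tiers]
# simple  = "fast"
# medium  = "smart"
# complex = "smart"
# expert  = "smart"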

# Provider list — each [[llm.providers]] entry defines one LLM backend.
[[llm.providers]]
type = "ollama"                        # ollama, claude, openai, gemini, candle, compatible
# name = "local"                       # optional: identifier for multi-provider routing; required for compatible
base_url = "http://localhost:11434"
model = "qwen3:8b"
embedding_model = "qwen3-embedding"    # model for text embeddings
# vision_model = "llava:13b"          # Ollama only: dedicated model for image requests
# embed = true                         # mark as embedding provider for skill matching and semantic memory
# default = true                       # mark as primary chat provider
# embed_concurrency = 4                # Max concurrent embedding requests via semaphore (default: 4, 0 = unlimited)

# Additional provider examples:
# [[llm.providers]]
# name = "cloud"
# type = "claude"
# model = "claude-sonnet-4-6"
# max_tokens = 4096
# server_compaction = false            # Enable Claude server-side context compaction (compact-2026-01-12 beta)
# enable_extended_context = false      # Enable Claude 1M context window (context-1m-2025-08-07 beta, Sonnet/Opus 4.6)
# default = true

# [[llm.providers]]
# type = "openai"
# base_url = "https://api.openai.com/v1"
# model = "gpt-5.2"
# max_tokens = 4096
# embedding_model = "text-embedding-3-small"
# reasoning_effort = "medium"  # low, medium, high (for reasoning models)

# [[llm.providers]]
# type = "gemini"
# model = "gemini-2.0-flash"
# max_tokens = 8192
# embedding_model = "text-embedding-004"  # enable Gemini embeddings (optional)
# thinking_level = "medium"             # minimal, low, medium, high (Gemini 2.5+ only)
# thinking_budget = 8192               # token budget; -1 = dynamic, 0 = disabled (Gemini 2.5+ only)
# include_thoughts = true              # surface thinking chunks in TUI
# base_url = "https://generativelanguage.googleapis.com/v1beta"

# [[llm.providers]]
# name = "groq"
# type = "compatible"
# base_url = "https://api.groq.com/openai/v1"
# model = "llama-3.3-70b-versatile"
# max_tokens = 4096
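
# Example (hypothetical): split duties across two providers. The Ollama entry
# marked `embed = true` serves embeddings and skill matching, while the Claude
# entry with `default = true` handles chat:
# [[llm.providers]]
# type = "ollama"
# base_url = "http://localhost:11434"
# model = "qwen3:8b"
# embedding_model = "qwen3-embedding"
# embed = true
#
# [[llm.providers]]
# type = "claude"
# model = "claude-sonnet-4-6"
# default = true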

[llm.stt]
provider = "whisper"
model = "whisper-1"
# base_url = "http://127.0.0.1:8080/v1"  # optional: OpenAI-compatible server
# language = "en"                          # optional: ISO-639-1 code or "auto"
# Requires `stt` feature. When base_url is set, targets a local server (no API key needed).
# When omitted, uses the OpenAI API key from the openai [[llm.providers]] entry or ZEPH_OPENAI_API_KEY.

[skills]
# Defaults to the user config dir when omitted
# (for example ~/.config/zeph/skills on Linux,
# ~/Library/Application Support/Zeph/skills on macOS,
# %APPDATA%\zeph\skills on Windows).
# paths = ["/absolute/path/to/skills"]
max_active_skills = 5              # Top-K skills per query via embedding similarity
disambiguation_threshold = 0.05    # LLM disambiguation when top-2 score delta < threshold (0.0 = disabled)
prompt_mode = "auto"               # Skill prompt format: "full", "compact", or "auto" (default: "auto")
cosine_weight = 0.7                # Cosine signal weight in BM25+cosine fusion (default: 0.7)
hybrid_search = false              # Enable BM25+cosine hybrid skill matching (default: false)

[skills.learning]
enabled = true                     # Enable self-learning skill improvement (default: true)
auto_activate = false              # Require manual approval for new versions (default: false)
min_failures = 3                   # Failures before triggering improvement (default: 3)
improve_threshold = 0.7            # Success rate below which improvement starts (default: 0.7)
rollback_threshold = 0.5           # Auto-rollback when success rate drops below this (default: 0.5)
min_evaluations = 5                # Minimum evaluations before rollback decision (default: 5)
max_versions = 10                  # Max auto-generated versions per skill (default: 10)
cooldown_minutes = 60              # Cooldown between improvements for same skill (default: 60)
detector_mode = "regex"            # Correction detector: "regex" (default) or "judge" (LLM-backed)
judge_model = ""                   # Model for judge calls; empty = use primary provider
judge_adaptive_low = 0.5           # Regex confidence below this bypasses judge (default: 0.5)
judge_adaptive_high = 0.8          # Regex confidence at/above this bypasses judge (default: 0.8)

[memory]
# Defaults to the user data dir when omitted
# (for example ~/.local/share/zeph/data/zeph.db on Linux,
# ~/Library/Application Support/Zeph/data/zeph.db on macOS,
# %LOCALAPPDATA%\Zeph\data\zeph.db on Windows).
# sqlite_path = "/absolute/path/to/zeph.db"
history_limit = 50
summarization_threshold = 100  # Trigger summarization after N messages
context_budget_tokens = 0      # 0 = unlimited (proportional split: 15% summaries, 25% recall, 60% recent)
soft_compaction_threshold = 0.60  # Soft tier: prune tool outputs + apply deferred summaries (no LLM); default: 0.60
hard_compaction_threshold = 0.90  # Hard tier: full LLM summarization when usage exceeds this fraction; default: 0.90
compaction_preserve_tail = 4   # Keep last N messages during compaction
prune_protect_tokens = 40000   # Protect recent N tokens from tool output pruning
cross_session_score_threshold = 0.35  # Minimum relevance for cross-session results
vector_backend = "qdrant"     # Vector store: "qdrant" (default) or "sqlite" (embedded)
sqlite_pool_size = 5          # SQLite connection pool size (default: 5)
response_cache_cleanup_interval_secs = 3600  # Interval for purging expired LLM response cache entries (default: 3600)
token_safety_margin = 1.0     # Multiplier for token budget safety margin (default: 1.0)
redact_credentials = true     # Scrub credential patterns from LLM context (default: true)
autosave_assistant = false    # Persist assistant responses to SQLite and embed (default: false)
autosave_min_length = 20      # Min content length for assistant embedding (default: 20)
tool_call_cutoff = 6          # Summarize oldest tool pair when visible pairs exceed this (default: 6)
# key_facts_dedup_threshold = 0.95  # Cosine similarity threshold for near-duplicate key_facts suppression (default: 0.95)
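
# Worked example: with context_budget_tokens = 100000, the proportional split
# noted above allots roughly 15000 tokens to summaries, 25000 to semantic
# recall, and 60000 to recent messages.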

# Persona memory — extract and inject stable user preference and domain facts.
# [memory.persona]
# enabled                 = false
# persona_provider        = "fast"   # cheap extraction model; falls back to primary
# min_confidence          = 0.6      # facts below this are discarded (default: 0.6)
# min_messages            = 3        # minimum user messages before first extraction (default: 3)
# max_messages            = 10       # messages fed to LLM per extraction pass (default: 10)
# extraction_timeout_secs = 10       # timeout for extraction LLM call (default: 10)
# context_budget_tokens   = 500      # token budget for injected persona facts (default: 500)

# Trajectory memory — extract procedural/episodic entries from tool-call turns.
# [memory.trajectory]
# enabled                 = false
# trajectory_provider     = "fast"   # cheap extraction model; falls back to primary
# context_budget_tokens   = 400      # token budget for injected trajectory hints (default: 400)
# recall_top_k            = 5        # procedural entries retrieved per turn (default: 5)
# min_confidence          = 0.6      # entries below this are discarded (default: 0.6)
# max_messages            = 10       # messages fed to LLM per extraction pass (default: 10)
# extraction_timeout_secs = 10       # timeout for extraction LLM call (default: 10)

# Category-aware memory — tag messages with a category from active skill/tool context.
# [memory.category]
# enabled  = false
# auto_tag = true    # derive category from active skill or tool type automatically (default: true)

# TiMem temporal-hierarchical memory tree — hierarchical summary consolidation.
# [memory.tree]
# enabled                = false
# consolidation_provider = "fast"  # falls back to primary
# sweep_interval_secs    = 300     # background consolidation interval (default: 300)
# batch_size             = 20      # leaves processed per sweep (default: 20)
# similarity_threshold   = 0.8     # cosine threshold for clustering (default: 0.8)
# max_level              = 3       # maximum tree depth above leaves (default: 3)
# context_budget_tokens  = 400     # token budget for tree traversal in context (default: 400)
# recall_top_k           = 5       # nodes retrieved per turn (default: 5)
# min_cluster_size       = 2       # minimum cluster size to trigger LLM consolidation (default: 2)

# Time-based microcompact — clear stale low-value tool outputs after an idle gap.
# [memory.microcompact]
# enabled               = false
# gap_threshold_minutes = 60   # idle gap in minutes before clearing stale outputs (default: 60)
# keep_recent           = 3    # most recent low-value tool outputs to preserve (default: 3)

# autoDream — background memory consolidation after session-count and time gates pass.
# [memory.autodream]
# enabled                = false
# min_sessions           = 3     # sessions since last consolidation (default: 3)
# min_hours              = 24    # hours since last consolidation (default: 24)
# consolidation_provider = ""    # provider name; falls back to primary
# max_iterations         = 8     # safety bound for consolidation sweep (default: 8)

[memory.semantic]
enabled = false               # Enable semantic search via Qdrant
recall_limit = 5              # Number of semantically relevant messages to inject
temporal_decay_enabled = false        # Attenuate scores by message age (default: false)
temporal_decay_half_life_days = 30    # Half-life for temporal decay in days (default: 30)
mmr_enabled = false                   # MMR re-ranking for result diversity (default: false)
mmr_lambda = 0.7                      # MMR relevance-diversity trade-off, 0.0-1.0 (default: 0.7)
importance_enabled = false            # Write-time importance scoring for recall boost (default: false)
importance_weight = 0.15              # Blend weight for importance in ranking, [0.0, 1.0] (default: 0.15)

[memory.routing]
strategy = "heuristic"        # Routing strategy for memory backend selection (default: "heuristic")

# [memory.admission]
# enabled = false                    # Enable A-MAC adaptive memory admission control (default: false)
# threshold = 0.40                   # Composite score threshold; messages below this are rejected (default: 0.40)
# fast_path_margin = 0.15            # Admit immediately when score >= threshold + margin (default: 0.15)
# admission_provider = "fast"        # Provider for LLM-assisted admission decisions (optional, default: "")
# admission_strategy = "heuristic"   # "heuristic" (default) or "rl" (preview — falls back to heuristic)
# rl_min_samples = 500               # Training samples required before RL model activates (default: 500)
# rl_retrain_interval_secs = 3600    # Background RL retraining interval in seconds (default: 3600)
#
# [memory.admission.weights]
# future_utility = 0.30              # LLM-estimated future reuse probability (heuristic mode only)
# factual_confidence = 0.15          # Inverse of hedging markers
# semantic_novelty = 0.30            # 1 - max similarity to existing memories
# temporal_recency = 0.10            # Always 1.0 at write time
# content_type_prior = 0.15          # Role-based prior

[memory.compression]
strategy = "reactive"         # "reactive" (default) or "proactive"
# Proactive strategy fields (required when strategy = "proactive"):
# threshold_tokens = 80000   # Fire compression when context exceeds this token count (>= 1000)
# max_summary_tokens = 4000  # Cap for the compressed summary (>= 128)
# model = ""                 # Reserved — currently unused
# archive_tool_outputs = false  # Archive tool output bodies to SQLite before compaction (default: false)

[memory.compression.probe]
# enabled = false           # Enable compaction probe validation (default: false)
# model = ""                # Model for probe LLM calls; empty = summary provider (default: "")
# threshold = 0.6           # Minimum score for Pass verdict (default: 0.6)
# hard_fail_threshold = 0.35 # Score below this blocks compaction (default: 0.35)
# max_questions = 3         # Factual questions per probe (default: 3)
# timeout_secs = 15         # Timeout for both LLM calls in seconds (default: 15)

[memory.compression_guidelines]
enabled = false                # Enable failure-driven compression guidelines (default: false)
# update_threshold = 5        # Minimum unused failure pairs before triggering a guidelines update (default: 5)
# max_guidelines_tokens = 500 # Token budget for the guidelines document (default: 500)
# max_pairs_per_update = 10   # Failure pairs consumed per update cycle (default: 10)
# detection_window_turns = 10 # Turns after hard compaction to watch for context loss (default: 10)
# update_interval_secs = 300  # Interval in seconds between background updater checks (default: 300)
# max_stored_pairs = 100      # Maximum unused failure pairs retained before cleanup (default: 100)
# categorized_guidelines = false  # Maintain separate guideline documents per content category (default: false)

[memory.graph]
enabled = false                        # Enable graph memory (default: false, requires graph-memory feature)
extract_model = ""                     # LLM model for entity extraction; empty = agent's model
max_entities_per_message = 10          # Max entities extracted per message (default: 10)
max_edges_per_message = 15             # Max edges extracted per message (default: 15)
community_refresh_interval = 100       # Messages between community recalculation (default: 100)
entity_similarity_threshold = 0.85     # Cosine threshold for entity dedup (default: 0.85)
extraction_timeout_secs = 15           # Timeout for background extraction (default: 15)
use_embedding_resolution = false       # Use embedding-based entity resolution (default: false)
max_hops = 2                           # BFS traversal depth for graph recall (default: 2)
recall_limit = 10                      # Max graph facts injected into context (default: 10)
temporal_decay_rate = 0.0              # Recency boost for graph recall; 0.0 = disabled (default: 0.0)
                                       # Range: [0.0, 10.0]. Formula: 1/(1 + age_days * rate)
edge_history_limit = 100               # Max historical edge versions per source+predicate pair (default: 100)

[memory.graph.spreading_activation]
# enabled = false                     # Replace BFS with spreading activation (default: false)
# decay_lambda = 0.85                 # Per-hop decay factor, (0.0, 1.0] (default: 0.85)
# max_hops = 3                        # Maximum propagation depth (default: 3)
# activation_threshold = 0.1          # Minimum activation for inclusion (default: 0.1)
# inhibition_threshold = 0.8          # Lateral inhibition threshold (default: 0.8)
# max_activated_nodes = 50            # Cap on activated nodes (default: 50)

[tools]
enabled = true
summarize_output = false      # LLM-based summarization for long tool outputs
# max_tool_calls_per_session = 50  # Hard cap on tool executions per session; resets on /clear (default: unset = unlimited)

[tools.shell]
timeout = 30
blocked_commands = []
allowed_commands = []
allowed_paths = []          # Directories shell can access (empty = cwd only)
allow_network = true        # false blocks curl/wget/nc
confirm_patterns = ["rm ", "git push -f", "git push --force", "drop table", "drop database", "truncate ", "$(", "`", "<(", ">(", "<<<", "eval "]

[tools.file]
allowed_paths = []          # Directories file tools can access (empty = cwd only)

[tools.scrape]
timeout = 15
max_body_bytes = 1048576  # 1MB

[tools.filters]
enabled = true              # Enable smart output filtering for tool results

# [tools.filters.test]
# enabled = true
# max_failures = 10         # Truncate after N test failures
# truncate_stack_trace = 50 # Max stack trace lines per failure

# [tools.filters.git]
# enabled = true
# max_log_entries = 20      # Max git log entries
# max_diff_lines = 500      # Max diff lines

# [tools.filters.clippy]
# enabled = true

# [tools.filters.cargo_build]
# enabled = true

# [tools.filters.dir_listing]
# enabled = true

# [tools.filters.log_dedup]
# enabled = true

# [tools.filters.security]
# enabled = true
# extra_patterns = []       # Additional regex patterns to redact

# Per-tool permission rules (glob patterns with allow/ask/deny actions).
# Overrides legacy blocked_commands/confirm_patterns when set.
# [tools.permissions]
# shell = [
#   { pattern = "/tmp/*", action = "allow" },
#   { pattern = "/etc/*", action = "deny" },
#   { pattern = "*sudo*", action = "deny" },
#   { pattern = "cargo *", action = "allow" },
#   { pattern = "*", action = "ask" },
# ]

# Declarative policy compiler for tool call authorization (requires policy-enforcer feature).
# See docs/src/advanced/policy-enforcer.md for the full guide.
# [tools.policy]
# enabled = false           # Enable policy enforcement (default: false)
# default_effect = "deny"   # Fallback when no rule matches: "allow" or "deny" (default: "deny")
# policy_file = "policy.toml"  # Optional external rules file; overrides inline rules when set
#
# Inline rules (can also be loaded from policy_file):
# [[tools.policy.rules]]
# effect = "deny"           # "allow" or "deny"
# tool = "shell"            # Glob pattern for tool name (case-insensitive)
# paths = ["/etc/*", "/root/*"]  # Path globs matched against file_path param (CRIT-01: normalized)
# trust_level = "verified"  # Optional: rule only applies when context trust <= this level
# args_match = ".*sudo.*"   # Optional: regex matched against individual string param values
#
# [[tools.policy.rules]]
# effect = "allow"
# tool = "shell"
# paths = ["/tmp/*"]

# Supplementary OAP authorization layer (requires policy-enforcer feature).
# Rules are merged into PolicyEnforcer after [tools.policy.rules] (policy takes precedence).
# [tools.authorization]
# enabled = false            # Enable OAP authorization (default: false)
#
# [[tools.authorization.rules]]
# effect    = "deny"         # "allow" or "deny"
# tool      = "bash"         # Glob pattern for tool name
# args_match = ".*sudo.*"   # Optional: regex matched against string param values
#
# [[tools.authorization.rules]]
# effect = "allow"
# tool   = "read"
# paths  = ["/home/*"]

[tools.result_cache]
# enabled = true             # Enable tool result caching (default: true)
# ttl_secs = 300             # Cache entry lifetime in seconds, 0 = no expiry (default: 300)

[tools.tafc]
# enabled = false            # Enable TAFC schema augmentation (default: false)
# complexity_threshold = 0.6 # Complexity threshold for augmentation (default: 0.6)

[tools.dependencies]
# enabled = false            # Enable dependency gating (default: false)
# boost_per_dep = 0.15       # Boost per satisfied soft dependency (default: 0.15)
# max_total_boost = 0.2      # Maximum total soft boost (default: 0.2)
# [tools.dependencies.rules.deploy]
# requires = ["build", "test"]
# prefers = ["lint"]

[tools.overflow]
threshold = 50000           # Offload output larger than N chars to SQLite overflow table (default: 50000)
retention_days = 7          # Days to retain overflow entries before age-based cleanup (default: 7)

[tools.audit]
enabled = false             # Structured JSON audit log for tool executions
destination = "stdout"      # "stdout" or file path

# MagicDocs — auto-maintained markdown files with a "# MAGIC DOC:" header.
# [magic_docs]
# enabled                   = false
# min_turns_between_updates = 5    # turns between updates for the same file (default: 5)
# update_provider           = ""   # provider name; falls back to primary
# max_iterations            = 4    # max iterations per update LLM call (default: 4)

[security]
redact_secrets = true       # Redact API keys/tokens in LLM responses

[security.content_isolation]
enabled = true              # Master switch for untrusted content sanitizer
max_content_size = 65536    # Max bytes per source before truncation (default: 64 KiB)
flag_injection_patterns = true  # Detect and flag injection patterns
spotlight_untrusted = true  # Wrap untrusted content in XML delimiters

[security.content_isolation.quarantine]
enabled = false             # Opt-in: route high-risk sources through quarantine LLM
sources = ["web_scrape", "a2a_message"]  # Source kinds to quarantine
model = "claude"            # Provider/model for quarantine extraction

[security.exfiltration_guard]
block_markdown_images = true  # Strip external markdown images from LLM output
validate_tool_urls = true     # Flag tool calls using URLs from injection-flagged content
guard_memory_writes = true    # Skip Qdrant embedding for injection-flagged content

[timeouts]
llm_seconds = 120           # LLM chat completion timeout
embedding_seconds = 30      # Embedding generation timeout
a2a_seconds = 30            # A2A remote call timeout

[vault]
backend = "env"  # "env" (default) or "age"; CLI --vault overrides this

[observability]
exporter = "none"           # "none" or "otlp" (requires `otel` feature)
endpoint = "http://localhost:4317"

[cost]
enabled = false
max_daily_cents = 500       # Daily budget in cents (USD), UTC midnight reset

[a2a]
enabled = false
host = "0.0.0.0"
port = 8080
# public_url = "https://agent.example.com"
# auth_token = "secret"     # Bearer token for A2A server auth (from vault ZEPH_A2A_AUTH_TOKEN); warn logged at startup if unset
rate_limit = 60

[acp]
enabled = false                    # Auto-start ACP server on plain `zeph` startup using the configured transport (default: false)
max_sessions = 4                   # Max concurrent ACP sessions; LRU eviction when exceeded (default: 4)
session_idle_timeout_secs = 1800   # Idle session reaper timeout in seconds (default: 1800)
broadcast_capacity = 256           # Skill/config reload broadcast backlog shared by ACP sessions (default: 256)
# permission_file = "~/.config/zeph/acp-permissions.toml"  # Path to persisted permission decisions (default: ~/.config/zeph/acp-permissions.toml)
# auth_bearer_token = ""           # Bearer token for ACP HTTP/WS auth (env: ZEPH_ACP_AUTH_TOKEN, CLI: --acp-auth-token); omit for open mode (local use only)
discovery_enabled = true           # Expose GET /.well-known/acp.json manifest endpoint (env: ZEPH_ACP_DISCOVERY_ENABLED, default: true)

[acp.lsp]
enabled = true                     # Enable LSP extension when IDE advertises meta["lsp"] (default: true)
auto_diagnostics_on_save = true    # Fetch diagnostics on lsp/didSave notification (default: true)
max_diagnostics_per_file = 20      # Max diagnostics accepted per file (default: 20)
max_diagnostic_files = 5           # Max files in DiagnosticsCache, LRU eviction (default: 5)
max_references = 100               # Max reference locations returned (default: 100)
max_workspace_symbols = 50         # Max workspace symbol search results (default: 50)
request_timeout_secs = 10          # Timeout for LSP ext_method calls in seconds (default: 10)

[mcp]
allowed_commands = ["npx", "uvx", "node", "python", "python3"]
max_dynamic_servers = 10

# [[mcp.servers]]
# id = "filesystem"
# command = "npx"
# args = ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
# env = {}                  # Environment variables passed to the child process
# timeout = 30
# trust_level = "untrusted" # trusted, untrusted (default), or sandboxed
# tool_allowlist = []       # Tools to expose from this server; empty = all (untrusted) or none (sandboxed)
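Because the allowlist semantics flip between trust levels (empty means "all tools" for untrusted but "no tools" for sandboxed), a sandboxed server must list its tools explicitly. A hedged sketch — the server id, package name, and tool names below are illustrative, not real servers:

```toml
# Hypothetical sandboxed entry: with trust_level = "sandboxed", an empty
# tool_allowlist would expose nothing, so each tool is named explicitly.
[[mcp.servers]]
id = "search"                                # illustrative id
command = "npx"
args = ["-y", "@example/mcp-search-server"]  # hypothetical package
timeout = 30
trust_level = "sandboxed"
tool_allowlist = ["web_search", "fetch_page"]  # only these tools are exposed
```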

[agents]
enabled = false            # Enable sub-agent system (default: false)
max_concurrent = 1         # Max concurrent sub-agents (default: 1)
extra_dirs = []            # Additional directories to scan for agent definitions
# default_memory_scope = "project"  # Default memory scope for agents without explicit `memory` field
                                    # Valid: "user", "project", "local". Omit to disable.
# Lifecycle hooks — see Sub-Agent Orchestration > Hooks for details
# [agents.hooks]
# [[agents.hooks.start]]
# type = "command"
# command = "echo started"
# [[agents.hooks.stop]]
# type = "command"
# command = "./scripts/cleanup.sh"

[orchestration]
enabled = false                          # Enable task orchestration (default: false, requires `orchestration` feature)
max_tasks = 20                           # Max tasks per graph (default: 20)
max_parallel = 4                         # Max concurrent task executions (default: 4)
default_failure_strategy = "abort"       # abort, retry, skip, or ask (default: "abort")
default_max_retries = 3                  # Retries for the "retry" strategy (default: 3)
task_timeout_secs = 300                  # Per-task timeout in seconds, 0 = no timeout (default: 300)
# planner_provider = "quality"            # Provider name from [[llm.providers]] for planning LLM calls; empty = primary provider
planner_max_tokens = 4096                # Max tokens for planner LLM response (default: 4096; reserved — not yet enforced)
dependency_context_budget = 16384       # Character budget for cross-task context injection (default: 16384)
confirm_before_execute = true           # Show task summary and require /plan confirm before executing (default: true)
aggregator_max_tokens = 4096            # Token budget for the aggregation LLM call (default: 4096)
# topology_selection = false            # Enable topology classification and adaptive dispatch (default: false, requires experiments feature)
# verify_provider = ""                  # Provider name from [[llm.providers]] for post-task completeness verification; empty = primary provider

[orchestration.plan_cache]
# enabled = false                       # Enable plan template caching (default: false)
# similarity_threshold = 0.90           # Min cosine similarity for cache hit (default: 0.90)
# ttl_days = 30                         # Days since last access before eviction (default: 30)
# max_templates = 100                    # Maximum cached templates (default: 100)
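An enabled cache section might look like the sketch below (values are illustrative; per the validation table, similarity_threshold must stay within [0.5, 1.0]):

```toml
[orchestration.plan_cache]
enabled = true
similarity_threshold = 0.85   # cosine-similarity cutoff; validated to [0.5, 1.0]
ttl_days = 14                 # evict templates not accessed in two weeks
max_templates = 50
```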

[gateway]
enabled = false
bind = "127.0.0.1"
port = 8090
# auth_token = "secret"     # Bearer token for gateway auth (from vault ZEPH_GATEWAY_TOKEN); warn logged at startup if unset
rate_limit = 120
max_body_size = 1048576     # 1 MiB

[logging]
file = "/absolute/path/to/zeph.log"  # Optional override; omit to use the platform default in the user data dir (%LOCALAPPDATA%\Zeph\logs\zeph.log on Windows)
level = "info"                # File log level (default: "info"); does not affect stderr/RUST_LOG
rotation = "daily"            # Rotation strategy: daily, hourly, or never (default: "daily")
max_files = 7                 # Rotated log files to retain (default: 7)

[debug]
enabled = false             # Enable debug dump at startup (default: false)
output_dir = "/absolute/path/to/debug"  # Optional override; omit to use the platform default in the user data dir (%LOCALAPPDATA%\Zeph\debug on Windows)

# Requires `classifiers` feature.
# ML-backed injection detection and PII detection via Candle/DeBERTa models.
# When `enabled = false` (the default), the existing regex-based detection runs unchanged.
# [classifiers]
# enabled = false
# timeout_ms = 5000                                             # Per-inference timeout in ms (default: 5000)
# injection_model = "protectai/deberta-v3-small-prompt-injection-v2"  # HuggingFace repo ID
# injection_threshold = 0.8                                    # Minimum score to treat result as injection (default: 0.8)
# injection_model_sha256 = ""                                  # Optional SHA-256 hex for tamper detection
# pii_enabled = false                                          # Enable NER-based PII detection (default: false)
# pii_model = "iiiorg/piiranha-v1-detect-personal-information" # HuggingFace repo ID
# pii_threshold = 0.75                                         # Minimum per-token confidence for a PII label (default: 0.75)
# pii_model_sha256 = ""                                        # Optional SHA-256 hex for tamper detection

# Requires `experiments` feature.
# [experiments]
# enabled = false
# eval_model = "claude-sonnet-4-20250514"  # Model for LLM-as-judge (default: agent's model)
# benchmark_file = "benchmarks/eval.toml"  # Prompt set for A/B comparison
# max_experiments = 20                     # Max variations per session (default: 20)
# max_wall_time_secs = 3600               # Wall-clock budget per session (default: 3600)
# min_improvement = 0.5                   # Min score delta to accept (default: 0.5)
# eval_budget_tokens = 100000             # Token budget for judge calls (default: 100000)
# auto_apply = false                      # Write accepted variations to live config (default: false)
#
# [experiments.schedule]
# enabled = false                          # Cron-based automatic runs (default: false)
# cron = "0 3 * * *"                       # 5-field cron expression (default: daily 03:00)
# max_experiments_per_run = 20             # Cap per scheduled run (default: 20)
# max_wall_time_secs = 1800               # Wall-time cap per run (default: 1800)

Provider Entry Fields

Each [[llm.providers]] entry supports:

| Field | Type | Description |
|---|---|---|
| type | string | Provider backend (ollama, claude, openai, gemini, candle, compatible) |
| name | string? | Identifier for routing; required for compatible type |
| model | string? | Chat model |
| base_url | string? | API endpoint (Ollama / Compatible) |
| embedding_model | string? | Embedding model |
| embed | bool | Mark as the embedding provider for skill matching and semantic memory |
| default | bool | Mark as the primary chat provider |
| filename | string? | GGUF filename (Candle only) |
| device | string? | Compute device: cpu, metal, cuda (Candle only) |

See Model Orchestrator for multi-provider routing examples and Complexity Triage Routing for pre-inference classification routing.
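For instance, a minimal two-provider setup pairing a default cloud chat provider with a local Ollama embedding provider could be sketched as follows (the model names and the base_url are illustrative; substitute whatever your backends actually serve):

```toml
[[llm.providers]]
type = "claude"
model = "claude-sonnet-4-20250514"    # illustrative model id
default = true                        # primary chat provider

[[llm.providers]]
type = "ollama"
base_url = "http://localhost:11434"   # assumed local Ollama endpoint
embedding_model = "qwen3-embedding"
embed = true                          # used for skill matching and semantic memory
```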

Environment Variables

| Variable | Description |
|---|---|
| ZEPH_LLM_PROVIDER | ollama, claude, openai, candle, compatible, orchestrator, or router |
| ZEPH_LLM_BASE_URL | Ollama API endpoint |
| ZEPH_LLM_MODEL | Model name for Ollama |
| ZEPH_LLM_EMBEDDING_MODEL | Embedding model for Ollama (default: qwen3-embedding) |
| ZEPH_LLM_VISION_MODEL | Vision model for Ollama image requests (optional) |
| ZEPH_CLAUDE_API_KEY | Anthropic API key (required for Claude) |
| ZEPH_OPENAI_API_KEY | OpenAI API key (required for OpenAI provider) |
| ZEPH_GEMINI_API_KEY | Google Gemini API key (required for Gemini provider) |
| ZEPH_TELEGRAM_TOKEN | Telegram bot token (enables Telegram mode) |
| ZEPH_SQLITE_PATH | SQLite database path |
| ZEPH_QDRANT_URL | Qdrant server URL (default: http://localhost:6334) |
| ZEPH_MEMORY_SUMMARIZATION_THRESHOLD | Trigger summarization after N messages (default: 100) |
| ZEPH_MEMORY_CONTEXT_BUDGET_TOKENS | Context budget for proportional token allocation (default: 0 = unlimited) |
| ZEPH_MEMORY_SOFT_COMPACTION_THRESHOLD | Soft compaction tier: prune tool outputs + apply deferred summaries (no LLM) when context usage exceeds this fraction (default: 0.60, must be < hard threshold) |
| ZEPH_MEMORY_HARD_COMPACTION_THRESHOLD | Hard compaction tier: full LLM summarization when context usage exceeds this fraction (default: 0.90). Also accepted as ZEPH_MEMORY_COMPACTION_THRESHOLD for backward compatibility. |
| ZEPH_MEMORY_COMPACTION_PRESERVE_TAIL | Messages preserved during compaction (default: 4) |
| ZEPH_MEMORY_PRUNE_PROTECT_TOKENS | Tokens protected from Tier 1 tool output pruning (default: 40000) |
| ZEPH_MEMORY_CROSS_SESSION_SCORE_THRESHOLD | Minimum relevance score for cross-session memory (default: 0.35) |
| ZEPH_MEMORY_VECTOR_BACKEND | Vector backend: qdrant or sqlite (default: qdrant) |
| ZEPH_MEMORY_TOKEN_SAFETY_MARGIN | Token budget safety margin multiplier (default: 1.0) |
| ZEPH_MEMORY_REDACT_CREDENTIALS | Scrub credentials from LLM context (default: true) |
| ZEPH_MEMORY_AUTOSAVE_ASSISTANT | Persist assistant responses to SQLite (default: false) |
| ZEPH_MEMORY_AUTOSAVE_MIN_LENGTH | Min content length for assistant embedding (default: 20) |
| ZEPH_MEMORY_TOOL_CALL_CUTOFF | Max visible tool pairs before oldest is summarized (default: 6) |
| ZEPH_LLM_RESPONSE_CACHE_ENABLED | Enable SQLite-backed LLM response cache (default: false) |
| ZEPH_LLM_RESPONSE_CACHE_TTL_SECS | Response cache TTL in seconds (default: 3600) |
| ZEPH_LLM_SEMANTIC_CACHE_ENABLED | Enable semantic similarity-based response caching (default: false) |
| ZEPH_LLM_SEMANTIC_CACHE_THRESHOLD | Cosine similarity threshold for semantic cache hit (default: 0.95) |
| ZEPH_LLM_SEMANTIC_CACHE_MAX_CANDIDATES | Max entries examined per semantic cache lookup (default: 10) |
| ZEPH_MEMORY_SQLITE_POOL_SIZE | SQLite connection pool size (default: 5) |
| ZEPH_MEMORY_RESPONSE_CACHE_CLEANUP_INTERVAL_SECS | Interval for purging expired LLM response cache entries in seconds (default: 3600) |
| ZEPH_MEMORY_SEMANTIC_ENABLED | Enable semantic memory (default: false) |
| ZEPH_MEMORY_RECALL_LIMIT | Max semantically relevant messages to recall (default: 5) |
| ZEPH_MEMORY_SEMANTIC_TEMPORAL_DECAY_ENABLED | Enable temporal decay scoring (default: false) |
| ZEPH_MEMORY_SEMANTIC_TEMPORAL_DECAY_HALF_LIFE_DAYS | Half-life for temporal decay in days (default: 30) |
| ZEPH_MEMORY_SEMANTIC_MMR_ENABLED | Enable MMR re-ranking (default: false) |
| ZEPH_MEMORY_SEMANTIC_MMR_LAMBDA | MMR relevance-diversity trade-off (default: 0.7) |
| ZEPH_SKILLS_MAX_ACTIVE | Max skills per query via embedding match (default: 5) |
| ZEPH_AGENT_MAX_TOOL_ITERATIONS | Max tool loop iterations per response (default: 10) |
| ZEPH_TOOLS_SUMMARIZE_OUTPUT | Enable LLM-based tool output summarization (default: false) |
| ZEPH_TOOLS_TIMEOUT | Shell command timeout in seconds (default: 30) |
| ZEPH_TOOLS_SCRAPE_TIMEOUT | Web scrape request timeout in seconds (default: 15) |
| ZEPH_TOOLS_SCRAPE_MAX_BODY | Max response body size in bytes (default: 1048576) |
| ZEPH_ACP_MAX_SESSIONS | Max concurrent ACP sessions (default: 4) |
| ZEPH_ACP_SESSION_IDLE_TIMEOUT_SECS | Idle session reaper timeout in seconds (default: 1800) |
| ZEPH_ACP_PERMISSION_FILE | Path to persisted ACP permission decisions (default: ~/.config/zeph/acp-permissions.toml) |
| ZEPH_ACP_AUTH_TOKEN | Bearer token for ACP HTTP/WS authentication; omit for open mode (local use only) |
| ZEPH_ACP_DISCOVERY_ENABLED | Expose GET /.well-known/acp.json manifest endpoint (default: true) |
| ZEPH_A2A_ENABLED | Enable A2A server (default: false) |
| ZEPH_A2A_HOST | A2A server bind address (default: 0.0.0.0) |
| ZEPH_A2A_PORT | A2A server port (default: 8080) |
| ZEPH_A2A_PUBLIC_URL | Public URL for agent card discovery |
| ZEPH_A2A_AUTH_TOKEN | Bearer token for A2A server authentication |
| ZEPH_A2A_RATE_LIMIT | Max requests per IP per minute (default: 60) |
| ZEPH_A2A_REQUIRE_TLS | Require HTTPS for outbound A2A connections (default: true) |
| ZEPH_A2A_SSRF_PROTECTION | Block private/loopback IPs in A2A client (default: true) |
| ZEPH_A2A_MAX_BODY_SIZE | Max request body size in bytes (default: 1048576) |
| ZEPH_AGENTS_ENABLED | Enable sub-agent system (default: false) |
| ZEPH_AGENTS_MAX_CONCURRENT | Max concurrent sub-agents (default: 1) |
| ZEPH_GATEWAY_ENABLED | Enable HTTP gateway (default: false) |
| ZEPH_GATEWAY_BIND | Gateway bind address (default: 127.0.0.1) |
| ZEPH_GATEWAY_PORT | Gateway HTTP port (default: 8090) |
| ZEPH_GATEWAY_TOKEN | Bearer token for gateway authentication; warn logged at startup if unset |
| ZEPH_GATEWAY_RATE_LIMIT | Max requests per IP per minute (default: 120) |
| ZEPH_GATEWAY_MAX_BODY_SIZE | Max request body size in bytes (default: 1048576) |
| ZEPH_TOOLS_FILE_ALLOWED_PATHS | Comma-separated directories file tools can access (empty = cwd) |
| ZEPH_TOOLS_SHELL_ALLOWED_PATHS | Comma-separated directories shell can access (empty = cwd) |
| ZEPH_TOOLS_SHELL_ALLOW_NETWORK | Allow network commands from shell (default: true) |
| ZEPH_TOOLS_AUDIT_ENABLED | Enable audit logging for tool executions (default: false) |
| ZEPH_TOOLS_AUDIT_DESTINATION | Audit log destination: stdout or file path |
| ZEPH_SECURITY_REDACT_SECRETS | Redact secrets in LLM responses (default: true) |
| ZEPH_TIMEOUT_LLM | LLM call timeout in seconds (default: 120) |
| ZEPH_TIMEOUT_EMBEDDING | Embedding generation timeout in seconds (default: 30) |
| ZEPH_TIMEOUT_A2A | A2A remote call timeout in seconds (default: 30) |
| ZEPH_OBSERVABILITY_EXPORTER | Tracing exporter: none or otlp (default: none, requires otel feature) |
| ZEPH_OBSERVABILITY_ENDPOINT | OTLP gRPC endpoint (default: http://localhost:4317) |
| ZEPH_COST_ENABLED | Enable cost tracking (default: false) |
| ZEPH_COST_MAX_DAILY_CENTS | Daily spending limit in cents (default: 500) |
| ZEPH_STT_PROVIDER | STT provider: whisper or candle-whisper (default: whisper, requires stt feature) |
| ZEPH_STT_MODEL | STT model name (default: whisper-1) |
| ZEPH_STT_BASE_URL | STT server base URL (e.g. http://127.0.0.1:8080/v1 for local whisper.cpp) |
| ZEPH_STT_LANGUAGE | STT language: ISO-639-1 code or auto (default: auto) |
| ZEPH_LOG_FILE | Override logging.file (log file path; empty string disables file logging) |
| ZEPH_LOG_LEVEL | Override logging.level (file log level, e.g. debug, warn) |
| ZEPH_CONFIG | Path to config file (default: config/default.toml) |
| ZEPH_TUI | Enable TUI dashboard: true or 1 (requires tui feature) |
| ZEPH_AUTO_UPDATE_CHECK | Enable automatic update checks: true or false (default: true) |
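As in the resolution examples at the top of this page, any of these variables can be set inline for a single run to override the corresponding TOML key. The variables and values below are drawn from the table above; the combination itself is illustrative:

```shell
# One-off override: raise the hard compaction threshold and enable
# semantic memory without editing config/default.toml.
ZEPH_MEMORY_HARD_COMPACTION_THRESHOLD=0.95 \
ZEPH_MEMORY_SEMANTIC_ENABLED=true \
zeph
```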