# Configuration Reference
Complete reference for the Zeph configuration file and environment variables. For the interactive setup wizard, see Configuration Wizard.
## Config File Resolution
Zeph loads `config/default.toml` at startup and applies environment variable overrides.
# CLI argument (highest priority)
zeph --config /path/to/custom.toml
# Environment variable
ZEPH_CONFIG=/path/to/custom.toml zeph
# Default (fallback)
# config/default.toml
Priority: `--config` > `ZEPH_CONFIG` > `config/default.toml`.
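The precedence rule can be sketched as a pure function. This is illustrative only: `resolve_config_path` and its signature are hypothetical, not Zeph's actual internals, and the real code reads `ZEPH_CONFIG` from the process environment rather than taking it as a parameter.

```rust
use std::path::PathBuf;

/// Illustrative sketch of the documented precedence:
/// `--config` flag > `ZEPH_CONFIG` env var > `config/default.toml`.
/// (Hypothetical helper; not Zeph's actual API.)
fn resolve_config_path(cli_flag: Option<&str>, zeph_config_env: Option<&str>) -> PathBuf {
    cli_flag
        .or(zeph_config_env)
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("config/default.toml"))
}
```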
## Validation
`Config::validate()` runs at startup and rejects out-of-range values:
| Field | Constraint |
|---|---|
| `memory.history_limit` | <= 10,000 |
| `memory.context_budget_tokens` | <= 1,000,000 (when > 0) |
| `memory.soft_compaction_threshold` | 0.0–1.0, must be < `hard_compaction_threshold` |
| `memory.hard_compaction_threshold` | 0.0–1.0, must be > `soft_compaction_threshold` |
| `memory.graph.temporal_decay_rate` | finite, in [0.0, 10.0]; NaN and Inf rejected at deserialization |
| `memory.compression.threshold_tokens` | >= 1,000 (proactive only) |
| `memory.compression.max_summary_tokens` | >= 128 (proactive only) |
| `memory.compression.probe.threshold` | (0.0, 1.0], must be > `hard_fail_threshold` |
| `memory.compression.probe.hard_fail_threshold` | [0.0, 1.0), must be < `threshold` |
| `memory.compression.probe.max_questions` | >= 1 |
| `memory.compression.probe.timeout_secs` | >= 1 |
| `memory.semantic.importance_weight` | finite, in [0.0, 1.0] |
| `memory.graph.spreading_activation.decay_lambda` | in (0.0, 1.0] |
| `memory.graph.spreading_activation.max_hops` | >= 1 |
| `memory.graph.spreading_activation.activation_threshold` | < `inhibition_threshold` |
| `memory.graph.spreading_activation.inhibition_threshold` | > `activation_threshold` |
| `memory.graph.spreading_activation.seed_structural_weight` | in [0.0, 1.0] |
| `memory.graph.note_linking.link_weight_decay_lambda` | finite, in (0.0, 1.0] |
| `llm.semantic_cache_threshold` | finite, in [0.0, 1.0] |
| `orchestration.plan_cache.similarity_threshold` | in [0.5, 1.0] |
| `orchestration.plan_cache.max_templates` | in [1, 10000] |
| `orchestration.plan_cache.ttl_days` | in [1, 365] |
| `memory.token_safety_margin` | > 0.0 |
| `agent.max_tool_iterations` | <= 100 |
| `a2a.rate_limit` | > 0 |
| `acp.max_sessions` | > 0 |
| `acp.session_idle_timeout_secs` | > 0 |
| `acp.permission_file` | valid file path (optional) |
| `acp.lsp.request_timeout_secs` | > 0 |
| `gateway.rate_limit` | > 0 |
| `gateway.max_body_size` | <= 10,485,760 (10 MiB) |
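A paired-threshold check like the soft/hard compaction constraint can be sketched as follows. This is a minimal illustration of the rule in the table, not Zeph's actual `Config::validate()` implementation; `validate_compaction_thresholds` is a hypothetical helper name.

```rust
/// Both thresholds must lie in [0.0, 1.0], and soft must be strictly
/// below hard. NaN fails the range check automatically because
/// `RangeInclusive::contains` returns false for NaN.
fn validate_compaction_thresholds(soft: f64, hard: f64) -> Result<(), String> {
    if !(0.0..=1.0).contains(&soft) {
        return Err(format!("soft_compaction_threshold {soft} not in [0.0, 1.0]"));
    }
    if !(0.0..=1.0).contains(&hard) {
        return Err(format!("hard_compaction_threshold {hard} not in [0.0, 1.0]"));
    }
    if soft >= hard {
        return Err("soft_compaction_threshold must be < hard_compaction_threshold".to_string());
    }
    Ok(())
}
```

The same shape applies to the other paired constraints above, such as the probe `threshold`/`hard_fail_threshold` and spreading-activation `activation_threshold`/`inhibition_threshold` pairs.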
## Hot-Reload
Zeph watches the config file for changes and applies updates to runtime-safe fields without a restart (500 ms debounce).
Reloadable fields:
| Section | Fields |
|---|---|
| `[security]` | `redact_secrets` |
| `[timeouts]` | `llm_seconds`, `embedding_seconds`, `a2a_seconds` |
| `[memory]` | `history_limit`, `summarization_threshold`, `context_budget_tokens`, `soft_compaction_threshold`, `hard_compaction_threshold`, `compaction_preserve_tail`, `prune_protect_tokens`, `cross_session_score_threshold` |
| `[memory.semantic]` | `recall_limit` |
| `[index]` | `repo_map_ttl_secs`, `watch` |
| `[agent]` | `max_tool_iterations` |
| `[skills]` | `max_active_skills` |
Not reloadable (require restart): LLM provider/model, SQLite path, Qdrant URL, vector backend, Telegram token, MCP servers, A2A config, ACP config (including `[acp.lsp]`), agents config, skill paths, LSP context injection config (`[agent.lsp]`), compaction probe config (`[memory.compression.probe]`).
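The 500 ms debounce can be sketched as a small state machine: a reload fires only once the window has elapsed since the most recent change event. This is an illustrative sketch of the debounce behavior, not Zeph's actual watcher implementation.

```rust
use std::time::{Duration, Instant};

/// Debounce gate for file-change events: `record` notes an event,
/// `should_reload` fires (once) when the window has passed since
/// the last recorded event.
struct Debouncer {
    window: Duration,
    last_event: Option<Instant>,
}

impl Debouncer {
    fn new(window: Duration) -> Self {
        Self { window, last_event: None }
    }

    /// Record a change event observed at time `now`.
    fn record(&mut self, now: Instant) {
        self.last_event = Some(now);
    }

    /// Returns true and resets once `window` has elapsed since the
    /// last event; rapid successive edits keep pushing the deadline.
    fn should_reload(&mut self, now: Instant) -> bool {
        match self.last_event {
            Some(t) if now.duration_since(t) >= self.window => {
                self.last_event = None;
                true
            }
            _ => false,
        }
    }
}
```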
Breaking change (v0.17.0): The old `[llm.cloud]`, `[llm.orchestrator]`, and `[llm.router]` config sections have been removed. Run `zeph --migrate-config` to automatically convert your config file.
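The migration replaces the removed sections with `[[llm.providers]]` entries. The fragment below is a hedged sketch: the removed section names come from the note above, but the keys shown under the old section are hypothetical, since the exact pre-v0.17.0 schema is not documented here.

```toml
# Pre-v0.17.0 (removed) — keys here are illustrative, not the exact old schema:
# [llm.cloud]
# model = "claude-sonnet-4-6"

# Post-migration equivalent using the current provider list:
[[llm.providers]]
type = "claude"
model = "claude-sonnet-4-6"
default = true
```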
## Configuration File
[agent]
name = "Zeph"
max_tool_iterations = 10 # Max tool loop iterations per response (default: 10)
auto_update_check = true # Query GitHub Releases API for newer versions (default: true)
[agent.instructions]
auto_detect = true # Auto-detect provider-specific files: CLAUDE.md, AGENTS.md, GEMINI.md (default: true)
extra_files = [] # Additional instruction files (absolute or relative to cwd)
max_size_bytes = 262144 # Per-file size cap in bytes (default: 256 KiB)
# zeph.md and .zeph/zeph.md are always loaded regardless of auto_detect.
# Use --instruction-file <path> at the CLI to supply extra files at startup.
# LSP context injection — requires lsp-context feature and mcpls MCP server.
# Enable with --lsp-context CLI flag or by setting enabled = true here.
# [agent.lsp]
# enabled = false # Enable LSP context injection hooks (default: false)
# mcp_server_id = "mcpls" # MCP server ID providing LSP tools (default: "mcpls")
# token_budget = 2000 # Max tokens to spend on injected LSP context per turn (default: 2000)
#
# [agent.lsp.diagnostics]
# enabled = true # Inject diagnostics after write_file (default: true when agent.lsp is enabled)
# max_per_file = 20 # Max diagnostics per file (default: 20)
# max_files = 5 # Max files per injection batch (default: 5)
# min_severity = "error" # Minimum severity: "error", "warning", "info", or "hint" (default: "error")
#
# [agent.lsp.hover]
# enabled = false # Pre-fetch hover info after read_file (default: false)
# max_symbols = 10 # Max symbols to fetch hover for per file (default: 10)
#
# [agent.lsp.references]
# enabled = true # Inject reference list before rename_symbol (default: true)
# max_refs = 50 # Max references to show per symbol (default: 50)
[agent.learning]
correction_detection = true # Enable implicit correction detection (default: true)
correction_confidence_threshold = 0.7 # Jaccard token overlap threshold for correction candidates (default: 0.7)
correction_recall_limit = 3 # Max corrections injected into system prompt (default: 3)
correction_min_similarity = 0.75 # Min cosine similarity for correction recall from Qdrant (default: 0.75)
[llm]
# routing = "none" # none (default), ema, thompson, cascade, task, triage
# router_ema_enabled = false # EMA-based provider latency routing (default: false)
# router_ema_alpha = 0.1 # EMA smoothing factor, 0.0–1.0 (default: 0.1)
# router_reorder_interval = 10 # Re-order providers every N requests (default: 10)
# thompson_state_path = "~/.zeph/router_thompson_state.json" # Thompson state persistence path
# response_cache_enabled = false # SQLite-backed LLM response cache (default: false)
# response_cache_ttl_secs = 3600 # Cache TTL in seconds (default: 3600)
# semantic_cache_enabled = false # Embedding-based similarity cache (default: false)
# semantic_cache_threshold = 0.95 # Cosine similarity for cache hit (default: 0.95)
# semantic_cache_max_candidates = 10 # Max entries to examine per lookup (default: 10)
# Dedicated provider for tool-pair summarization and context compaction (optional).
# String shorthand — pick one format, or use [llm.summary_provider] below.
# summary_model = "ollama/qwen3:1.7b" # ollama/<model>
# summary_model = "claude" # Claude, model from the claude provider entry
# summary_model = "claude/claude-haiku-4-5-20251001"
# summary_model = "openai/gpt-4o-mini"
# summary_model = "compatible/<name>" # [[llm.providers]] entry name for compatible type
# summary_model = "candle"
# Structured summary provider. Takes precedence over summary_model when both are set.
# [llm.summary_provider]
# type = "claude" # claude, openai, compatible, ollama, candle
# model = "claude-haiku-4-5-20251001" # model override
# base_url = "..." # endpoint override (ollama / openai only)
# embedding_model = "..." # embedding model override (ollama / openai only)
# device = "cpu" # cpu, cuda, metal (candle only)
# Cascade routing options (when routing = "cascade").
# [llm.cascade]
# quality_threshold = 0.5 # Score below which response is degenerate (default: 0.5)
# max_escalations = 2 # Max escalation steps per request (default: 2)
# classifier_mode = "heuristic" # "heuristic" (default) or "judge" (LLM-backed)
# max_cascade_tokens = 0 # Cumulative token cap across escalation levels; 0 = unlimited
# cost_tiers = ["ollama", "claude"] # Explicit cost ordering (cheapest first)
# Quality gate for Thompson/EMA routing — post-selection embedding similarity check.
# quality_gate = 0.0 # Cosine threshold; 0.0 = disabled (default: 0.0). Applies to thompson/ema only.
# ASI coherence tracking — penalizes providers with low response coherence.
# [llm.routing.asi]
# enabled = false
# window_size = 10 # Sliding window of response embeddings per provider (default: 10)
# coherence_threshold = 0.5 # Warn when rolling mean drops below this (default: 0.5)
# penalty_weight = 0.3 # Multiplier applied to Thompson/EMA scores (default: 0.3)
# embedding_provider = "" # Provider name for response embeddings; empty = primary
# Complexity triage routing options (when routing = "triage").
# [llm.complexity_routing]
# triage_provider = "fast" # Provider name used for classification (required)
# bypass_single_provider = true # Skip triage when all tiers map to the same provider (default: true)
# triage_timeout_secs = 5 # Triage call timeout; falls back to simple tier on expiry (default: 5)
# max_triage_tokens = 50 # Max tokens in triage response (default: 50)
# fallback_strategy = "cascade" # Optional hybrid mode: triage + quality escalation ("cascade" only)
#
# [llm.complexity_routing.tiers]
# simple = "fast" # Provider name for trivial requests; also used as triage fallback
# medium = "default" # Provider name for moderate requests
# complex = "smart" # Provider name for multi-step / code-heavy requests
# expert = "expert" # Provider name for research-grade requests
# Provider list — each [[llm.providers]] entry defines one LLM backend.
[[llm.providers]]
type = "ollama" # ollama, claude, openai, gemini, candle, compatible
# name = "local" # optional: identifier for multi-provider routing; required for compatible
base_url = "http://localhost:11434"
model = "qwen3:8b"
embedding_model = "qwen3-embedding" # model for text embeddings
# vision_model = "llava:13b" # Ollama only: dedicated model for image requests
# embed = true # mark as embedding provider for skill matching and semantic memory
# default = true # mark as primary chat provider
# embed_concurrency = 4 # Max concurrent embedding requests via semaphore (default: 4, 0 = unlimited)
# Additional provider examples:
# [[llm.providers]]
# name = "cloud"
# type = "claude"
# model = "claude-sonnet-4-6"
# max_tokens = 4096
# server_compaction = false # Enable Claude server-side context compaction (compact-2026-01-12 beta)
# enable_extended_context = false # Enable Claude 1M context window (context-1m-2025-08-07 beta, Sonnet/Opus 4.6)
# default = true
# [[llm.providers]]
# type = "openai"
# base_url = "https://api.openai.com/v1"
# model = "gpt-5.2"
# max_tokens = 4096
# embedding_model = "text-embedding-3-small"
# reasoning_effort = "medium" # low, medium, high (for reasoning models)
# [[llm.providers]]
# type = "gemini"
# model = "gemini-2.0-flash"
# max_tokens = 8192
# embedding_model = "text-embedding-004" # enable Gemini embeddings (optional)
# thinking_level = "medium" # minimal, low, medium, high (Gemini 2.5+ only)
# thinking_budget = 8192 # token budget; -1 = dynamic, 0 = disabled (Gemini 2.5+ only)
# include_thoughts = true # surface thinking chunks in TUI
# base_url = "https://generativelanguage.googleapis.com/v1beta"
# [[llm.providers]]
# name = "groq"
# type = "compatible"
# base_url = "https://api.groq.com/openai/v1"
# model = "llama-3.3-70b-versatile"
# max_tokens = 4096
[llm.stt]
provider = "whisper"
model = "whisper-1"
# base_url = "http://127.0.0.1:8080/v1" # optional: OpenAI-compatible server
# language = "en" # optional: ISO-639-1 code or "auto"
# Requires `stt` feature. When base_url is set, targets a local server (no API key needed).
# When omitted, uses the OpenAI API key from the openai [[llm.providers]] entry or ZEPH_OPENAI_API_KEY.
[skills]
# Defaults to the user config dir when omitted
# (for example ~/.config/zeph/skills on Linux,
# ~/Library/Application Support/Zeph/skills on macOS,
# %APPDATA%\zeph\skills on Windows).
# paths = ["/absolute/path/to/skills"]
max_active_skills = 5 # Top-K skills per query via embedding similarity
disambiguation_threshold = 0.05 # LLM disambiguation when top-2 score delta < threshold (0.0 = disabled)
prompt_mode = "auto" # Skill prompt format: "full", "compact", or "auto" (default: "auto")
cosine_weight = 0.7 # Cosine signal weight in BM25+cosine fusion (default: 0.7)
hybrid_search = false # Enable BM25+cosine hybrid skill matching (default: false)
[skills.learning]
enabled = true # Enable self-learning skill improvement (default: true)
auto_activate = false # Require manual approval for new versions (default: false)
min_failures = 3 # Failures before triggering improvement (default: 3)
improve_threshold = 0.7 # Success rate below which improvement starts (default: 0.7)
rollback_threshold = 0.5 # Auto-rollback when success rate drops below this (default: 0.5)
min_evaluations = 5 # Minimum evaluations before rollback decision (default: 5)
max_versions = 10 # Max auto-generated versions per skill (default: 10)
cooldown_minutes = 60 # Cooldown between improvements for same skill (default: 60)
detector_mode = "regex" # Correction detector: "regex" (default) or "judge" (LLM-backed)
judge_model = "" # Model for judge calls; empty = use primary provider
judge_adaptive_low = 0.5 # Regex confidence below this bypasses judge (default: 0.5)
judge_adaptive_high = 0.8 # Regex confidence at/above this bypasses judge (default: 0.8)
[memory]
# Defaults to the user data dir when omitted
# (for example ~/.local/share/zeph/data/zeph.db on Linux,
# ~/Library/Application Support/Zeph/data/zeph.db on macOS,
# %LOCALAPPDATA%\Zeph\data\zeph.db on Windows).
# sqlite_path = "/absolute/path/to/zeph.db"
history_limit = 50
summarization_threshold = 100 # Trigger summarization after N messages
context_budget_tokens = 0 # 0 = unlimited (proportional split: 15% summaries, 25% recall, 60% recent)
soft_compaction_threshold = 0.60 # Soft tier: prune tool outputs + apply deferred summaries (no LLM); default: 0.60
hard_compaction_threshold = 0.90 # Hard tier: full LLM summarization when usage exceeds this fraction; default: 0.90
compaction_preserve_tail = 4 # Keep last N messages during compaction
prune_protect_tokens = 40000 # Protect recent N tokens from tool output pruning
cross_session_score_threshold = 0.35 # Minimum relevance for cross-session results
vector_backend = "qdrant" # Vector store: "qdrant" (default) or "sqlite" (embedded)
sqlite_pool_size = 5 # SQLite connection pool size (default: 5)
response_cache_cleanup_interval_secs = 3600 # Interval for purging expired LLM response cache entries (default: 3600)
token_safety_margin = 1.0 # Multiplier for token budget safety margin (default: 1.0)
redact_credentials = true # Scrub credential patterns from LLM context (default: true)
autosave_assistant = false # Persist assistant responses to SQLite and embed (default: false)
autosave_min_length = 20 # Min content length for assistant embedding (default: 20)
tool_call_cutoff = 6 # Summarize oldest tool pair when visible pairs exceed this (default: 6)
# key_facts_dedup_threshold = 0.95 # Cosine similarity threshold for near-duplicate key_facts suppression (default: 0.95)
# Persona memory — extract and inject stable user preference and domain facts.
# [memory.persona]
# enabled = false
# persona_provider = "fast" # cheap extraction model; falls back to primary
# min_confidence = 0.6 # facts below this are discarded (default: 0.6)
# min_messages = 3 # minimum user messages before first extraction (default: 3)
# max_messages = 10 # messages fed to LLM per extraction pass (default: 10)
# extraction_timeout_secs = 10 # timeout for extraction LLM call (default: 10)
# context_budget_tokens = 500 # token budget for injected persona facts (default: 500)
# Trajectory memory — extract procedural/episodic entries from tool-call turns.
# [memory.trajectory]
# enabled = false
# trajectory_provider = "fast" # cheap extraction model; falls back to primary
# context_budget_tokens = 400 # token budget for injected trajectory hints (default: 400)
# recall_top_k = 5 # procedural entries retrieved per turn (default: 5)
# min_confidence = 0.6 # entries below this are discarded (default: 0.6)
# max_messages = 10 # messages fed to LLM per extraction pass (default: 10)
# extraction_timeout_secs = 10 # timeout for extraction LLM call (default: 10)
# Category-aware memory — tag messages with a category from active skill/tool context.
# [memory.category]
# enabled = false
# auto_tag = true # derive category from active skill or tool type automatically (default: true)
# TiMem temporal-hierarchical memory tree — hierarchical summary consolidation.
# [memory.tree]
# enabled = false
# consolidation_provider = "fast" # falls back to primary
# sweep_interval_secs = 300 # background consolidation interval (default: 300)
# batch_size = 20 # leaves processed per sweep (default: 20)
# similarity_threshold = 0.8 # cosine threshold for clustering (default: 0.8)
# max_level = 3 # maximum tree depth above leaves (default: 3)
# context_budget_tokens = 400 # token budget for tree traversal in context (default: 400)
# recall_top_k = 5 # nodes retrieved per turn (default: 5)
# min_cluster_size = 2 # minimum cluster size to trigger LLM consolidation (default: 2)
# Time-based microcompact — clear stale low-value tool outputs after an idle gap.
# [memory.microcompact]
# enabled = false
# gap_threshold_minutes = 60 # idle gap in minutes before clearing stale outputs (default: 60)
# keep_recent = 3 # most recent low-value tool outputs to preserve (default: 3)
# autoDream — background memory consolidation after session-count and time gates pass.
# [memory.autodream]
# enabled = false
# min_sessions = 3 # sessions since last consolidation (default: 3)
# min_hours = 24 # hours since last consolidation (default: 24)
# consolidation_provider = "" # provider name; falls back to primary
# max_iterations = 8 # safety bound for consolidation sweep (default: 8)
[memory.semantic]
enabled = false # Enable semantic search via Qdrant
recall_limit = 5 # Number of semantically relevant messages to inject
temporal_decay_enabled = false # Attenuate scores by message age (default: false)
temporal_decay_half_life_days = 30 # Half-life for temporal decay in days (default: 30)
mmr_enabled = false # MMR re-ranking for result diversity (default: false)
mmr_lambda = 0.7 # MMR relevance-diversity trade-off, 0.0-1.0 (default: 0.7)
importance_enabled = false # Write-time importance scoring for recall boost (default: false)
importance_weight = 0.15 # Blend weight for importance in ranking, [0.0, 1.0] (default: 0.15)
[memory.routing]
strategy = "heuristic" # Routing strategy for memory backend selection (default: "heuristic")
# [memory.admission]
# enabled = false # Enable A-MAC adaptive memory admission control (default: false)
# threshold = 0.40 # Composite score threshold; messages below this are rejected (default: 0.40)
# fast_path_margin = 0.15 # Admit immediately when score >= threshold + margin (default: 0.15)
# admission_provider = "fast" # Provider for LLM-assisted admission decisions (optional, default: "")
# admission_strategy = "heuristic" # "heuristic" (default) or "rl" (preview — falls back to heuristic)
# rl_min_samples = 500 # Training samples required before RL model activates (default: 500)
# rl_retrain_interval_secs = 3600 # Background RL retraining interval in seconds (default: 3600)
#
# [memory.admission.weights]
# future_utility = 0.30 # LLM-estimated future reuse probability (heuristic mode only)
# factual_confidence = 0.15 # Inverse of hedging markers
# semantic_novelty = 0.30 # 1 - max similarity to existing memories
# temporal_recency = 0.10 # Always 1.0 at write time
# content_type_prior = 0.15 # Role-based prior
[memory.compression]
strategy = "reactive" # "reactive" (default) or "proactive"
# Proactive strategy fields (required when strategy = "proactive"):
# threshold_tokens = 80000 # Fire compression when context exceeds this token count (>= 1000)
# max_summary_tokens = 4000 # Cap for the compressed summary (>= 128)
# model = "" # Reserved — currently unused
# archive_tool_outputs = false # Archive tool output bodies to SQLite before compaction (default: false)
[memory.compression.probe]
# enabled = false # Enable compaction probe validation (default: false)
# model = "" # Model for probe LLM calls; empty = summary provider (default: "")
# threshold = 0.6 # Minimum score for Pass verdict (default: 0.6)
# hard_fail_threshold = 0.35 # Score below this blocks compaction (default: 0.35)
# max_questions = 3 # Factual questions per probe (default: 3)
# timeout_secs = 15 # Timeout for both LLM calls in seconds (default: 15)
[memory.compression_guidelines]
enabled = false # Enable failure-driven compression guidelines (default: false)
# update_threshold = 5 # Minimum unused failure pairs before triggering a guidelines update (default: 5)
# max_guidelines_tokens = 500 # Token budget for the guidelines document (default: 500)
# max_pairs_per_update = 10 # Failure pairs consumed per update cycle (default: 10)
# detection_window_turns = 10 # Turns after hard compaction to watch for context loss (default: 10)
# update_interval_secs = 300 # Interval in seconds between background updater checks (default: 300)
# max_stored_pairs = 100 # Maximum unused failure pairs retained before cleanup (default: 100)
# categorized_guidelines = false # Maintain separate guideline documents per content category (default: false)
[memory.graph]
enabled = false # Enable graph memory (default: false, requires graph-memory feature)
extract_model = "" # LLM model for entity extraction; empty = agent's model
max_entities_per_message = 10 # Max entities extracted per message (default: 10)
max_edges_per_message = 15 # Max edges extracted per message (default: 15)
community_refresh_interval = 100 # Messages between community recalculation (default: 100)
entity_similarity_threshold = 0.85 # Cosine threshold for entity dedup (default: 0.85)
extraction_timeout_secs = 15 # Timeout for background extraction (default: 15)
use_embedding_resolution = false # Use embedding-based entity resolution (default: false)
max_hops = 2 # BFS traversal depth for graph recall (default: 2)
recall_limit = 10 # Max graph facts injected into context (default: 10)
temporal_decay_rate = 0.0 # Recency boost for graph recall; 0.0 = disabled (default: 0.0)
# Range: [0.0, 10.0]. Formula: 1/(1 + age_days * rate)
edge_history_limit = 100 # Max historical edge versions per source+predicate pair (default: 100)
[memory.graph.spreading_activation]
# enabled = false # Replace BFS with spreading activation (default: false)
# decay_lambda = 0.85 # Per-hop decay factor, (0.0, 1.0] (default: 0.85)
# max_hops = 3 # Maximum propagation depth (default: 3)
# activation_threshold = 0.1 # Minimum activation for inclusion (default: 0.1)
# inhibition_threshold = 0.8 # Lateral inhibition threshold (default: 0.8)
# max_activated_nodes = 50 # Cap on activated nodes (default: 50)
[tools]
enabled = true
summarize_output = false # LLM-based summarization for long tool outputs
# max_tool_calls_per_session = 50 # Hard cap on tool executions per session; resets on /clear (default: unset = unlimited)
[tools.shell]
timeout = 30
blocked_commands = []
allowed_commands = []
allowed_paths = [] # Directories shell can access (empty = cwd only)
allow_network = true # false blocks curl/wget/nc
confirm_patterns = ["rm ", "git push -f", "git push --force", "drop table", "drop database", "truncate ", "$(", "`", "<(", ">(", "<<<", "eval "]
[tools.file]
allowed_paths = [] # Directories file tools can access (empty = cwd only)
[tools.scrape]
timeout = 15
max_body_bytes = 1048576 # 1MB
[tools.filters]
enabled = true # Enable smart output filtering for tool results
# [tools.filters.test]
# enabled = true
# max_failures = 10 # Truncate after N test failures
# truncate_stack_trace = 50 # Max stack trace lines per failure
# [tools.filters.git]
# enabled = true
# max_log_entries = 20 # Max git log entries
# max_diff_lines = 500 # Max diff lines
# [tools.filters.clippy]
# enabled = true
# [tools.filters.cargo_build]
# enabled = true
# [tools.filters.dir_listing]
# enabled = true
# [tools.filters.log_dedup]
# enabled = true
# [tools.filters.security]
# enabled = true
# extra_patterns = [] # Additional regex patterns to redact
# Per-tool permission rules (glob patterns with allow/ask/deny actions).
# Overrides legacy blocked_commands/confirm_patterns when set.
# [tools.permissions]
# shell = [
# { pattern = "/tmp/*", action = "allow" },
# { pattern = "/etc/*", action = "deny" },
# { pattern = "*sudo*", action = "deny" },
# { pattern = "cargo *", action = "allow" },
# { pattern = "*", action = "ask" },
# ]
# Declarative policy compiler for tool call authorization (requires policy-enforcer feature).
# See docs/src/advanced/policy-enforcer.md for the full guide.
# [tools.policy]
# enabled = false # Enable policy enforcement (default: false)
# default_effect = "deny" # Fallback when no rule matches: "allow" or "deny" (default: "deny")
# policy_file = "policy.toml" # Optional external rules file; overrides inline rules when set
#
# Inline rules (can also be loaded from policy_file):
# [[tools.policy.rules]]
# effect = "deny" # "allow" or "deny"
# tool = "shell" # Glob pattern for tool name (case-insensitive)
# paths = ["/etc/*", "/root/*"] # Path globs matched against file_path param (CRIT-01: normalized)
# trust_level = "verified" # Optional: rule only applies when context trust <= this level
# args_match = ".*sudo.*" # Optional: regex matched against individual string param values
#
# [[tools.policy.rules]]
# effect = "allow"
# tool = "shell"
# paths = ["/tmp/*"]
# Supplementary OAP authorization layer (requires policy-enforcer feature).
# Rules are merged into PolicyEnforcer after [tools.policy.rules] (policy takes precedence).
# [tools.authorization]
# enabled = false # Enable OAP authorization (default: false)
#
# [[tools.authorization.rules]]
# effect = "deny" # "allow" or "deny"
# tool = "bash" # Glob pattern for tool name
# args_match = ".*sudo.*" # Optional: regex matched against string param values
#
# [[tools.authorization.rules]]
# effect = "allow"
# tool = "read"
# paths = ["/home/*"]
[tools.result_cache]
# enabled = true # Enable tool result caching (default: true)
# ttl_secs = 300 # Cache entry lifetime in seconds, 0 = no expiry (default: 300)
[tools.tafc]
# enabled = false # Enable TAFC schema augmentation (default: false)
# complexity_threshold = 0.6 # Complexity threshold for augmentation (default: 0.6)
[tools.dependencies]
# enabled = false # Enable dependency gating (default: false)
# boost_per_dep = 0.15 # Boost per satisfied soft dependency (default: 0.15)
# max_total_boost = 0.2 # Maximum total soft boost (default: 0.2)
# [tools.dependencies.rules.deploy]
# requires = ["build", "test"]
# prefers = ["lint"]
[tools.overflow]
threshold = 50000 # Offload output larger than N chars to SQLite overflow table (default: 50000)
retention_days = 7 # Days to retain overflow entries before age-based cleanup (default: 7)
[tools.audit]
enabled = false # Structured JSON audit log for tool executions
destination = "stdout" # "stdout" or file path
# MagicDocs — auto-maintained markdown files with a "# MAGIC DOC:" header.
# [magic_docs]
# enabled = false
# min_turns_between_updates = 5 # turns between updates for the same file (default: 5)
# update_provider = "" # provider name; falls back to primary
# max_iterations = 4 # max iterations per update LLM call (default: 4)
[security]
redact_secrets = true # Redact API keys/tokens in LLM responses
[security.content_isolation]
enabled = true # Master switch for untrusted content sanitizer
max_content_size = 65536 # Max bytes per source before truncation (default: 64 KiB)
flag_injection_patterns = true # Detect and flag injection patterns
spotlight_untrusted = true # Wrap untrusted content in XML delimiters
[security.content_isolation.quarantine]
enabled = false # Opt-in: route high-risk sources through quarantine LLM
sources = ["web_scrape", "a2a_message"] # Source kinds to quarantine
model = "claude" # Provider/model for quarantine extraction
[security.exfiltration_guard]
block_markdown_images = true # Strip external markdown images from LLM output
validate_tool_urls = true # Flag tool calls using URLs from injection-flagged content
guard_memory_writes = true # Skip Qdrant embedding for injection-flagged content
[timeouts]
llm_seconds = 120 # LLM chat completion timeout
embedding_seconds = 30 # Embedding generation timeout
a2a_seconds = 30 # A2A remote call timeout
[vault]
backend = "env" # "env" (default) or "age"; CLI --vault overrides this
[observability]
exporter = "none" # "none" or "otlp" (requires `otel` feature)
endpoint = "http://localhost:4317"
[cost]
enabled = false
max_daily_cents = 500 # Daily budget in cents (USD), UTC midnight reset
[a2a]
enabled = false
host = "0.0.0.0"
port = 8080
# public_url = "https://agent.example.com"
# auth_token = "secret" # Bearer token for A2A server auth (from vault ZEPH_A2A_AUTH_TOKEN); warn logged at startup if unset
rate_limit = 60
[acp]
enabled = false # Auto-start ACP server on plain `zeph` startup using the configured transport (default: false)
max_sessions = 4 # Max concurrent ACP sessions; LRU eviction when exceeded (default: 4)
session_idle_timeout_secs = 1800 # Idle session reaper timeout in seconds (default: 1800)
broadcast_capacity = 256 # Skill/config reload broadcast backlog shared by ACP sessions (default: 256)
# permission_file = "~/.config/zeph/acp-permissions.toml" # Path to persisted permission decisions (default: ~/.config/zeph/acp-permissions.toml)
# auth_bearer_token = "" # Bearer token for ACP HTTP/WS auth (env: ZEPH_ACP_AUTH_TOKEN, CLI: --acp-auth-token); omit for open mode (local use only)
discovery_enabled = true # Expose GET /.well-known/acp.json manifest endpoint (env: ZEPH_ACP_DISCOVERY_ENABLED, default: true)
[acp.lsp]
enabled = true # Enable LSP extension when IDE advertises meta["lsp"] (default: true)
auto_diagnostics_on_save = true # Fetch diagnostics on lsp/didSave notification (default: true)
max_diagnostics_per_file = 20 # Max diagnostics accepted per file (default: 20)
max_diagnostic_files = 5 # Max files in DiagnosticsCache, LRU eviction (default: 5)
max_references = 100 # Max reference locations returned (default: 100)
max_workspace_symbols = 50 # Max workspace symbol search results (default: 50)
request_timeout_secs = 10 # Timeout for LSP ext_method calls in seconds (default: 10)
[mcp]
allowed_commands = ["npx", "uvx", "node", "python", "python3"]
max_dynamic_servers = 10
# [[mcp.servers]]
# id = "filesystem"
# command = "npx"
# args = ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
# env = {} # Environment variables passed to the child process
# timeout = 30
# trust_level = "untrusted" # trusted, untrusted (default), or sandboxed
# tool_allowlist = [] # Tools to expose from this server; empty = all (untrusted) or none (sandboxed)
[agents]
enabled = false # Enable sub-agent system (default: false)
max_concurrent = 1 # Max concurrent sub-agents (default: 1)
extra_dirs = [] # Additional directories to scan for agent definitions
# default_memory_scope = "project" # Default memory scope for agents without explicit `memory` field
# Valid: "user", "project", "local". Omit to disable.
# Lifecycle hooks — see Sub-Agent Orchestration > Hooks for details
# [agents.hooks]
# [[agents.hooks.start]]
# type = "command"
# command = "echo started"
# [[agents.hooks.stop]]
# type = "command"
# command = "./scripts/cleanup.sh"
[orchestration]
enabled = false # Enable task orchestration (default: false, requires `orchestration` feature)
max_tasks = 20 # Max tasks per graph (default: 20)
max_parallel = 4 # Max concurrent task executions (default: 4)
default_failure_strategy = "abort" # abort, retry, skip, or ask (default: "abort")
default_max_retries = 3 # Retries for the "retry" strategy (default: 3)
task_timeout_secs = 300 # Per-task timeout in seconds, 0 = no timeout (default: 300)
# planner_provider = "quality" # Provider name from [[llm.providers]] for planning LLM calls; empty = primary provider
planner_max_tokens = 4096 # Max tokens for planner LLM response (default: 4096; reserved — not yet enforced)
dependency_context_budget = 16384 # Character budget for cross-task context injection (default: 16384)
confirm_before_execute = true # Show task summary and require /plan confirm before executing (default: true)
aggregator_max_tokens = 4096 # Token budget for the aggregation LLM call (default: 4096)
# topology_selection = false # Enable topology classification and adaptive dispatch (default: false, requires experiments feature)
# verify_provider = "" # Provider name from [[llm.providers]] for post-task completeness verification; empty = primary provider
[orchestration.plan_cache]
# enabled = false # Enable plan template caching (default: false)
# similarity_threshold = 0.90 # Min cosine similarity for cache hit (default: 0.90)
# ttl_days = 30 # Days since last access before eviction (default: 30)
# max_templates = 100 # Maximum cached templates (default: 100)
[gateway]
enabled = false
bind = "127.0.0.1"
port = 8090
# auth_token = "secret" # Bearer token for gateway auth (from vault ZEPH_GATEWAY_TOKEN); a warning is logged at startup if unset
rate_limit = 120
max_body_size = 1048576 # 1 MiB
[logging]
file = "/absolute/path/to/zeph.log" # Optional override; omit to use the platform default in the user data dir (%LOCALAPPDATA%\Zeph\logs\zeph.log on Windows)
level = "info" # File log level (default: "info"); does not affect stderr/RUST_LOG
rotation = "daily" # Rotation strategy: daily, hourly, or never (default: "daily")
max_files = 7 # Rotated log files to retain (default: 7)
[debug]
enabled = false # Enable debug dump at startup (default: false)
output_dir = "/absolute/path/to/debug" # Optional override; omit to use the platform default in the user data dir (%LOCALAPPDATA%\Zeph\debug on Windows)
# Requires `classifiers` feature.
# ML-backed injection detection and PII detection via Candle/DeBERTa models.
# When `enabled = false` (the default), the existing regex-based detection runs unchanged.
# [classifiers]
# enabled = false
# timeout_ms = 5000 # Per-inference timeout in ms (default: 5000)
# injection_model = "protectai/deberta-v3-small-prompt-injection-v2" # HuggingFace repo ID
# injection_threshold = 0.8 # Minimum score to treat result as injection (default: 0.8)
# injection_model_sha256 = "" # Optional SHA-256 hex for tamper detection
# pii_enabled = false # Enable NER-based PII detection (default: false)
# pii_model = "iiiorg/piiranha-v1-detect-personal-information" # HuggingFace repo ID
# pii_threshold = 0.75 # Minimum per-token confidence for a PII label (default: 0.75)
# pii_model_sha256 = "" # Optional SHA-256 hex for tamper detection
# Requires `experiments` feature.
# [experiments]
# enabled = false
# eval_model = "claude-sonnet-4-20250514" # Model for LLM-as-judge (default: agent's model)
# benchmark_file = "benchmarks/eval.toml" # Prompt set for A/B comparison
# max_experiments = 20 # Max variations per session (default: 20)
# max_wall_time_secs = 3600 # Wall-clock budget per session (default: 3600)
# min_improvement = 0.5 # Min score delta to accept (default: 0.5)
# eval_budget_tokens = 100000 # Token budget for judge calls (default: 100000)
# auto_apply = false # Write accepted variations to live config (default: false)
#
# [experiments.schedule]
# enabled = false # Cron-based automatic runs (default: false)
# cron = "0 3 * * *" # 5-field cron expression (default: daily 03:00)
# max_experiments_per_run = 20 # Cap per scheduled run (default: 20)
# max_wall_time_secs = 1800 # Wall-time cap per run (default: 1800)
Provider Entry Fields
Each [[llm.providers]] entry supports:
| Field | Type | Description |
|---|---|---|
type | string | Provider backend (ollama, claude, openai, gemini, candle, compatible) |
name | string? | Identifier for routing; required for compatible type |
model | string? | Chat model |
base_url | string? | API endpoint (Ollama / Compatible) |
embedding_model | string? | Embedding model |
embed | bool | Mark as the embedding provider for skill matching and semantic memory |
default | bool | Mark as the primary chat provider |
filename | string? | GGUF filename (Candle only) |
device | string? | Compute device: cpu, metal, cuda (Candle only) |
See Model Orchestrator for multi-provider routing examples and Complexity Triage Routing for pre-inference classification routing.
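As a sketch of how these fields combine, a setup that routes chat to Claude and embeddings to a local Ollama instance might look like the following (the Ollama port is the conventional default and an assumption here, not taken from this reference):

```toml
[[llm.providers]]
type = "claude"
name = "quality"
model = "claude-sonnet-4-20250514"
default = true                       # primary chat provider

[[llm.providers]]
type = "ollama"
name = "local"
base_url = "http://localhost:11434"  # assumed local Ollama endpoint
embedding_model = "qwen3-embedding"
embed = true                         # embedding provider for skill matching and semantic memory
```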
Environment Variables
| Variable | Description |
|---|---|
ZEPH_LLM_PROVIDER | ollama, claude, openai, candle, compatible, orchestrator, or router |
ZEPH_LLM_BASE_URL | Ollama API endpoint |
ZEPH_LLM_MODEL | Model name for Ollama |
ZEPH_LLM_EMBEDDING_MODEL | Embedding model for Ollama (default: qwen3-embedding) |
ZEPH_LLM_VISION_MODEL | Vision model for Ollama image requests (optional) |
ZEPH_CLAUDE_API_KEY | Anthropic API key (required for Claude) |
ZEPH_OPENAI_API_KEY | OpenAI API key (required for OpenAI provider) |
ZEPH_GEMINI_API_KEY | Google Gemini API key (required for Gemini provider) |
ZEPH_TELEGRAM_TOKEN | Telegram bot token (enables Telegram mode) |
ZEPH_SQLITE_PATH | SQLite database path |
ZEPH_QDRANT_URL | Qdrant server URL (default: http://localhost:6334) |
ZEPH_MEMORY_SUMMARIZATION_THRESHOLD | Trigger summarization after N messages (default: 100) |
ZEPH_MEMORY_CONTEXT_BUDGET_TOKENS | Context budget for proportional token allocation (default: 0 = unlimited) |
ZEPH_MEMORY_SOFT_COMPACTION_THRESHOLD | Soft compaction tier: prune tool outputs + apply deferred summaries (no LLM) when context usage exceeds this fraction (default: 0.60, must be < hard threshold) |
ZEPH_MEMORY_HARD_COMPACTION_THRESHOLD | Hard compaction tier: full LLM summarization when context usage exceeds this fraction (default: 0.90). Also accepted as ZEPH_MEMORY_COMPACTION_THRESHOLD for backward compatibility. |
ZEPH_MEMORY_COMPACTION_PRESERVE_TAIL | Messages preserved during compaction (default: 4) |
ZEPH_MEMORY_PRUNE_PROTECT_TOKENS | Tokens protected from Tier 1 tool output pruning (default: 40000) |
ZEPH_MEMORY_CROSS_SESSION_SCORE_THRESHOLD | Minimum relevance score for cross-session memory (default: 0.35) |
ZEPH_MEMORY_VECTOR_BACKEND | Vector backend: qdrant or sqlite (default: qdrant) |
ZEPH_MEMORY_TOKEN_SAFETY_MARGIN | Token budget safety margin multiplier (default: 1.0) |
ZEPH_MEMORY_REDACT_CREDENTIALS | Scrub credentials from LLM context (default: true) |
ZEPH_MEMORY_AUTOSAVE_ASSISTANT | Persist assistant responses to SQLite (default: false) |
ZEPH_MEMORY_AUTOSAVE_MIN_LENGTH | Min content length for assistant embedding (default: 20) |
ZEPH_MEMORY_TOOL_CALL_CUTOFF | Max visible tool pairs before oldest is summarized (default: 6) |
ZEPH_LLM_RESPONSE_CACHE_ENABLED | Enable SQLite-backed LLM response cache (default: false) |
ZEPH_LLM_RESPONSE_CACHE_TTL_SECS | Response cache TTL in seconds (default: 3600) |
ZEPH_LLM_SEMANTIC_CACHE_ENABLED | Enable semantic similarity-based response caching (default: false) |
ZEPH_LLM_SEMANTIC_CACHE_THRESHOLD | Cosine similarity threshold for semantic cache hit (default: 0.95) |
ZEPH_LLM_SEMANTIC_CACHE_MAX_CANDIDATES | Max entries examined per semantic cache lookup (default: 10) |
ZEPH_MEMORY_SQLITE_POOL_SIZE | SQLite connection pool size (default: 5) |
ZEPH_MEMORY_RESPONSE_CACHE_CLEANUP_INTERVAL_SECS | Interval for purging expired LLM response cache entries in seconds (default: 3600) |
ZEPH_MEMORY_SEMANTIC_ENABLED | Enable semantic memory (default: false) |
ZEPH_MEMORY_RECALL_LIMIT | Max semantically relevant messages to recall (default: 5) |
ZEPH_MEMORY_SEMANTIC_TEMPORAL_DECAY_ENABLED | Enable temporal decay scoring (default: false) |
ZEPH_MEMORY_SEMANTIC_TEMPORAL_DECAY_HALF_LIFE_DAYS | Half-life for temporal decay in days (default: 30) |
ZEPH_MEMORY_SEMANTIC_MMR_ENABLED | Enable MMR re-ranking (default: false) |
ZEPH_MEMORY_SEMANTIC_MMR_LAMBDA | MMR relevance-diversity trade-off (default: 0.7) |
ZEPH_SKILLS_MAX_ACTIVE | Max skills per query via embedding match (default: 5) |
ZEPH_AGENT_MAX_TOOL_ITERATIONS | Max tool loop iterations per response (default: 10) |
ZEPH_TOOLS_SUMMARIZE_OUTPUT | Enable LLM-based tool output summarization (default: false) |
ZEPH_TOOLS_TIMEOUT | Shell command timeout in seconds (default: 30) |
ZEPH_TOOLS_SCRAPE_TIMEOUT | Web scrape request timeout in seconds (default: 15) |
ZEPH_TOOLS_SCRAPE_MAX_BODY | Max response body size in bytes (default: 1048576) |
ZEPH_ACP_MAX_SESSIONS | Max concurrent ACP sessions (default: 4) |
ZEPH_ACP_SESSION_IDLE_TIMEOUT_SECS | Idle session reaper timeout in seconds (default: 1800) |
ZEPH_ACP_PERMISSION_FILE | Path to persisted ACP permission decisions (default: ~/.config/zeph/acp-permissions.toml) |
ZEPH_ACP_AUTH_TOKEN | Bearer token for ACP HTTP/WS authentication; omit for open mode (local use only) |
ZEPH_ACP_DISCOVERY_ENABLED | Expose GET /.well-known/acp.json manifest endpoint (default: true) |
ZEPH_A2A_ENABLED | Enable A2A server (default: false) |
ZEPH_A2A_HOST | A2A server bind address (default: 0.0.0.0) |
ZEPH_A2A_PORT | A2A server port (default: 8080) |
ZEPH_A2A_PUBLIC_URL | Public URL for agent card discovery |
ZEPH_A2A_AUTH_TOKEN | Bearer token for A2A server authentication |
ZEPH_A2A_RATE_LIMIT | Max requests per IP per minute (default: 60) |
ZEPH_A2A_REQUIRE_TLS | Require HTTPS for outbound A2A connections (default: true) |
ZEPH_A2A_SSRF_PROTECTION | Block private/loopback IPs in A2A client (default: true) |
ZEPH_A2A_MAX_BODY_SIZE | Max request body size in bytes (default: 1048576) |
ZEPH_AGENTS_ENABLED | Enable sub-agent system (default: false) |
ZEPH_AGENTS_MAX_CONCURRENT | Max concurrent sub-agents (default: 1) |
ZEPH_GATEWAY_ENABLED | Enable HTTP gateway (default: false) |
ZEPH_GATEWAY_BIND | Gateway bind address (default: 127.0.0.1) |
ZEPH_GATEWAY_PORT | Gateway HTTP port (default: 8090) |
ZEPH_GATEWAY_TOKEN | Bearer token for gateway authentication; a warning is logged at startup if unset
ZEPH_GATEWAY_RATE_LIMIT | Max requests per IP per minute (default: 120) |
ZEPH_GATEWAY_MAX_BODY_SIZE | Max request body size in bytes (default: 1048576) |
ZEPH_TOOLS_FILE_ALLOWED_PATHS | Comma-separated directories file tools can access (empty = cwd) |
ZEPH_TOOLS_SHELL_ALLOWED_PATHS | Comma-separated directories shell can access (empty = cwd) |
ZEPH_TOOLS_SHELL_ALLOW_NETWORK | Allow network commands from shell (default: true) |
ZEPH_TOOLS_AUDIT_ENABLED | Enable audit logging for tool executions (default: false) |
ZEPH_TOOLS_AUDIT_DESTINATION | Audit log destination: stdout or file path |
ZEPH_SECURITY_REDACT_SECRETS | Redact secrets in LLM responses (default: true) |
ZEPH_TIMEOUT_LLM | LLM call timeout in seconds (default: 120) |
ZEPH_TIMEOUT_EMBEDDING | Embedding generation timeout in seconds (default: 30) |
ZEPH_TIMEOUT_A2A | A2A remote call timeout in seconds (default: 30) |
ZEPH_OBSERVABILITY_EXPORTER | Tracing exporter: none or otlp (default: none, requires otel feature) |
ZEPH_OBSERVABILITY_ENDPOINT | OTLP gRPC endpoint (default: http://localhost:4317) |
ZEPH_COST_ENABLED | Enable cost tracking (default: false) |
ZEPH_COST_MAX_DAILY_CENTS | Daily spending limit in cents (default: 500) |
ZEPH_STT_PROVIDER | STT provider: whisper or candle-whisper (default: whisper, requires stt feature) |
ZEPH_STT_MODEL | STT model name (default: whisper-1) |
ZEPH_STT_BASE_URL | STT server base URL (e.g. http://127.0.0.1:8080/v1 for local whisper.cpp) |
ZEPH_STT_LANGUAGE | STT language: ISO-639-1 code or auto (default: auto) |
ZEPH_LOG_FILE | Override logging.file (log file path; empty string disables file logging) |
ZEPH_LOG_LEVEL | Override logging.level (file log level, e.g. debug, warn) |
ZEPH_CONFIG | Path to config file (default: config/default.toml) |
ZEPH_TUI | Enable TUI dashboard: true or 1 (requires tui feature) |
ZEPH_AUTO_UPDATE_CHECK | Enable automatic update checks: true or false (default: true) |
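For example, a one-off run that switches to the OpenAI provider and caps daily spend can be expressed entirely through environment variables, without editing the config file (the API key value is a placeholder):

```shell
export ZEPH_LLM_PROVIDER=openai
export ZEPH_OPENAI_API_KEY="sk-..."    # placeholder; required for the OpenAI provider
export ZEPH_COST_ENABLED=true
export ZEPH_COST_MAX_DAILY_CENTS=200   # 2.00 USD daily limit
zeph --config /path/to/custom.toml     # the CLI flag still takes priority over ZEPH_CONFIG
```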