Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

References & Inspirations

Zeph is built on a foundation of research, engineering practice, and open protocol work from many authors. This page collects the papers, blog posts, specifications, and tools that directly shaped its design. Each entry is linked to the issue or feature where it was applied.


Agent Architecture & Orchestration

LLMCompiler: An LLM Compiler for Parallel Function Calling (ICML 2024)
Jin et al. — Identifies tool calls within a single LLM response that have no data dependencies and executes them in parallel. Demonstrated 3.7× latency improvement and 6× cost savings vs. sequential ReAct. Influenced Zeph’s intra-turn parallel dispatch design (#1646).
https://arxiv.org/abs/2312.04511

RouteLLM: Learning to Route LLMs with Preference Data (ICML 2024)
Ong et al. — Framework for learning cost-quality routing between strong and weak models. Background for Zeph’s model router and Thompson Sampling approach (#1339).
https://arxiv.org/abs/2406.18665

Unified LLM Routing + Cascading (ICLR 2025)
Try cheapest model first, escalate on quality threshold. Consistent 4% improvement over static routing. Influenced Zeph’s cascade routing research (#1339).
https://openreview.net/forum?id=AAl89VNNy1

Context Engineering in Manus (Lance Martin, Oct 2025)
Practical breakdown of how the Manus agent handles context: soft compaction via observation masking, hard compaction via schema-based trajectory summarization, and just-in-time tool result retrieval. Directly influenced Zeph’s soft/hard compaction stages, schema-based summarization, and [tool output pruned; full content at {path}] reference pattern (#1738, #1740).
https://rlancemartin.github.io/2025/10/15/manus/


Memory & Knowledge Graphs

A-MEM: Agentic Memory for LLM Agents (NeurIPS 2025)
Each memory write triggers a mini-agent action that generates structured attributes (keywords, tags) and dynamically links the note to related existing entries via embedding similarity. Memory organization is itself agentic rather than schema-driven. Influenced Zeph’s write-time memory linking design (#1694).
https://arxiv.org/abs/2502.12110

Zep: A Temporal Knowledge Graph Architecture for Agent Memory (Jan 2025)
Introduces temporal edge validity (valid_from / valid_until) on knowledge graph edges. Expired facts are preserved for historical queries rather than deleted. Achieves 18.5% accuracy improvement on LongMemEval. Informed Zeph’s graph memory temporal edge design and the Graphiti integration study (#1693).
https://arxiv.org/abs/2501.13956

Graphiti: Real-Time Knowledge Graphs for AI Agents (Zep, 2025)
Open-source implementation of temporal knowledge graphs for agents. Studied as a reference architecture for Zeph’s zeph-memory graph storage layer.
https://github.com/getzep/graphiti

TA-Mem: Adaptive Retrieval Dispatch by Query Type (Mar 2026)
Shows that routing memory queries to different retrieval strategies by type (episodic vs. semantic) outperforms a fixed hybrid pipeline. Episodic queries (“what did I say yesterday?”) benefit from FTS5 + timestamp lookup; semantic queries benefit from vector similarity. Directly implemented in Zeph’s HeuristicRouter in zeph-memory (#1629, PR #1789).
https://arxiv.org/abs/2603.09297

Episodic-to-Semantic Memory Promotion (Jan 2025)
Two papers on consolidating episodic memories into stable semantic facts via background clustering and LLM-driven merging. Influenced Zeph’s memory tier design (episodic / working / semantic) (#1608).
https://arxiv.org/pdf/2501.11739 · https://arxiv.org/abs/2512.13564

Temporal Versioning on Knowledge Graph Edges (Apr 2025)
Research on tracking fact evolution over time in agent knowledge graphs. Background for Zeph’s planned temporal edge columns on the SQLite edges table (#1341).
https://arxiv.org/abs/2504.19413

MAGMA: Multi-Graph Agentic Memory Architecture (Jan 2026)
Represents each memory item across four orthogonal relation graphs (semantic, temporal, causal, entity) and frames retrieval as policy-guided graph traversal. Dual-stream write handles fast synchronous ingestion and async background consolidation. Outperforms A-MEM (0.58) and MemoryOS (0.55) on LoCoMo with 0.70. Research target for Zeph’s multi-edge-type graph memory upgrade (#1821).
https://arxiv.org/abs/2601.03236


Context Management & Compression

ACON: Optimizing Context Compression for Long-horizon LLM Agents (ICLR 2026)
Gradient-free failure-driven approach: when compressed context causes a task failure that full context avoids, an LLM updates the compression guidelines in natural language. Achieves 26–54% token reduction with up to 46% performance improvement. Directly implemented in Zeph as compression guideline injection into the compaction prompt (#1647, PR #1808).
https://arxiv.org/abs/2510.00615

Effective Context Engineering for AI Agents (Anthropic, 2025)
Engineering guide covering just-in-time retrieval, lightweight identifiers as context references, and proactive vs. reactive context management. Co-inspired Zeph’s tool output overflow and reference injection pattern (#1740).
https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents

Efficient Context Management for AI Agents (JetBrains Research, Dec 2025)
Production study finding that LLM summarization causes 13–15% trajectory elongation, while observation masking cuts costs >50% vs. unmanaged context and outperforms summarization on task completion. Motivated Zeph’s compaction_hard_count / turns_after_hard_compaction metrics (#1739).
https://blog.jetbrains.com/research/2025/12/efficient-context-management/

Structured Anchored Summarization (Factory.ai, 2025)
Proposes typed summary schemas with mandatory sections (goal, decisions, open questions, next steps) to prevent LLM compressors from silently dropping critical facts. Influenced Zeph’s structured compaction prompt design (#1607).
https://factory.ai/news/compressing-context

Evaluating Context Compression (Factory.ai / ICLR 2025)
Function-first metric: inject the summary as context, ask factual questions derived from the original turns, measure answer accuracy. Background for Zeph’s compaction probe design (#1609).
https://factory.ai/news/evaluating-compression · https://arxiv.org/abs/2410.10347

Claude Context Management & Compaction API (Anthropic, 2026)
Reference for Zeph’s integration with Claude’s server-side compact-2026-01-12 beta and prompt caching strategy (#1626).
https://platform.claude.com/docs/en/build-with-claude/context-management


Security & Safety

OWASP AI Agent Security Cheat Sheet (2026 edition)
Comprehensive checklist of security controls for agentic systems. Used as a gap analysis baseline for Zeph’s security hardening roadmap (#1650).
https://cheatsheetseries.owasp.org/cheatsheets/AI_Agent_Security_Cheat_Sheet.html

Prompt Injection Defenses (Anthropic Research, 2025)
Anthropic’s technical overview of indirect prompt injection attack vectors and defense strategies (spotlighting, context sandboxing, dual-LLM pattern). Directly informed Zeph’s ContentSanitizer and QuarantinedSummarizer design (#1195).
https://www.anthropic.com/research/prompt-injection-defenses

How Microsoft Defends Against Indirect Prompt Injection Attacks (Microsoft MSRC, 2025)
Engineering practices for isolation of untrusted content at system boundaries. Co-informed Zeph’s TrustLevel / ContentSource model and source-specific sanitization boundaries (#1195).
https://www.microsoft.com/en-us/msrc/blog/2025/07/how-microsoft-defends-against-indirect-prompt-injection-attacks

Indirect Prompt Injection Attacks Survey (arxiv, 2025)
Survey of injection attack vectors across web scraping, tool results, and memory retrieval paths. Background for Zeph’s multi-layer isolation design (#1195).
https://arxiv.org/html/2506.08837v1

Log-To-Leak: Prompt Injection via Model Context Protocol (OpenReview, 2025)
Demonstrates that malicious MCP servers can embed injection instructions in tool description fields that bypass content sanitization, since tool definitions are ingested as trusted system context. Motivated Zeph’s MCP tool description sanitization at registration time (#1691).
https://openreview.net/forum?id=UVgbFuXPaO

Policy Compiler for Secure Agentic Systems (Feb 2026)
Argues that embedding authorization rules in LLM system prompts is insecure; proposes a declarative policy DSL compiled into a deterministic pre-execution enforcement layer independent of prompt content. Background for Zeph’s PolicyEnforcer design and PermissionPolicy hardening (#1695).
https://arxiv.org/html/2602.16708v2

Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations (Meta AI, 2023)
Binary safety classifier (SAFE / UNSAFE) trained on the MLCommons taxonomy. Inspired Zeph’s GuardrailFilter classifier prompt design and strict prefix-matching output protocol (#1651).
https://arxiv.org/abs/2312.06674

Automated Adversarial Red-Teaming with DeepTeam (2025)
Framework for black-box red-teaming of agents via external endpoints. Background for Zeph’s red-teaming playbook targeting the daemon A2A endpoint (#1610).
https://arxiv.org/abs/2503.16882 · https://github.com/confident-ai/deepteam

AgentAssay: Behavioral Fingerprinting for LLM Agents (2025)
Evaluation framework for characterizing agent behavior under adversarial probing. Referenced in Zeph’s Promptfoo integration research (#1523).
https://arxiv.org/html/2603.02601

Promptfoo: Automated Agent Red-Teaming (open source)
CLI tool for automated agent security testing with 50+ vulnerability classes. Evaluated as a black-box test harness against Zeph’s ACP HTTP+SSE transport (#1523).
https://github.com/promptfoo/promptfoo · https://www.promptfoo.dev/docs/red-team/agents/


Protocols & Standards

Agent-to-Agent (A2A) Protocol Specification
Google DeepMind open protocol for agent discovery and interoperability via JSON-RPC 2.0. Zeph implements both A2A client and server in zeph-a2a.
https://raw.githubusercontent.com/a2aproject/A2A/main/docs/specification.md

Model Context Protocol (MCP) Specification (2025-11-25)
Anthropic’s open protocol for LLM tool and resource integration. Zeph’s zeph-mcp crate implements the full MCP client with multi-server lifecycle and Qdrant-backed tool registry.
https://modelcontextprotocol.io/specification/2025-11-25.md

Agent Client Protocol (ACP)
IDE-native protocol for bidirectional agent ↔ editor communication. Zeph’s zeph-acp crate supports stdio, HTTP+SSE, and WebSocket transports and works in Zed, Helix, and VS Code.
https://agentclientprotocol.com/get-started/introduction

ACP Rust SDK
Reference implementation used as the base for Zeph’s ACP transport layer.
https://github.com/agentclientprotocol/rust-sdk

SKILL.md Specification (agentskills.io)
Portable skill format defining metadata, triggers, examples, and version metadata in a single Markdown file. Zeph’s skill system is fully compatible with this format.
https://agentskills.io/specification.md


Instruction File Conventions

The zeph.md / CLAUDE.md / AGENTS.md pattern for project-scoped agent instructions was inspired by conventions established across the ecosystem:

ToolConvention fileReference
Claude CodeCLAUDE.mdhttps://code.claude.com/docs/en/memory
OpenAI CodexAGENTS.mdhttps://developers.openai.com/codex/guides/agents-md/
Gemini CLIGEMINI.mdhttps://geminicli.com/docs/cli/gemini-md/
Cursor.cursor/ruleshttps://cursor.com/docs/context/rules
AiderCONVENTIONS.mdhttps://aider.chat/docs/usage/conventions.html
agents.md specagents.mdhttps://agents.md/

Zeph unifies these under a single zeph.md that is always loaded, with provider-specific files loaded alongside it automatically (#1122).


LLM Provider Documentation

Google Gemini API — Text generation, embeddings, function calling, and model catalog.
Basis for Zeph’s GeminiProvider implementation (#1592).
https://ai.google.dev/gemini-api/docs/text-generation

Anthropic Claude Prompt Caching — Block-level caching with 5-minute TTL and automatic breakpoints.
Directly implemented in crates/zeph-llm/src/claude.rs with stable/tools/volatile block splits.
https://platform.claude.com/docs/en/build-with-claude/prompt-caching

OpenAI Structured Outputs — Strict JSON schema enforcement for function calling responses.
Referenced when debugging graph memory extraction schema compatibility (#1656).
https://platform.openai.com/docs/guides/structured-outputs

Redis AI Agent Architecture — Multi-tier caching patterns for LLM API cost reduction.
Background for Zeph’s semantic response caching research (#1521).
https://redis.io/blog/ai-agent-architecture/


This page is maintained alongside the codebase. When a new research issue is filed or a paper is implemented, the relevant entry should be added here.