Conversation Summarization

Automatically compress long conversation histories using LLM-based summarization to stay within context budget limits.

Summarization requires an LLM provider (Ollama or Claude). Set context_budget_tokens = 0 to disable proportional allocation and use unlimited context.

For the full context management pipeline (semantic recall, message trimming, compaction, tool output management), see Context Engineering.

Configuration

[memory]
summarization_threshold = 100  # Summarize once the history exceeds this many messages
context_budget_tokens = 8000   # Set to the LLM's context window size (0 = unlimited)

How It Works

  • Triggered when the message count exceeds summarization_threshold (default: 100)
  • Summaries are stored in SQLite with token estimates
  • Batch size is threshold/2, balancing summary quality against LLM call frequency
  • The context builder allocates a proportional token budget (see the sketch after this list):
    • 15% for summaries
    • 25% for semantic recall (if enabled)
    • 60% for recent message history
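
Under those rules, the trigger, batch size, and budget split reduce to a few lines. The following is a minimal Python sketch, not the project's actual implementation: the function names (should_summarize, batch_size, allocate_budget) are hypothetical, only the default values and percentages come from this page, and how the recall share is redistributed when semantic recall is disabled is an assumption.

# Illustrative sketch of the rules above; function names are hypothetical.

SUMMARIZATION_THRESHOLD = 100  # [memory] summarization_threshold
CONTEXT_BUDGET_TOKENS = 8000   # [memory] context_budget_tokens

def should_summarize(message_count: int) -> bool:
    # Summarization triggers once the history exceeds the threshold.
    return message_count > SUMMARIZATION_THRESHOLD

def batch_size() -> int:
    # Batch size = threshold/2 balances summary quality against call count.
    return SUMMARIZATION_THRESHOLD // 2

def allocate_budget(total: int, semantic_recall: bool = True) -> dict:
    # Proportional split: 15% summaries, 25% semantic recall, 60% recent
    # history. A budget of 0 means unlimited context (no allocation).
    if total == 0:
        return {"summaries": None, "recall": None, "recent": None}
    summaries = int(total * 0.15)
    recall = int(total * 0.25) if semantic_recall else 0
    return {
        "summaries": summaries,
        "recall": recall,
        "recent": total - summaries - recall,  # remainder, ~60%
    }

With the default 8000-token budget this works out to 1200 tokens for summaries, 2000 for semantic recall, and 4800 for recent message history.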

Token Estimation

Token counts are estimated using a chars/4 heuristic (100x faster than tiktoken, ±25% accuracy). This is sufficient for proportional budget allocation where exact counts are not critical.
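
Concretely, the heuristic is a one-line estimate; estimate_tokens below is a hypothetical name, not the project's API.

def estimate_tokens(text: str) -> int:
    # chars/4 heuristic: ~100x faster than exact tokenization (e.g. tiktoken),
    # accurate to roughly ±25%, which is enough for proportional budgets.
    return max(1, len(text) // 4)

For example, a 32,000-character transcript is estimated at 8,000 tokens, exactly one default context budget.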