Conversation Summarization

Automatically compress long conversation histories using LLM-based summarization to stay within context budget limits.

Summarization requires an LLM provider (Ollama or Claude). Set context_budget_tokens = 0 to disable proportional allocation and use unlimited context.

For the full context management pipeline (semantic recall, message trimming, compaction, tool output management), see Context Engineering.

Configuration

[memory]
summarization_threshold = 100  # Summarize once the history exceeds this many messages
context_budget_tokens = 8000   # Set to the LLM's context window size (0 = unlimited)

How It Works

  • Triggered when the message count exceeds summarization_threshold (default: 100)
  • Summaries are stored in SQLite with token estimates
  • Batch size is threshold/2, balancing summary quality against LLM call frequency
  • The context builder allocates a proportional token budget (see the sketch after this list):
    • 15% for summaries
    • 25% for semantic recall (if enabled)
    • 60% for recent message history
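
Under those rules, the trigger, batch size, and budget split reduce to a few lines. The following is a minimal Python sketch, not the project's actual implementation: the function names (should_summarize, batch_size, allocate_budget) are hypothetical, only the default values and percentages come from this page, and how the recall share is redistributed when semantic recall is disabled is an assumption.

# Illustrative sketch of the rules above; function names are hypothetical.

SUMMARIZATION_THRESHOLD = 100  # [memory] summarization_threshold
CONTEXT_BUDGET_TOKENS = 8000   # [memory] context_budget_tokens

def should_summarize(message_count: int) -> bool:
    # Summarization triggers once the history exceeds the threshold.
    return message_count > SUMMARIZATION_THRESHOLD

def batch_size() -> int:
    # Batch size = threshold/2 balances summary quality against call count.
    return SUMMARIZATION_THRESHOLD // 2

def allocate_budget(total: int, semantic_recall: bool = True) -> dict:
    # Proportional split: 15% summaries, 25% semantic recall, 60% recent
    # history. A budget of 0 means unlimited context (no allocation).
    if total == 0:
        return {"summaries": None, "recall": None, "recent": None}
    summaries = int(total * 0.15)
    recall = int(total * 0.25) if semantic_recall else 0
    return {
        "summaries": summaries,
        "recall": recall,
        "recent": total - summaries - recall,  # remainder, ~60%
    }

With the default 8000-token budget this works out to 1200 tokens for summaries, 2000 for semantic recall, and 4800 for recent message history.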

Token Estimation

Token counts are estimated using a chars/4 heuristic (100x faster than tiktoken, ±25% accuracy). This is sufficient for proportional budget allocation where exact counts are not critical.
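
Concretely, the heuristic is a one-line estimate; estimate_tokens below is a hypothetical name, not the project's API.

def estimate_tokens(text: str) -> int:
    # chars/4 heuristic: ~100x faster than exact tokenization (e.g. tiktoken),
    # accurate to roughly ±25%, which is enough for proportional budgets.
    return max(1, len(text) // 4)

For example, a 32,000-character transcript is estimated at 8,000 tokens, exactly one default context budget.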