Enable Self-Learning Skills

This guide walks you through enabling and tuning Zeph’s self-learning system so that skills automatically improve based on execution outcomes and user corrections.

For a full technical reference of the underlying mechanisms, see Self-Learning Skills.

Prerequisites

  • Zeph installed and configured with at least one LLM provider
  • Qdrant running locally (required for correction recall)
  • At least one skill installed

Step 1 — Enable Core Learning

Add the following to your config/default.toml:

[skills.learning]
enabled = true
auto_activate = false   # review LLM-generated improvements before they go live
min_failures = 3
improve_threshold = 0.7

With auto_activate = false, new skill versions are generated but held for your approval. Run /skill versions to review them and /skill approve <id> to promote one.
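The approval flow above can be modelled as a small state machine. The sketch below is a minimal illustration under stated assumptions. The Skill, SkillVersion and Status types are hypothetical. The should_improve rule, which requires at least min_failures failures and a success rate below improve_threshold, is a guess at how those two keys interact and is not Zeph's documented logic.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"    # generated, waiting for /skill approve
    ACTIVE = "active"      # currently used by the agent
    ARCHIVED = "archived"  # superseded by a newer active version


@dataclass
class SkillVersion:
    id: int
    body: str
    status: Status = Status.PENDING


@dataclass
class Skill:  # hypothetical container, not Zeph's type
    name: str
    versions: list = field(default_factory=list)

    def active(self):
        return next((v for v in self.versions if v.status is Status.ACTIVE), None)


def should_improve(failures, success_rate, min_failures=3, improve_threshold=0.7):
    # ASSUMPTION: improvement is considered only once enough failures exist
    # and the observed success rate has dropped below improve_threshold.
    return failures >= min_failures and success_rate < improve_threshold


def approve(skill, version_id):
    """Rough analogue of `/skill approve <id>`: promote one version, archive the old one."""
    target = next((v for v in skill.versions if v.id == version_id), None)
    if target is None:
        raise KeyError(f"no version {version_id} for skill {skill.name}")
    for v in skill.versions:
        if v is not target and v.status is Status.ACTIVE:
            v.status = Status.ARCHIVED
    target.status = Status.ACTIVE


def add_generated_version(skill, body, auto_activate=False):
    """Record an LLM-generated version; activate it only if auto_activate is set."""
    version = SkillVersion(id=len(skill.versions) + 1, body=body)
    skill.versions.append(version)
    if auto_activate:
        approve(skill, version.id)
    return version
```

With auto_activate left at false, the new version stays PENDING and the previous version keeps serving requests until approve is called.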

Step 2 — Enable Implicit Feedback Detection

FeedbackDetector watches each user turn for implicit corrections — phrases like “that’s wrong”, “try again”, or significant topic shifts. Detected corrections are stored and recalled automatically.

[agent.learning]
correction_detection = true
correction_confidence_threshold = 0.7  # tune sensitivity (lower = more corrections captured)
correction_recall_limit = 3
correction_min_similarity = 0.75

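Corrections are stored both in SQLite and in the zeph_corrections Qdrant collection. On relevant queries, the most similar stored corrections (up to correction_recall_limit, 3 in this example, and only those with similarity of at least correction_min_similarity) are injected into the system prompt.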
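Recall can be pictured as a filtered top-k similarity search. In Zeph that search runs against the zeph_corrections Qdrant collection; the sketch below substitutes a brute-force in-memory cosine scan, and its function names and prompt wording are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_corrections(query_vec, stored, limit=3, min_similarity=0.75):
    """Return up to `limit` correction texts whose embedding is at least
    `min_similarity` cosine-similar to the query, most similar first.
    `stored` is a list of (text, embedding) pairs (stand-in for Qdrant)."""
    scored = [(cosine(query_vec, vec), text) for text, vec in stored]
    hits = [(s, t) for s, t in scored if s >= min_similarity]
    hits.sort(key=lambda st: st[0], reverse=True)
    return [t for _, t in hits[:limit]]


def inject(system_prompt, corrections):
    # ASSUMPTION: the prompt wording is illustrative, not Zeph's template.
    if not corrections:
        return system_prompt
    lines = "\n".join(f"- {c}" for c in corrections)
    return f"{system_prompt}\n\nPast user corrections:\n{lines}"
```

Raising correction_min_similarity in this sketch shrinks the candidate set before the limit applies, so fewer but closer corrections reach the prompt.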

Multi-Language Support

FeedbackDetector matches correction patterns across 7 languages: English, Russian, Spanish, German, French, Chinese (Simplified), and Japanese. Each language uses dual anchoring: anchored patterns (message starts with the phrase) and unanchored patterns (phrase embedded mid-sentence). No per-language configuration is needed — all patterns are compiled into a single flat list at startup.

Mixed-language inputs are supported: “That’s неправильно” (Russian correction embedded in English) matches correctly. For unsupported languages (Korean, Arabic, etc.), the regex detector returns no signal; enable the judge detector (detector_mode = "judge") to handle these cases via LLM classification.
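A minimal sketch of dual anchoring using Python's re module: each phrase yields one anchored and one unanchored pattern, and every language is flattened into a single list at startup. The phrase lists and the confidence attached to each pattern type are illustrative placeholders, not Zeph's actual patterns or scores.

```python
import re

# Illustrative subset only; Zeph's real pattern list is not reproduced here.
PHRASES = {
    "en": ["that's wrong", "try again", "not what i asked"],
    "ru": ["неправильно", "попробуй ещё раз"],
    "de": ["das ist falsch"],
}

# Hypothetical confidences: an anchored hit (message starts with the phrase)
# is treated as a stronger signal than an unanchored, mid-sentence hit.
ANCHORED_CONFIDENCE = 0.9
UNANCHORED_CONFIDENCE = 0.6


def compile_patterns(phrases):
    """Flatten every language into one list of (regex, confidence) pairs."""
    flat = []
    for words in phrases.values():
        for phrase in words:
            escaped = re.escape(phrase)
            flat.append((re.compile(r"^\s*" + escaped, re.IGNORECASE), ANCHORED_CONFIDENCE))
            flat.append((re.compile(escaped, re.IGNORECASE), UNANCHORED_CONFIDENCE))
    return flat


PATTERNS = compile_patterns(PHRASES)


def detect(message):
    """Return the highest confidence of any matching pattern, or None."""
    scores = [conf for rx, conf in PATTERNS if rx.search(message)]
    return max(scores) if scores else None
```

In this sketch, detect("That’s неправильно") returns 0.6 through the unanchored Russian pattern, while a Korean message returns None, which is the case the judge detector is meant to cover.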

Step 2b — Enable LLM-Backed Judge (Optional)

By default, correction detection uses regex patterns only. If you want higher recall for ambiguous or non-English corrections, enable the judge detector:

[skills.learning]
detector_mode = "judge"
judge_model = "claude-sonnet-4-6"   # leave empty to use the primary provider
judge_adaptive_low = 0.5            # regex confidence floor (default: 0.5)
judge_adaptive_high = 0.8           # regex confidence ceiling (default: 0.8)

The judge fires only when regex confidence is borderline (between judge_adaptive_low and judge_adaptive_high) or when regex finds nothing; it supplements regex rather than replacing it. A rate limiter caps judge calls at 5 per 60 seconds, and judge calls run in the background so they do not block the response.
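The gating rule and the rate limiter can be sketched as follows. This assumes the judge is consulted when regex returns no match or a score in the half-open band [judge_adaptive_low, judge_adaptive_high), and that the 5-per-60-seconds limit is a sliding window. Both are assumptions about details the guide does not spell out, and the background execution of judge calls is omitted.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `window_secs`-long window."""

    def __init__(self, max_calls=5, window_secs=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_secs = window_secs
        self.clock = clock
        self.calls = deque()

    def try_acquire(self):
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window_secs:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False


def should_call_judge(regex_confidence, limiter, low=0.5, high=0.8):
    """ASSUMPTION: below `low` counts as no correction, at or above `high` as a
    confident regex hit; only the band in between (or no match) goes to the judge.
    The limiter is consulted only for borderline cases, so budget is not wasted."""
    borderline = regex_confidence is None or low <= regex_confidence < high
    return borderline and limiter.try_acquire()
```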

Start with detector_mode = "regex" (the default) and switch to "judge" only if you notice corrections being missed. The judge adds LLM cost per borderline detection.

Step 3 — Switch to Hybrid Skill Matching

BM25+cosine hybrid matching improves recall for skills with distinctive trigger keywords while keeping semantic matching for paraphrased queries.

[skills]
hybrid_search = true
cosine_weight = 0.7   # reduce to 0.5 to give BM25 more weight

When hybrid search is enabled, the system prompt includes skill health attributes (trust, wilson, outcomes) so the LLM can factor in reliability.
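To see how cosine_weight shifts the balance, here is a minimal sketch of linear score fusion. It assumes BM25 scores are min-max normalized to [0, 1] before blending and uses the common Okapi defaults k1 = 1.2 and b = 0.75; Zeph's actual normalization and BM25 parameters are not documented here, so treat this as an illustration only.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of tokenized `query` against each tokenized doc."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def min_max(xs):
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query_tokens, docs, cosine_sims, cosine_weight=0.7):
    """ASSUMPTION: linear blend of cosine similarity and normalized BM25.
    Returns doc indices, best first."""
    bm25 = min_max(bm25_scores(query_tokens, docs))
    blended = [cosine_weight * c + (1 - cosine_weight) * k for c, k in zip(cosine_sims, bm25)]
    return sorted(range(len(docs)), key=lambda i: blended[i], reverse=True)
```

In this sketch, lowering cosine_weight from 0.7 to 0.5 raises BM25's share of the blended score from 0.3 to 0.5, which favours skills whose trigger keywords appear verbatim in the query.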

Step 4 — Enable EMA Routing (Multi-Provider Setups)

If you run multiple providers under [llm], set routing = "ema" to have Zeph continuously reorder them by observed latency:

[llm]
routing = "ema"
router_ema_enabled = true
router_ema_alpha = 0.1       # lower = more weight on historical latency
router_reorder_interval = 10 # re-evaluate every 10 requests
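The EMA update itself is the standard one, ema = alpha * sample + (1 - alpha) * ema, which is why a lower router_ema_alpha leans more on history. The sketch below is hypothetical: the EmaRouter class, seeding the average with the first sample, and sorting providers without samples last are illustrative choices, not Zeph's code.

```python
class EmaRouter:
    """Order providers by an exponential moving average of observed latency."""

    def __init__(self, providers, alpha=0.1, reorder_interval=10):
        self.alpha = alpha
        self.reorder_interval = reorder_interval
        self.ema = {p: None for p in providers}   # no samples yet
        self.order = list(providers)              # start in configured order
        self.requests = 0

    def record(self, provider, latency_ms):
        prev = self.ema[provider]
        # ASSUMPTION: the first sample seeds the average; later samples are blended in.
        if prev is None:
            self.ema[provider] = latency_ms
        else:
            self.ema[provider] = self.alpha * latency_ms + (1 - self.alpha) * prev
        self.requests += 1
        if self.requests % self.reorder_interval == 0:
            self.reorder()

    def reorder(self):
        # ASSUMPTION: providers with no samples sort last; sort() is stable, so ties keep order.
        self.order.sort(key=lambda p: self.ema[p] if self.ema[p] is not None else float("inf"))
```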

Monitoring

Use these in-session commands to monitor the system:

/skill stats       — Wilson scores, trust levels, outcome counts per skill
/skill versions    — list pending and approved LLM-generated versions

The TUI dashboard (zeph --tui) shows real-time confidence bars:

  • Green: Wilson score ≥ 0.75
  • Yellow: Wilson score 0.40–0.74
  • Red: Wilson score below 0.40 (at risk of automatic demotion)
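The guide does not define the Wilson score precisely. A common choice, assumed here, is the lower bound of the 95% Wilson score interval (z = 1.96) on a skill's success rate, which penalizes skills with few recorded outcomes. The colour thresholds mirror the bars listed above.

```python
import math


def wilson_lower_bound(successes, trials, z=1.96):
    """ASSUMPTION: 'Wilson score' = lower bound of the Wilson score interval."""
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / denom


def bar_colour(score):
    # Thresholds taken from the TUI legend; 0.74 < score < 0.75 falls into yellow here.
    if score >= 0.75:
        return "green"
    if score >= 0.40:
        return "yellow"
    return "red"
```

For example, a skill with 10 successes out of 10 scores about 0.72 under this formula and still shows yellow; it needs more history before it turns green.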

Manually Triggering Improvement

If a skill is clearly wrong, reject it immediately instead of waiting for failures to accumulate:

/skill reject <name> <reason>

For example:

/skill reject docker "generates docker run commands without the -it flag for interactive shells"

This triggers the LLM improvement pipeline on the next agent cycle.
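Conceptually, /skill reject records a request that the agent acts on at the start of its next cycle. The sketch below illustrates that deferral with hypothetical names (handle_command, pending_rejections, agent_cycle) and assumes shell-style quoting for the reason, as in the example above; it is not Zeph's command parser.

```python
import shlex
from collections import deque

# Hypothetical queue drained by the agent loop; not Zeph's internal API.
pending_rejections = deque()


def handle_command(line):
    """Parse `/skill reject <name> <reason>` and queue it for the next cycle."""
    parts = shlex.split(line)  # honours "quoted reasons with spaces"
    if parts[:2] != ["/skill", "reject"] or len(parts) < 4:
        raise ValueError("usage: /skill reject <name> <reason>")
    name, reason = parts[2], " ".join(parts[3:])
    pending_rejections.append((name, reason))


def agent_cycle(improve):
    """At the start of a cycle, hand every queued rejection to `improve`
    (a stand-in for the LLM improvement pipeline)."""
    while pending_rejections:
        name, reason = pending_rejections.popleft()
        improve(name, reason)
```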

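Full Configuration Example

The following config/default.toml fragment combines the learning settings from Steps 1 to 3 with additional [skills.learning] tuning keys:

[skills]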
hybrid_search = true
cosine_weight = 0.7

[skills.learning]
enabled = true
auto_activate = false
min_failures = 3
improve_threshold = 0.7
rollback_threshold = 0.5
min_evaluations = 5
max_versions = 10
cooldown_minutes = 60
detector_mode = "regex"   # switch to "judge" for LLM-backed detection

[agent.learning]
correction_detection = true
correction_confidence_threshold = 0.7
correction_recall_limit = 3
correction_min_similarity = 0.75

Keep auto_activate = false until you have enough history to trust the LLM-generated improvements.

Step 5 – Enable D2Skill Step-Level Correction (Optional)

D2Skill extends the improvement pipeline with targeted step-level error correction. Instead of regenerating an entire skill after failures, D2Skill identifies the specific failing step and corrects only that step:

[skills.learning]
d2skill_enabled = true         # Enable step-level error correction (default: false)

This reduces LLM cost during improvement cycles and preserves working steps within multi-step skills.
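The idea of patching one step rather than regenerating the whole skill can be sketched as below. How D2Skill actually localizes the failing step is not covered in this guide; the sketch assumes per-step success flags are available and uses a hypothetical fix callback in place of the LLM call.

```python
from typing import Callable, List, Optional


def first_failing_step(step_results: List[bool]) -> Optional[int]:
    """Index of the first step whose execution failed, or None if all passed."""
    for i, ok in enumerate(step_results):
        if not ok:
            return i
    return None


def correct_one_step(steps: List[str], step_results: List[bool],
                     fix: Callable[[str, int], str]) -> List[str]:
    """Return a new step list in which only the first failing step is rewritten.
    `fix` is a hypothetical stand-in for the targeted LLM correction call."""
    idx = first_failing_step(step_results)
    if idx is None:
        return list(steps)
    patched = list(steps)            # working steps are copied unchanged
    patched[idx] = fix(steps[idx], idx)
    return patched
```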

Step 6 – Enable SkillOrchestra RL Routing (Optional)

When you have 10+ skills with overlapping descriptions, SkillOrchestra adds an RL routing head that learns from execution outcomes to improve skill selection over time:

[skills]
rl_routing_enabled = true      # Enable RL-based skill routing (default: false)

SkillOrchestra requires [skills.learning] enabled = true to collect reward signals. It falls back to standard BM25+cosine matching during cold start until enough observations accumulate.
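The cold-start fallback can be illustrated with a simple bandit. The sketch below uses UCB1, a non-contextual bandit that ignores query features, as a stand-in; it is not SkillOrchestra's contextual algorithm, and the BanditRouter class and its min_observations cutoff are hypothetical.

```python
import math


class BanditRouter:
    """UCB1 over candidate skills, falling back to hybrid ranking during cold start."""

    def __init__(self, min_observations=20, exploration=1.0):
        self.min_observations = min_observations   # hypothetical cold-start cutoff
        self.exploration = exploration
        self.pulls = {}     # skill -> times selected
        self.rewards = {}   # skill -> summed reward in [0, 1]

    def total(self):
        return sum(self.pulls.values())

    def choose(self, candidates, hybrid_scores):
        # Cold start: not enough outcomes yet, so trust BM25+cosine matching.
        if self.total() < self.min_observations:
            return max(candidates, key=lambda s: hybrid_scores[s])
        # Try any candidate that has never been selected.
        for s in candidates:
            if self.pulls.get(s, 0) == 0:
                return s
        log_n = math.log(self.total())

        def ucb(s):
            mean = self.rewards[s] / self.pulls[s]
            return mean + self.exploration * math.sqrt(2 * log_n / self.pulls[s])

        return max(candidates, key=ucb)

    def update(self, skill, reward):
        self.pulls[skill] = self.pulls.get(skill, 0) + 1
        self.rewards[skill] = self.rewards.get(skill, 0.0) + reward
```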

See SkillOrchestra for details on the contextual bandit algorithm and tuning.