Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Complexity Triage Routing

Complexity triage routing (routing = "triage") classifies each request before inference and routes it to the most appropriate provider tier based on difficulty. A cheap, fast model acts as the classifier; heavier models are reserved for genuinely difficult requests.

How It Works

On each request the router:

  1. Sends the user’s message to the triage provider (a small, fast model).
  2. The triage model returns a single word: simple, medium, complex, or expert.
  3. The router looks up the configured provider for that tier and forwards the full request to it.
  4. If triage times out or returns an unparseable response, the request falls back to the lowest configured tier (simple).

Context size is also considered: when a request’s message history exceeds the selected tier provider’s context window, the router automatically escalates to the next tier. This escalation count is tracked in the triage metrics.

Tier Definitions

TierTypical requests
simpleShort factual questions, greetings, one-liners
mediumSummarization, translation, structured extraction
complexMulti-step reasoning, code generation, analysis
expertResearch-grade tasks, long-form synthesis, advanced mathematics

Enabling Triage Routing

Set routing = "triage" in [llm] and add a [llm.complexity_routing] section:

[llm]
routing = "triage"

[llm.complexity_routing]
enabled = true
triage_provider = "fast"
bypass_single_provider = true
triage_timeout_secs = 5

[llm.complexity_routing.tiers]
simple = "fast"
medium = "default"
complex = "smart"
expert = "expert"

[[llm.providers]]
name = "fast"
type = "ollama"
model = "qwen3:1.7b"

[[llm.providers]]
name = "default"
type = "ollama"
model = "qwen3:8b"
default = true

[[llm.providers]]
name = "smart"
type = "claude"
model = "claude-haiku-4-5-20251001"

[[llm.providers]]
name = "expert"
type = "claude"
model = "claude-sonnet-4-6"

Each tier value must match a name field in one of the [[llm.providers]] entries. Tiers are optional — any omitted tier resolves to the first configured tier provider (simple).

Bypass Optimization

When bypass_single_provider = true (the default) and all configured tiers resolve to the same provider name, the triage call is skipped entirely. This avoids a redundant LLM call when, for example, only two tiers are configured and both point to the same model:

[llm.complexity_routing.tiers]
simple  = "fast"
medium  = "fast"   # same provider — triage is bypassed
complex = "smart"
# expert not set — resolves to "fast" (first tier)

Note

Bypass is evaluated at construction time. Changing tier assignments requires a config reload or restart.

Timeout and Fallback

The triage call is bounded by triage_timeout_secs (default: 5 seconds). When the triage model does not respond in time or returns an unrecognised label, the router falls back to the simple tier provider and increments the timeout_fallbacks metric counter.

[llm.complexity_routing]
triage_provider = "fast"
triage_timeout_secs = 3   # fail fast on slow local model

Hybrid Mode: Triage + Cascade

Setting fallback_strategy = "cascade" enables hybrid routing: triage selects the initial tier, and cascade quality escalation is applied on top. If the selected tier provider returns a degenerate response (empty, repetitive, incoherent), the router escalates to the next tier automatically.

[llm.complexity_routing]
triage_provider = "fast"
fallback_strategy = "cascade"

[llm.complexity_routing.tiers]
simple  = "fast"
medium  = "default"
complex = "smart"
expert  = "expert"

Note

fallback_strategy = "cascade" is the only supported value. This option is reserved for future expansion.

Configuration Reference

[llm.complexity_routing] fields (active when routing = "triage"):

FieldTypeDefaultDescription
triage_providerstring?Pool entry name of the fast classifier model. Required when bypass_single_provider is false.
bypass_single_providerbooltrueSkip triage when all tier mappings resolve to the same provider name.
triage_timeout_secsu645Timeout for the triage classification call in seconds. On timeout, falls back to the simple tier.
max_triage_tokensusize50Maximum output tokens allowed in the triage response.
fallback_strategystring?Set to "cascade" to enable hybrid triage + quality escalation.

[llm.complexity_routing.tiers] fields:

FieldTypeDefaultDescription
simplestring?Provider name for trivial requests. Used as the fallback provider on triage failure.
mediumstring?Provider name for moderate requests.
complexstring?Provider name for multi-step or code-heavy requests.
expertstring?Provider name for research-grade or highly complex requests.

All tier fields are optional. Unset tiers fall back to simple; if simple is also unset, the first [[llm.providers]] entry is used.

Metrics

The triage router exposes counters accessible via the TUI metrics panel and the debug log:

CounterDescription
callsTotal triage classification calls made
tier_simpleRequests routed to simple
tier_mediumRequests routed to medium
tier_complexRequests routed to complex
tier_expertRequests routed to expert
timeout_fallbacksClassifications that timed out or failed to parse
escalationsContext-window auto-escalations

Known Limitations

  • Triage accuracy depends entirely on the quality of the classifier model. A weak or poorly-prompted model may mislabel requests.
  • The triage call adds latency before every request when bypass is not active. Use a locally hosted small model (e.g. qwen3:1.7b via Ollama) to keep overhead below 500 ms.
  • Multiple concurrent Zeph instances share no triage state — each instance classifies independently.