# Model Orchestrator
Route tasks to different LLM providers based on content classification. Each task type maps to a provider chain with automatic fallback. Use the orchestrator to combine local and cloud models — for example, embeddings via Ollama and chat via Claude.
## Configuration
```toml
[llm]
provider = "orchestrator"

[llm.orchestrator]
default = "claude"
embed = "ollama"

[llm.orchestrator.providers.ollama]
type = "ollama"

[llm.orchestrator.providers.claude]
type = "claude"

[llm.orchestrator.routes]
coding = ["claude", "ollama"]   # try Claude first, fallback to Ollama
creative = ["claude"]           # cloud only
analysis = ["claude", "ollama"] # prefer cloud
general = ["claude"]            # cloud only
```
## Sub-Provider Fields
Each entry under `[llm.orchestrator.providers.<name>]` supports:
| Field | Type | Description |
|---|---|---|
| `type` | string | Provider backend: `ollama`, `claude`, `openai`, `candle`, `compatible` |
| `model` | string? | Override chat model |
| `base_url` | string? | Override API endpoint (Ollama / Compatible) |
| `embedding_model` | string? | Override embedding model |
| `filename` | string? | GGUF filename (Candle only) |
| `device` | string? | Compute device: `cpu`, `metal`, `cuda` (Candle only) |
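For example, a Candle sub-provider that loads a local GGUF file might look like the sketch below; the sub-provider name, model id, and filename are placeholders, not values shipped with zeph:

```toml
[llm.orchestrator.providers.candle-local]
type = "candle"
model = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"    # placeholder model id
filename = "mistral-7b-instruct-v0.2.Q4_K_M.gguf"   # placeholder GGUF filename (Candle only)
device = "metal"                                    # cpu, metal, or cuda
```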
## Fallback Chain for Provider Fields
Per-sub-provider fields override the parent config. Resolution order:

- Per-provider field — e.g. `[llm.orchestrator.providers.ollama].base_url`
- Parent section — e.g. `[llm].base_url` for Ollama, `[llm.cloud].model` for Claude
- Global default — compiled-in defaults

This allows a single `[llm]` section to set shared defaults while individual sub-providers override only what differs (see the sketch below).
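A minimal sketch of this resolution using two Ollama sub-providers (the second port is purely illustrative):

```toml
[llm]
base_url = "http://localhost:11434"   # parent value

[llm.orchestrator.providers.ollama-a]
type = "ollama"                       # inherits base_url from [llm]

[llm.orchestrator.providers.ollama-b]
type = "ollama"
base_url = "http://localhost:11435"   # per-provider field wins over [llm].base_url
```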
## Provider Keys
- `default` — provider for chat when no specific route matches
- `embed` — provider for all embedding operations (skill matching, semantic memory)
## Task Classification
Task types are classified via keyword heuristics:
| Task Type | Keywords |
|---|---|
| `coding` | code, function, debug, refactor, implement |
| `creative` | write, story, poem, creative |
| `analysis` | analyze, compare, evaluate |
| `translation` | translate, convert language |
| `summarization` | summarize, summary, tldr |
| `general` | everything else |
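Assuming every classified task type is a valid key under `[llm.orchestrator.routes]`, a sketch covering the two types not shown in the Configuration section above, reusing the same `claude` and `ollama` providers:

```toml
[llm.orchestrator.routes]
translation = ["claude", "ollama"]   # try Claude first, fallback to Ollama
summarization = ["ollama"]           # local only
```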
## Fallback Chains
Routes define provider preference order. If the first provider fails, the next one in the list is tried automatically.
```toml
coding = ["local", "cloud"]   # try local first, fallback to cloud
```
## Interactive Setup
Run `zeph init` and select Orchestrator as the LLM provider. The wizard prompts for:
- Primary provider — select from Ollama, Claude, OpenAI, or Compatible. Provide the model name, base URL, and API key as needed.
- Fallback provider — same selection. The fallback activates when the primary fails.
- Embedding model — used for skill matching and semantic memory.
The wizard generates a complete `[llm.orchestrator]` section with a provider map, a chat route (primary + fallback), and an embed route.
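The exact output depends on your selections; a sketch assuming Claude as primary, Ollama as fallback, and Ollama for embeddings (the chat route is shown here under `general`, the generated key may differ):

```toml
[llm]
provider = "orchestrator"

[llm.orchestrator]
default = "claude"
embed = "ollama"

[llm.orchestrator.providers.claude]
type = "claude"

[llm.orchestrator.providers.ollama]
type = "ollama"

[llm.orchestrator.routes]
general = ["claude", "ollama"]   # primary first, then fallback
```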
## Multi-Instance Example
Two Ollama servers on different ports — one for chat, one for embeddings:
```toml
[llm]
provider = "orchestrator"
base_url = "http://localhost:11434"
embedding_model = "qwen3-embedding"

[llm.orchestrator]
default = "ollama-chat"
embed = "ollama-embed"

[llm.orchestrator.providers.ollama-chat]
type = "ollama"
model = "mistral:7b"
# inherits base_url from [llm].base_url

[llm.orchestrator.providers.ollama-embed]
type = "ollama"
base_url = "http://localhost:11435"    # second Ollama instance
embedding_model = "nomic-embed-text"   # dedicated embedding model

[llm.orchestrator.routes]
general = ["ollama-chat"]
```
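The second Ollama instance can be started on its own port by setting the `OLLAMA_HOST` environment variable before running `ollama serve` (for example, `OLLAMA_HOST=127.0.0.1:11435 ollama serve`).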
## Hybrid Setup Example
Embeddings via free local Ollama, chat via paid Claude API:
```toml
[llm]
provider = "orchestrator"

[llm.orchestrator]
default = "claude"
embed = "ollama"

[llm.orchestrator.providers.ollama]
type = "ollama"

[llm.orchestrator.providers.claude]
type = "claude"

[llm.orchestrator.routes]
general = ["claude"]
```
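If the local Ollama instance should use a specific embedding model rather than its default, the sub-provider entry can pin it; the model name below is taken from the Multi-Instance Example and is only an illustration:

```toml
[llm.orchestrator.providers.ollama]
type = "ollama"
embedding_model = "nomic-embed-text"   # dedicated embedding model for skill matching and semantic memory
```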