Cocoon Decentralized TEE Provider

Cocoon is a decentralized inference network that executes LLM requests in Trusted Execution Environments (TEEs) on a peer-to-peer network of secure nodes. Zeph integrates with Cocoon natively as an LLM provider and can optionally use the Cocoon sidecar for speech-to-text transcription.

Cocoon is particularly useful for:

  • Confidential inference — Requests execute in hardware-isolated TEEs; no server-side model access
  • Privacy compliance — End-to-end encrypted communication path with zero-knowledge server operations
  • Flexible deployment — Run locally with a sidecar or connect to public Cocoon nodes
  • Multi-modal support — Text chat, tool use, and STT transcription in one provider

Setup

Prerequisites

  1. Install the Cocoon sidecar (local deployment only):

    # Download from https://cocoon.org or build from source
    cocoon --version
    
  2. Start the sidecar on the default port (8765):

    cocoon serve
    # Or on a custom port:
    cocoon serve --port 9000
    

Configuration

Add a Cocoon provider entry to your config:

[[llm.providers]]
type = "cocoon"
name = "cocoon-local"
base_url = "http://localhost:8765"  # Sidecar endpoint
model = "llama2-7b"                 # Available model on sidecar

Or store the base URL in the vault for security:

zeph vault set ZEPH_COCOON_CLIENT_URL "http://localhost:8765"

Then reference it in config:

[[llm.providers]]
type = "cocoon"
name = "cocoon-local"
base_url = "${ZEPH_COCOON_CLIENT_URL}"
model = "llama2-7b"
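To illustrate what referencing a vault key in config means, here is a minimal Python sketch of `${KEY}` placeholder substitution. It is not Zeph's implementation: the `VAULT` dict, the `resolve` function, and the placeholder pattern are assumptions made for this example only.

```python
import re

# Hypothetical stand-in for the Zeph vault; the real vault is not a Python dict.
VAULT = {"ZEPH_COCOON_CLIENT_URL": "http://localhost:8765"}

# Assumed placeholder syntax: ${UPPER_SNAKE_CASE}
PLACEHOLDER = re.compile(r"\$\{([A-Z0-9_]+)\}")


def resolve(value: str, vault: dict) -> str:
    """Replace each ${KEY} in value with the matching vault entry."""

    def lookup(match: re.Match) -> str:
        key = match.group(1)
        if key not in vault:
            # A missing key is an error rather than an empty string.
            raise KeyError(f"vault key not set: {key}")
        return vault[key]

    return PLACEHOLDER.sub(lookup, value)


print(resolve("${ZEPH_COCOON_CLIENT_URL}", VAULT))  # http://localhost:8765
```

Failing loudly on an unset key mirrors the kind of problem that zeph cocoon doctor reports under "Vault key resolution".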

Features

Chat and Streaming

Cocoon supports both single-turn and streaming chat:

[[llm.providers]]
type = "cocoon"
name = "cocoon"
base_url = "http://localhost:8765"
model = "llama2-7b"
max_tokens = 2048
temperature = 0.7

Tool Use (Function Calling)

Cocoon fully supports tool definitions and structured function calling:

  • Define tools in your skills and system prompt
  • Zeph automatically formats tool calls for Cocoon
  • Streaming tool use is supported with incremental JSON parsing
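The idea behind handling streamed tool calls can be sketched in Python as follows. This is an accumulate-and-retry approach, simpler than a true incremental parser, and it is not taken from Zeph's source; the `ToolCallBuffer` class, the `get_weather` tool name, and the fragment boundaries are all hypothetical.

```python
import json


class ToolCallBuffer:
    """Accumulates streamed argument fragments for one tool call."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._parts = []

    def feed(self, fragment: str):
        """Append a fragment; return the parsed arguments once the JSON is complete."""
        self._parts.append(fragment)
        try:
            return json.loads("".join(self._parts))
        except json.JSONDecodeError:
            # Still incomplete: a strict prefix of a JSON object never parses.
            return None


# Hypothetical tool call whose arguments arrive in three chunks.
buf = ToolCallBuffer("get_weather")
args = None
for chunk in ['{"city": "Par', 'is", "unit"', ': "celsius"}']:
    args = buf.feed(chunk)

print(args)  # {'city': 'Paris', 'unit': 'celsius'}
```

Re-parsing the whole buffer on each fragment is quadratic in the worst case, which is acceptable for small argument objects but is the reason production clients tend to use a stateful parser instead.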

Speech-to-Text (STT)

The Cocoon sidecar includes a Whisper-compatible STT endpoint at /v1/audio/transcriptions. Configure Zeph to use it:

[[llm.providers]]
type = "cocoon"
name = "cocoon-stt"
stt_model = "whisper-1"  # Enable STT on this provider

When configured, Zeph automatically transcribes voice messages and Telegram audio notes using this provider. See Audio & Vision for more details.

Per-Token Pricing (Cocoon Models)

Unlike cloud providers, Cocoon models may not be in Zeph’s built-in pricing table. Configure per-1K-token pricing for accurate cost tracking:

[[llm.providers]]
type = "cocoon"
name = "cocoon-custom"
base_url = "http://localhost:8765"
model = "my-custom-model"

# Price in cents per 1K tokens, set separately for prompt and completion
cocoon_pricing = { prompt_cents = 1, completion_cents = 2 }

With this setting, the cost tracker can report token consumption and cost for your Cocoon inference accurately.
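As a worked example of per-1K-token pricing, the Python sketch below computes the cost of one request from the two prices. It assumes the cost is a plain linear sum of prompt and completion charges; how Zeph's cost tracker rounds or aggregates is not stated here, and `cost_cents` is a hypothetical helper.

```python
def cost_cents(prompt_tokens: int, completion_tokens: int,
               prompt_cents: float, completion_cents: float) -> float:
    """Cost in cents, given prices quoted per 1,000 tokens."""
    prompt_cost = prompt_tokens / 1000 * prompt_cents
    completion_cost = completion_tokens / 1000 * completion_cents
    return prompt_cost + completion_cost


# 12,000 prompt tokens at 1 cent per 1K and 3,000 completion tokens at 2 cents per 1K
print(cost_cents(12_000, 3_000, prompt_cents=1, completion_cents=2))  # 18.0
```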

Multi-Model Routing

Combine Cocoon with other providers for cost-effective multi-tier inference:

[[llm.providers]]
type = "cocoon"
name = "cocoon-smart"
base_url = "http://localhost:8765"
model = "llama2-13b"

[[llm.providers]]
type = "ollama"
name = "ollama-fast"
base_url = "http://localhost:11434"
model = "qwen3:1.7b"

[llm]
routing = "triage"  # Route by complexity

[llm.complexity_routing]
triage_provider = "ollama-fast"
simple = "ollama-fast"      # Quick questions → fast model
medium = "ollama-fast"      # Moderate tasks → fast model
complex = "cocoon-smart"    # Complex reasoning → TEE
expert = "cocoon-smart"     # Expert tasks → TEE
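The tier table above amounts to a lookup from complexity tier to provider name. The Python sketch below mirrors that mapping so it is easy to see which tiers reach the TEE-backed provider. The tier and provider names come from the config; `pick_provider`, `runs_in_tee`, and the `TEE_PROVIDERS` set are hypothetical, and the triage classification step itself is not modelled.

```python
# Mirror of [llm.complexity_routing] from the config above.
ROUTES = {
    "simple": "ollama-fast",
    "medium": "ollama-fast",
    "complex": "cocoon-smart",
    "expert": "cocoon-smart",
}

# Providers of type "cocoon" in this example config.
TEE_PROVIDERS = {"cocoon-smart"}


def pick_provider(tier: str) -> str:
    """Return the provider name configured for a complexity tier."""
    try:
        return ROUTES[tier]
    except KeyError:
        raise ValueError(f"unknown complexity tier: {tier}") from None


def runs_in_tee(tier: str) -> bool:
    """True if requests of this tier are sent to a Cocoon provider."""
    return pick_provider(tier) in TEE_PROVIDERS


for tier in ROUTES:
    print(tier, pick_provider(tier), runs_in_tee(tier))
```

With this particular config, the triage step and the simple and medium tiers are served by ollama-fast, so only complex and expert requests are handled by the Cocoon provider. If every request must run in a TEE, point all tiers at the Cocoon provider.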

Diagnostics

Use the zeph cocoon doctor command to verify sidecar health and configuration:

zeph cocoon doctor

Example output:

Cocoon Diagnostics
==================
Config entry:           [OK] cocoon-local present in config
Sidecar reachability:   [OK] http://localhost:8765/stats
Proxy connection:       [OK] Direct connection established
Worker count:           [OK] 4 workers available
Model listing:          [OK] 7 models available
Vault key resolution:   [OK] ZEPH_COCOON_CLIENT_URL resolved
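If you need to check this report from a script, the text layout shown above can be parsed line by line. The Python sketch below does that for the sample output. It assumes each check follows the pattern `name: [STATUS] detail`; only `[OK]` appears in the sample, so any other status label (such as the `FAIL` used in the test comment) is hypothetical, and the text format may differ between versions.

```python
import re

SAMPLE = """\
Cocoon Diagnostics
==================
Config entry:           [OK] cocoon-local present in config
Sidecar reachability:   [OK] http://localhost:8765/stats
Proxy connection:       [OK] Direct connection established
Worker count:           [OK] 4 workers available
Model listing:          [OK] 7 models available
Vault key resolution:   [OK] ZEPH_COCOON_CLIENT_URL resolved
"""

# Assumed line shape: "<check name>:   [<STATUS>] <detail>"
LINE = re.compile(r"^(?P<check>[^:]+):\s+\[(?P<status>[A-Z]+)\]\s+(?P<detail>.*)$")


def parse_doctor(text: str) -> list:
    """Return one dict per check line; header and separator lines are skipped."""
    results = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            results.append(match.groupdict())
    return results


checks = parse_doctor(SAMPLE)
failed = [c["check"] for c in checks if c["status"] != "OK"]
print(len(checks), failed)  # 6 []
```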

JSON Output

For automation and scripting, use --json:

zeph cocoon doctor --json

TUI Integration

When using the TUI dashboard with Cocoon enabled, check sidecar status and available models:

  • /cocoon status — Display sidecar health, worker count, and TON balance
  • /cocoon models — List all available models on the sidecar

Status updates automatically every 30 seconds in the background.

TON Balance Display

The TUI sidebar can display your Cocoon TON balance in real time. By default, the balance is shown. To hide it for privacy (displays *** TON instead), configure:

[cocoon]
show_balance = false    # Hide TON balance in TUI sidebar (default: true)

You can also set this option interactively in the Cocoon setup step of the zeph init wizard.

Configuration Reference

| Field | Type | Default | Description |
|---|---|---|---|
| type | string | | Must be "cocoon" |
| name | string | | Unique provider identifier |
| base_url | string | "http://localhost:8765" | Sidecar endpoint URL |
| model | string | | Model name available on the sidecar |
| stt_model | string | (optional) | Model to use for speech-to-text |
| cocoon_pricing | table | (optional) | Per-1K-token pricing in cents |
| max_tokens | integer | 2048 | Max tokens in response |
| temperature | float | 0.7 | Sampling temperature |
| top_p | float | 1.0 | Nucleus sampling parameter |

Troubleshooting

Sidecar Not Reachable

If you see Cocoon: sidecar unreachable in the TUI status bar:

  1. Verify the sidecar is running:

    curl -s http://localhost:8765/stats | jq .
    
  2. Check the base URL matches your sidecar port

  3. Ensure network connectivity (if sidecar is on a different machine)
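The same reachability check can be scripted. The Python sketch below requests the /stats path used in the curl command above and treats any HTTP 200 response as healthy. The `sidecar_reachable` helper and the 2-second timeout are assumptions for illustration; the contents of the /stats response are not inspected because its schema is not documented here.

```python
import http.client
import urllib.request


def sidecar_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/stats answers with HTTP 200."""
    url = base_url.rstrip("/") + "/stats"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, DNS failures, non-2xx
        # responses (urllib raises HTTPError) and malformed replies.
        return False


if __name__ == "__main__":
    print(sidecar_reachable("http://localhost:8765"))
```

Pass the same base URL that your provider entry uses, so the check also catches a mismatch between the configured port and the port the sidecar listens on.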

Vault Key Issues

If zeph cocoon doctor reports vault key errors:

# Set the URL in the vault
zeph vault set ZEPH_COCOON_CLIENT_URL "http://localhost:8765"

# Verify it resolves
zeph vault get ZEPH_COCOON_CLIENT_URL

STT Not Working

Verify the Whisper endpoint is available on the sidecar:

curl -s -o /dev/null -w "%{http_code}\n" -X OPTIONS http://localhost:8765/v1/audio/transcriptions

The command prints only the HTTP status code. If it prints 405 or 404, the sidecar may not have STT support compiled in.

See Also