Document Loaders

Zeph supports ingesting user documents (plain text, Markdown, PDF) for retrieval-augmented generation. Documents are loaded, split into chunks, embedded, and stored in Qdrant for semantic recall.

DocumentLoader Trait

All loaders implement DocumentLoader:

```rust
use std::future::Future;
use std::path::Path;
use std::pin::Pin;

pub trait DocumentLoader: Send + Sync {
    fn load(&self, path: &Path) -> Pin<Box<dyn Future<Output = Result<Vec<Document>, DocumentError>> + Send + '_>>;
    fn supported_extensions(&self) -> &[&str];
}
```

Each Document contains content: String and metadata: DocumentMetadata (source path, content type, extra fields).
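To show what implementing the trait involves, here is a minimal sketch of a custom loader. Document, DocumentMetadata and DocumentError are simplified stand-ins (an assumption: Zeph's real types carry more detail), InMemoryLoader is hypothetical, and the LoadFuture alias only abbreviates the signature above. The boxed future is built with Box::pin(async move { ... }). A tiny block_on helper is included only so the example runs without tokio.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::path::{Path, PathBuf};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Simplified stand-ins for Zeph's types (assumption: the real ones differ).
#[derive(Debug, Clone)]
pub struct DocumentMetadata {
    pub source: PathBuf,
    pub content_type: String,
    pub extra: HashMap<String, String>,
}

#[derive(Debug, Clone)]
pub struct Document {
    pub content: String,
    pub metadata: DocumentMetadata,
}

#[derive(Debug)]
pub enum DocumentError {
    NotFound(PathBuf),
}

// Same shape as the trait signature in the docs, abbreviated with an alias.
pub type LoadFuture<'a> =
    Pin<Box<dyn Future<Output = Result<Vec<Document>, DocumentError>> + Send + 'a>>;

pub trait DocumentLoader: Send + Sync {
    fn load(&self, path: &Path) -> LoadFuture<'_>;
    fn supported_extensions(&self) -> &[&str];
}

// Hypothetical loader that serves documents from memory, keyed by path.
pub struct InMemoryLoader {
    pub files: HashMap<PathBuf, String>,
}

impl DocumentLoader for InMemoryLoader {
    fn load(&self, path: &Path) -> LoadFuture<'_> {
        // Own the path so the future borrows only `self`, matching the `'_` bound.
        let path = path.to_path_buf();
        Box::pin(async move {
            match self.files.get(&path) {
                Some(content) => Ok(vec![Document {
                    content: content.clone(),
                    metadata: DocumentMetadata {
                        source: path,
                        content_type: "text/plain".to_string(),
                        extra: HashMap::new(),
                    },
                }]),
                None => Err(DocumentError::NotFound(path)),
            }
        })
    }

    fn supported_extensions(&self) -> &[&str] {
        &["mem"]
    }
}

// Minimal busy-polling executor for this example only; real code awaits under tokio.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Because the path is copied into the future, the returned future borrows only the loader. That is exactly what the `'_` lifetime in the trait signature permits.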

TextLoader

Loads .txt, .md, and .markdown files. Always available (no feature gate).

  • Reads files via tokio::fs::read_to_string
  • Canonicalizes paths via std::fs::canonicalize before reading
  • Rejects files exceeding max_file_size (default 50 MiB) with DocumentError::FileTooLarge
  • Sets content_type to text/markdown for .md/.markdown, text/plain otherwise
```rust
use std::path::Path;

// `load` returns a future, so this must run inside an async context.
let loader = TextLoader::default();
let docs = loader.load(Path::new("notes.md")).await?;
```
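The guards listed above can be sketched synchronously with std::fs. This is an illustration only: the real TextLoader reads via tokio::fs. The function name load_text, the LoadError enum and the tuple return type are hypothetical. The 50 MiB default and the content-type rules come from the list above.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

const DEFAULT_MAX_FILE_SIZE: u64 = 50 * 1024 * 1024; // 50 MiB

#[derive(Debug)]
pub enum LoadError {
    Io(io::Error),
    FileTooLarge { path: PathBuf, size: u64, limit: u64 },
}

impl From<io::Error> for LoadError {
    fn from(e: io::Error) -> Self {
        LoadError::Io(e)
    }
}

/// .md / .markdown map to text/markdown; everything else is text/plain.
fn content_type_for(path: &Path) -> &'static str {
    match path.extension().and_then(|e| e.to_str()) {
        Some("md") | Some("markdown") => "text/markdown",
        _ => "text/plain",
    }
}

/// Hypothetical synchronous equivalent: canonicalize, check size, then read.
pub fn load_text(path: &Path, max_file_size: u64) -> Result<(PathBuf, &'static str, String), LoadError> {
    let canonical = fs::canonicalize(path)?; // resolves symlinks and `..`
    let size = fs::metadata(&canonical)?.len();
    if size > max_file_size {
        return Err(LoadError::FileTooLarge { path: canonical, size, limit: max_file_size });
    }
    let content = fs::read_to_string(&canonical)?;
    let content_type = content_type_for(&canonical);
    Ok((canonical, content_type, content))
}
```

Checking the size from metadata before reading means an oversized file is rejected without ever being loaded into memory.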

PdfLoader

Extracts text from PDF files using pdf-extract. Requires the pdf feature:

```sh
cargo build --features pdf
```

Synchronous extraction runs inside tokio::task::spawn_blocking so it does not block the async runtime. The same max_file_size and path-canonicalization guards as TextLoader apply.

TextSplitter

Splits documents into chunks for embedding. Configurable via SplitterConfig:

| Parameter | Default | Description |
| --- | --- | --- |
| chunk_size | 1000 | Maximum characters per chunk |
| chunk_overlap | 200 | Overlap between consecutive chunks |
| sentence_aware | true | Split on sentence boundaries (". ", "? ", "! ", "\n\n") |

When sentence_aware is false, the splitter cuts on character boundaries and still applies the configured overlap.
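The character-boundary mode can be sketched as a sliding window. This is a minimal illustration under two assumptions: chunk_size and chunk_overlap count Unicode scalar values (chars), and each chunk starts chunk_size - chunk_overlap characters after the previous one. The function split_chars is hypothetical and not Zeph's API.

```rust
/// Splits `text` into windows of at most `chunk_size` chars, each starting
/// `chunk_size - chunk_overlap` chars after the previous one.
pub fn split_chars(text: &str, chunk_size: usize, chunk_overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(chunk_overlap < chunk_size, "overlap must be smaller than chunk_size");
    // Work on chars, not bytes, so multi-byte UTF-8 is never cut mid-character.
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - chunk_overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // this window already reaches the end of the text
        }
        start += step;
    }
    chunks
}
```

With the defaults (1000 and 200) the window advances 800 characters at a time. The last 200 characters of each chunk therefore reappear at the start of the next.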

```rust
let splitter = TextSplitter::new(SplitterConfig {
    chunk_size: 500,
    chunk_overlap: 100,
    sentence_aware: true,
});
let chunks = splitter.split(&document);
```
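Sentence-aware splitting can be approximated in two steps. First, cut the text after every boundary listed in the SplitterConfig table. Then greedily pack whole sentences into chunks of at most chunk_size characters. This is an assumed strategy, not Zeph's implementation. It omits chunk_overlap for brevity, a single sentence longer than chunk_size becomes its own oversized chunk, and both function names are hypothetical.

```rust
/// Sentence boundaries from the SplitterConfig table.
const BOUNDARIES: [&str; 4] = [". ", "? ", "! ", "\n\n"];

/// Cuts `text` after every boundary, keeping the boundary with its sentence.
fn split_sentences(text: &str) -> Vec<&str> {
    let mut sentences = Vec::new();
    let (mut start, mut i) = (0, 0);
    while i < text.len() {
        if let Some(b) = BOUNDARIES.iter().find(|b| text[i..].starts_with(**b)) {
            i += b.len();
            sentences.push(&text[start..i]);
            start = i;
        } else {
            // Step one whole character so slicing stays on UTF-8 boundaries.
            i += text[i..].chars().next().map_or(1, char::len_utf8);
        }
    }
    if start < text.len() {
        sentences.push(&text[start..]);
    }
    sentences
}

/// Greedily packs whole sentences into chunks of at most `chunk_size` chars.
pub fn split_sentence_aware(text: &str, chunk_size: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for sentence in split_sentences(text) {
        let fits = current.chars().count() + sentence.chars().count() <= chunk_size;
        if !current.is_empty() && !fits {
            chunks.push(std::mem::take(&mut current));
        }
        current.push_str(sentence);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Packing whole sentences keeps each chunk semantically self-contained, at the cost of chunks that are usually somewhat shorter than chunk_size.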

IngestionPipeline

Orchestrates the full flow: load → split → embed → store.

```rust
let pipeline = IngestionPipeline::new(
    TextSplitter::new(SplitterConfig::default()),
    qdrant_ops,
    "my_documents",
    Box::new(provider.embed_fn()),
);

// Ingest from a loaded document
let chunk_count = pipeline.ingest(document).await?;

// Or load and ingest in one step
let chunk_count = pipeline.load_and_ingest(&TextLoader::default(), path).await?;
```

Each chunk is stored as a Qdrant point with payload fields: source, content_type, chunk_index, content.
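The following in-memory sketch makes the load → split → embed → store flow and the payload layout concrete. The Payload and Point structs, the VecStore standing in for a Qdrant collection, and the closure-based splitter and embedder are all hypothetical. Only the four payload field names come from the description above.

```rust
/// Payload stored with each vector (field names from the docs above).
#[derive(Debug, Clone, PartialEq)]
pub struct Payload {
    pub source: String,
    pub content_type: String,
    pub chunk_index: usize,
    pub content: String,
}

#[derive(Debug, Clone)]
pub struct Point {
    pub vector: Vec<f32>,
    pub payload: Payload,
}

/// Stand-in for a Qdrant collection.
#[derive(Default)]
pub struct VecStore {
    pub points: Vec<Point>,
}

/// Splits, embeds and stores one document; returns the number of chunks stored.
pub fn ingest<S, E>(
    source: &str,
    content_type: &str,
    content: &str,
    split: S,
    embed: E,
    store: &mut VecStore,
) -> usize
where
    S: Fn(&str) -> Vec<String>,
    E: Fn(&str) -> Vec<f32>,
{
    let chunks = split(content);
    for (chunk_index, chunk) in chunks.iter().enumerate() {
        store.points.push(Point {
            vector: embed(chunk.as_str()),
            payload: Payload {
                source: source.to_string(),
                content_type: content_type.to_string(),
                chunk_index,
                content: chunk.clone(),
            },
        });
    }
    chunks.len()
}
```

Storing chunk_index alongside source lets retrieved chunks be traced back to their position in the original document.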

CLI ingestion

Documents are ingested from the command line with the zeph ingest subcommand:

```sh
zeph ingest ./docs/                          # ingest directory recursively
zeph ingest README.md --chunk-size 256       # custom chunk size
zeph ingest ./knowledge --collection my_kb   # custom Qdrant collection
```

Options:

| Flag | Default | Description |
| --- | --- | --- |
| `--chunk-size <N>` | 512 | Target character count per chunk |
| `--chunk-overlap <N>` | 64 | Overlap between consecutive chunks |
| `--collection <NAME>` | zeph_documents | Qdrant collection in which to store chunks |
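Directory ingestion implies a recursive walk. The std-only sketch below shows one way to collect candidate files. The function names are hypothetical, and so is the choice to skip files with unsupported extensions. The extension list is taken from the loaders described above, with pdf applicable only when the pdf feature is enabled. The sketch does not guard against symlink cycles.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

fn has_extension(path: &Path, extensions: &[&str]) -> bool {
    path.extension()
        .and_then(|e| e.to_str())
        .map_or(false, |e| extensions.contains(&e))
}

/// Returns `root` itself if it is a matching file, otherwise every matching
/// file found by walking `root` recursively with an explicit stack.
pub fn collect_files(root: &Path, extensions: &[&str]) -> io::Result<Vec<PathBuf>> {
    if root.is_file() {
        return Ok(if has_extension(root, extensions) { vec![root.to_path_buf()] } else { Vec::new() });
    }
    let mut out = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                stack.push(path);
            } else if has_extension(&path, extensions) {
                out.push(path);
            }
        }
    }
    out.sort(); // deterministic order regardless of read_dir ordering
    Ok(out)
}
```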

TUI users can trigger ingestion via the command palette: /ingest <path>.

RAG context injection

When memory.documents.rag_enabled = true, the agent automatically queries the zeph_documents Qdrant collection on each turn and prepends the top-K most relevant chunks to the context window under a ## Relevant documents heading.

```toml
[memory.documents]
rag_enabled = true
collection = "zeph_documents"
chunk_size = 512
chunk_overlap = 64
top_k = 3
```

RAG injection is a no-op when the collection is empty: no error is raised, and the agent simply skips the retrieval step.
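Conceptually, each turn scores stored chunks against the query embedding by cosine similarity and prepends the best top_k under the heading. The sketch below does this in memory, whereas in practice Qdrant performs the similarity search. The function names and the exact layout of the injected text are assumptions. Returning None for an empty collection mirrors the no-op behaviour described above.

```rust
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Returns the text to prepend to the context, or None when there is nothing to inject.
pub fn build_rag_context(query: &[f32], chunks: &[(Vec<f32>, String)], top_k: usize) -> Option<String> {
    if chunks.is_empty() || top_k == 0 {
        return None; // empty collection: skip retrieval without an error
    }
    let mut scored: Vec<(f32, &str)> = chunks
        .iter()
        .map(|(v, text)| (cosine(query, v), text.as_str()))
        .collect();
    // Highest similarity first; total_cmp gives a total order even with NaN.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    let mut out = String::from("## Relevant documents\n");
    for (_, text) in scored.into_iter().take(top_k) {
        out.push('\n');
        out.push_str(text);
        out.push('\n');
    }
    Some(out)
}
```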

Tip

Run zeph ingest ./docs/ once to populate the knowledge base. Subsequent agent sessions will automatically retrieve and inject relevant chunks without any additional setup.

Configuration Reference

All document RAG settings live under [memory.documents]:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| rag_enabled | bool | false | Enable retrieval injection into the agent context |
| collection | string | "zeph_documents" | Target Qdrant collection for document chunks |
| chunk_size | usize | 1000 | Maximum tokens per chunk; controls retrieval granularity |
| chunk_overlap | usize | 100 | Overlap between adjacent chunks in tokens; reduces boundary information loss |
| top_k | usize | 3 | Number of chunks injected per turn |

Embedding Provider

Set embed_provider on [memory.semantic] to use a dedicated [[llm.providers]] entry for generating document embeddings. This avoids contention with the main chat provider (especially relevant for Ollama, which serialises requests per model):

```toml
[[llm.providers]]
name  = "ollama-embed"
type  = "ollama"
model = "nomic-embed-text"
embed = true

[memory.semantic]
enabled        = true
embed_provider = "ollama-embed"

[memory.documents]
rag_enabled   = true
collection    = "zeph_documents"
chunk_size    = 1000
chunk_overlap = 100
top_k         = 5
```

Retrieval Quality

Two parameters control how retrieved content is filtered and budgeted during context assembly. These are part of [index] (code indexer), but apply similarly to document retrieval when both are active:

| Field | Default | Description |
| --- | --- | --- |
| score_threshold | 0.25 | Minimum cosine similarity score for a chunk to be injected |
| budget_ratio | 0.40 | Fraction of the context token budget allocated to retrieved results |

```toml
[index]
score_threshold = 0.25   # drop chunks below this similarity score
budget_ratio    = 0.40   # allocate up to 40% of context budget to index/doc results
```

Lower score_threshold values increase recall but may inject weakly relevant chunks. Raise it (e.g. 0.4) for stricter relevance filtering. Adjust budget_ratio to balance document context against conversation history within the token budget.
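One plausible reading of how the two settings combine is sketched below. First, chunks scoring under score_threshold are dropped. The rest are then admitted in descending score order as long as their total token cost stays within budget_ratio times the context budget. The ScoredChunk type, the precomputed token counts and the choice to skip a chunk that does not fit (rather than stop) are assumptions, not Zeph's documented algorithm.

```rust
#[derive(Debug, Clone)]
pub struct ScoredChunk {
    pub score: f32,
    pub tokens: usize, // assumed to be precomputed by some tokenizer
    pub text: String,
}

pub fn select_chunks(
    mut chunks: Vec<ScoredChunk>,
    score_threshold: f32,
    budget_ratio: f32,
    context_budget_tokens: usize,
) -> Vec<ScoredChunk> {
    let budget = (context_budget_tokens as f32 * budget_ratio).floor() as usize;
    chunks.retain(|c| c.score >= score_threshold); // relevance filter
    chunks.sort_by(|a, b| b.score.total_cmp(&a.score)); // best first
    let mut used = 0;
    let mut selected = Vec::new();
    for chunk in chunks {
        // A chunk that would overflow the budget is skipped; a smaller,
        // lower-scoring chunk later in the list may still fit.
        if used + chunk.tokens <= budget {
            used += chunk.tokens;
            selected.push(chunk);
        }
    }
    selected
}
```

With a 1000-token context budget and budget_ratio = 0.40, at most 400 tokens of retrieved text are admitted. Raising score_threshold shrinks the candidate set before the budget is applied.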