RAG and Context Pipelines in Valohai

Retrieval-Augmented Generation (RAG) systems combine vector search with large language models to produce grounded, explainable answers. Valohai provides the reproducible system behind these fast-moving architectures, so you can plug in new models, retrievers, or vector databases without breaking your workflow.

What is RAG and why it matters

RAG enhances large language models by retrieving factual context from your own data. Instead of relying solely on the model’s training corpus, a RAG system:

  1. Embeds your documents into vectors,

  2. Retrieves relevant chunks for each query,

  3. Generates grounded responses, and

  4. Evaluates factuality and relevance.

In Valohai, each stage becomes a versioned step in a reproducible pipeline — so as the ecosystem evolves (OpenAI → Anthropic → Meta → etc.), you can swap tools but keep lineage intact.

💡 In a fast-moving GenAI landscape, your retrieval or model provider may change — your reproducible pipeline shouldn’t.

RAG pipeline architecture in Valohai

A simple RAG workflow maps naturally onto a Valohai pipeline:

ingest → embed → retrieve → generate → evaluate

Each component runs as a step, and you can replace or parallelize any of them independently.
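
For example, the stages can be wired together as nodes in valohai.yaml. The sketch below is only illustrative: the step, node, and input names are placeholders, and retrieval and generation are combined into a single node to match the example step further down.

```yaml
- pipeline:
    name: rag-pipeline
    nodes:
      - name: ingest
        type: execution
        step: ingest-documents
      - name: embed
        type: execution
        step: embed-documents
      - name: generate
        type: execution
        step: retrieve-and-generate
      - name: evaluate
        type: execution
        step: evaluate-rag
    edges:
      # Each edge routes one node's outputs into the next node's inputs.
      - [ingest.output.*, embed.input.documents]
      - [embed.output.embeddings.jsonl, generate.input.embeddings]
      - [generate.output.*, evaluate.input.answers]
```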

Embedding step

The embedding step converts documents into vectors and saves them as a versioned dataset.

Example embed.py
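
A minimal sketch of what this step could look like, assuming plain-text documents arrive as the step's `documents` input and OpenAI provides the embeddings; the input name, embedding model, dataset name, and version label are illustrative.

```python
"""Embed documents into vectors and save them as a versioned Valohai dataset."""
import json
import os
from pathlib import Path

from openai import OpenAI  # any embedding provider works; OpenAI is assumed here

# Valohai mounts step inputs under /valohai/inputs/<input-name>/
# and uploads anything written to /valohai/outputs/ as versioned artifacts.
INPUT_DIR = Path("/valohai/inputs/documents")
OUTPUT_DIR = Path("/valohai/outputs")

# OPENAI_API_KEY is injected from Valohai secrets at runtime (see note below).
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

rows = []
for doc_path in sorted(INPUT_DIR.glob("*.txt")):
    text = doc_path.read_text()
    response = client.embeddings.create(
        model="text-embedding-3-small",  # illustrative; swap the embedding model freely
        input=text,
    )
    rows.append({
        "document": doc_path.name,
        "text": text,
        "embedding": response.data[0].embedding,
    })

# Write the embeddings as a step output.
out_file = OUTPUT_DIR / "embeddings.jsonl"
with out_file.open("w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# Attach the output to a dataset version via a metadata sidecar file,
# so each run produces a new version of the rag-vectors dataset
# (the version label here is illustrative).
sidecar = OUTPUT_DIR / "embeddings.jsonl.metadata.json"
sidecar.write_text(json.dumps({"valohai.dataset-versions": ["dataset://rag-vectors/v2"]}))
```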

Secrets: Store your API keys (e.g., OPENAI_API_KEY) as Valohai secrets in your project settings; they'll be injected automatically at runtime without exposing them in code or YAML.

Each new run produces a new dataset version (rag-vectors/v2, v3, …), ensuring you can trace which document snapshot produced which embeddings.

Retrieval and generation step

Retrieve relevant chunks and generate responses using different LLM providers.

Example query.py
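
A minimal sketch, assuming the embed step's output is wired in as this step's `embeddings` input and OpenAI serves both the query embedding and the completion. Brute-force cosine similarity stands in for a real vector database, and the parameter names are illustrative; swapping providers means swapping the chat client while the rest of the step stays unchanged.

```python
"""Retrieve relevant chunks for a query and generate a grounded answer."""
import argparse
import json
import os
from pathlib import Path

import numpy as np
from openai import OpenAI

# The embed step's output is wired in as this step's `embeddings` input,
# e.g. via dataset://rag-vectors/latest.
INPUT_DIR = Path("/valohai/inputs/embeddings")
OUTPUT_DIR = Path("/valohai/outputs")

parser = argparse.ArgumentParser()
parser.add_argument("--query", required=True)
parser.add_argument("--model", default="gpt-4-turbo")  # parameterized so providers can be swapped per run
parser.add_argument("--top_k", type=int, default=3)
args = parser.parse_args()

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Load the versioned embeddings produced by the embed step.
rows = [json.loads(line) for line in (INPUT_DIR / "embeddings.jsonl").open()]
matrix = np.array([r["embedding"] for r in rows])

# Embed the query and rank chunks by cosine similarity
# (a real deployment would use a vector database instead of brute force).
q = client.embeddings.create(model="text-embedding-3-small", input=args.query)
q_vec = np.array(q.data[0].embedding)
scores = matrix @ q_vec / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q_vec))
top_idx = np.argsort(scores)[::-1][: args.top_k]

# Generate a grounded answer from the retrieved context.
context = "\n\n".join(rows[i]["text"] for i in top_idx)
completion = client.chat.completions.create(
    model=args.model,
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {args.query}"},
    ],
)

# Save the answer and the retrieved documents so the evaluate step can score both.
result = {
    "query": args.query,
    "model": args.model,
    "retrieved_documents": [rows[i]["document"] for i in top_idx],
    "answer": completion.choices[0].message.content,
}
(OUTPUT_DIR / "answer.json").write_text(json.dumps(result, indent=2))
```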

Run the same pipeline against multiple providers to benchmark quality, latency, and cost.

Evaluation step

Use evaluation steps to track both retrieval and generation performance.

Example evaluate.py
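
A minimal sketch, assuming the generate step's answers and a hand-labelled ground-truth file arrive as inputs; the label format and the containment-based factuality proxy are illustrative stand-ins for a real scorer such as a GPT-judge.

```python
"""Score retrieval and generation quality and log the metrics to Valohai."""
import json
from pathlib import Path

# The generate step's answers and a hand-labelled ground-truth file
# are wired in as this step's inputs (names are illustrative).
ANSWERS_DIR = Path("/valohai/inputs/answers")
LABELS_FILE = Path("/valohai/inputs/ground-truth/labels.json")

# labels.json (illustrative format):
# {"<query>": {"relevant_documents": ["doc1.txt"], "gold_answer": "..."}}
labels = json.loads(LABELS_FILE.read_text())

recall_hits, grounded_hits, total = 0, 0, 0
for answer_file in ANSWERS_DIR.glob("*.json"):
    result = json.loads(answer_file.read_text())
    gold = labels[result["query"]]

    # Retrieval side: did the top-K contain at least one labelled relevant document?
    recall_hits += bool(set(gold["relevant_documents"]) & set(result["retrieved_documents"]))

    # Generation side: a crude containment check stands in for a real
    # factuality scorer (GPT-judge, BLEU, ROUGE, ...).
    grounded_hits += gold["gold_answer"].lower() in result["answer"].lower()

    total += 1

# Printing a JSON object records these values as Valohai execution metadata,
# so runs can be compared in the UI and used as pipeline gates.
print(json.dumps({
    "recall_at_k": recall_hits / total,
    "factuality": grounded_hits / total,
}))
```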

Evaluate both sides of RAG, plus operational metrics:

  • Retrieval metrics (Recall@K, MRR)

  • Generation metrics (Factuality, BLEU, ROUGE, or GPT-judge)

  • Operational metrics (Latency, Cost)

You can automate evaluations whenever:

  • A new document dataset is uploaded

  • A new embedding model is tested

  • A new provider (e.g., Claude or Llama) is added

Human approval and comparison

Add a pause for human approval after automated evaluation:
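In valohai.yaml, this pause is expressed as an action on the node that should wait for a decision; the sketch below uses illustrative node and step names.

```yaml
- pipeline:
    name: rag-pipeline
    nodes:
      - name: evaluate
        type: execution
        step: evaluate-rag
      - name: publish
        type: execution
        step: publish-results
        actions:
          # The pipeline stops here until a human approves the evaluation results.
          - when: node-starting
            then: require-approval
    edges:
      - [evaluate.output.*, publish.input.evaluation]
```
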

Review model outputs manually or through your own interface, then approve to continue. All human decisions are logged in the pipeline’s audit trail.

Use the Model Catalog to compare:

| Provider  | Model       | Recall@K | Factuality | Cost (USD) | Decision   |
| --------- | ----------- | -------- | ---------- | ---------- | ---------- |
| OpenAI    | GPT-4-Turbo | 0.89     | 4.6        | 0.004      |            |
| Anthropic | Claude 3.5  | 0.87     | 4.8        | 0.003      |            |
| Meta      | Llama-3-70B | 0.84     | 4.2        | 0.001      | 🔄 re-test |

Integrations and triggers

RAG pipelines often connect multiple external tools. Valohai integrates easily with them through webhooks, notifications, and the REST API.

  • Webhooks: Re-run embedding pipelines automatically when your document corpus updates.

  • Notifications: Post evaluation summaries to Slack or Teams for reviewers.

  • REST API: Programmatically launch, track, or fetch the latest RAG model metrics.
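
As a sketch, a small script could launch an evaluation run and poll it over the REST API; the project ID, step name, and token variable below are placeholders.

```python
"""Launch an evaluation execution via the Valohai REST API and wait for it to finish."""
import os
import time

import requests

API = "https://app.valohai.com/api/v0"
HEADERS = {"Authorization": f"Token {os.environ['VALOHAI_API_TOKEN']}"}

# Launch an execution of the evaluate step (project ID and commit are placeholders).
created = requests.post(
    f"{API}/executions/",
    headers=HEADERS,
    json={
        "project": "<project-id>",
        "commit": "main",
        "step": "evaluate-rag",
    },
).json()

# Poll until the run finishes, then read back its details (including logged metadata).
execution_url = f"{API}/executions/{created['id']}/"
while True:
    execution = requests.get(execution_url, headers=HEADERS).json()
    if execution["status"] in ("complete", "error", "stopped"):
        break
    time.sleep(30)

print(execution["status"])
```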

All retrievals, embeddings, and evaluations stay reproducible, observable, and interchangeable.

Summary

  • RAG pipelines evolve quickly — models, embeddings, and retrievers come and go.

  • Valohai provides the stable, auditable foundation beneath that evolution.

  • Every step (embed → retrieve → evaluate) is versioned, testable, and works with any provider.
