Finetuning LLMs in Valohai
Learn how to finetune large language models (LLMs) in Valohai using reproducible datasets, pipelines, and model catalog entries.
Finetuning adapts a foundation model to a specific domain, tone, or dataset. In Valohai, you can run finetuning jobs reproducibly and track datasets, parameters, and resulting models exactly as you would in traditional ML.
When to finetune: Finetuning is often a last resort. Before committing to a specific base model, use Valohai pipelines to compare out‑of‑the‑box models (OpenAI, Anthropic, Llama) or build lightweight RAG pipelines. If retrieval and prompt engineering cannot reach your target quality, finetuning becomes relevant.
Prepare Training Data
Create a Valohai dataset containing your training and evaluation splits.
dataset://finetune-data/v1
├── train.jsonl
├── eval.jsonl
└── metadata.yaml
Each JSONL file should contain prompt–response pairs:
{"prompt": "Translate 'hello' to French", "response": "bonjour"}
{"prompt": "Summarize: Large language models are...", "response": "They are models trained to generate text."}
When the dataset evolves, create a new version (v2, v3, …) instead of overwriting files. This preserves reproducibility across experiments.
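Before uploading a split, it can help to check that every line parses and carries both fields. The following is a minimal sketch using only the standard library; the function name validate_jsonl and the rule that both fields must be non-empty strings are assumptions made for illustration, not part of Valohai.

```python
import json

REQUIRED_KEYS = ("prompt", "response")  # field names taken from the example records


def validate_jsonl(lines):
    """Return a list of (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append((number, f"invalid JSON: {error.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        for key in REQUIRED_KEYS:
            value = record.get(key)
            # Assumption: both fields must be non-empty strings.
            if not isinstance(value, str) or not value.strip():
                problems.append((number, f"missing or empty '{key}'"))
    return problems


if __name__ == "__main__":
    sample = [
        '{"prompt": "Translate \'hello\' to French", "response": "bonjour"}',
        '{"prompt": "Summarize: ...", "response": ""}',
        "not json",
    ]
    for number, problem in validate_jsonl(sample):
        print(f"line {number}: {problem}")
```

Running a check like this as its own step before training keeps malformed samples out of a dataset version that later experiments depend on.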
Managing evolving datasets (versioned, append/replace/remove)
When your finetuning dataset changes, create a new dataset version (v2, v3, …) based on the previous version instead of overwriting files.
Update only what changed: You can append new samples, replace corrected files, or remove deprecated files in the new version.
No full duplication: Unchanged files are reused from the previous version; only modified or new files are stored again.
Clear lineage: Each file retains where it originated (e.g., an execution output or upload) and where it is used (which models or evaluations depend on it).
Why this matters
Reproducibility: Every experiment and model can be traced back to the exact dataset version used.
Audit & impact analysis: If a gold answer or training sample is wrong, you can instantly find which models or evaluations relied on it via dataset history.
Rollbacks: Promote or re-run against a previous version without guesswork.
Storage efficiency: You don’t create massive duplicates of unchanged files.
Governance: Versioned changes form an auditable trail of how your benchmark or training data evolved.
Example evolution
finetune-data (dataset)
├─ v1/
│  ├─ train.jsonl
│  └─ eval.jsonl
├─ v2/               # based on v1; append and fix
│  ├─ train.jsonl    # replaced with new/extra samples
│  ├─ eval.jsonl     # reused from v1 (unchanged)
│  └─ notes.md       # new file (changelog/rationale)
└─ v3/               # based on v2; retire a subset
   ├─ train.jsonl    # replaced (removed deprecated block)
   └─ eval.jsonl     # reused from v2
💡 TL;DR: Version the dataset, don’t overwrite it. In each new version, append, replace, or remove just the changed parts. You get precise lineage and reproducibility without duplicating everything.
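The append/replace/remove/reuse distinction can be illustrated with a small content-hash comparison. This is only a conceptual sketch and does not use any Valohai API; the helper names fingerprint and diff_versions are hypothetical, and how Valohai detects unchanged files internally is not described in this guide.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Content hash used to decide whether a file changed between versions."""
    return hashlib.sha256(content).hexdigest()


def diff_versions(previous: dict, current: dict) -> dict:
    """Classify files by comparing two {file name: content hash} mappings."""
    changes = {"added": [], "replaced": [], "removed": [], "reused": []}
    for name in sorted(set(previous) | set(current)):
        if name not in previous:
            changes["added"].append(name)
        elif name not in current:
            changes["removed"].append(name)
        elif previous[name] != current[name]:
            changes["replaced"].append(name)
        else:
            changes["reused"].append(name)
    return changes


if __name__ == "__main__":
    v1 = {
        "train.jsonl": fingerprint(b"old samples"),
        "eval.jsonl": fingerprint(b"eval samples"),
    }
    v2 = {
        "train.jsonl": fingerprint(b"old samples + new samples"),
        "eval.jsonl": fingerprint(b"eval samples"),
        "notes.md": fingerprint(b"changelog"),
    }
    print(diff_versions(v1, v2))
```

A summary like this also makes a reasonable body for the notes.md changelog shown in the v2 example.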
Define a Finetuning Step
Example YAML snippet for finetuning a Hugging Face model:
- step:
    name: finetune-llm
    image: valohai/pytorch:2.1
    command:
      - python finetune.py {parameters}
    inputs:
      - name: training_data
        default: dataset://finetune-data/v1
    parameters:
      - name: base_model
        type: string
        default: "mistral-7b"
      - name: epochs
        type: integer
        default: 3
Your finetune.py script can follow Hugging Face’s Trainer API or a custom loop:
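from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset
import valohai

# Assumes the base model files are provided as a Valohai input named "base-model".
model = AutoModelForCausalLM.from_pretrained("/valohai/inputs/base-model")
tokenizer = AutoTokenizer.from_pretrained("/valohai/inputs/base-model")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # many causal LMs ship without a pad token

data = load_dataset("json", data_files="/valohai/inputs/training_data/train.jsonl")

def tokenize(example):
    # Join each prompt/response pair into one training sequence (max_length is an assumed value).
    text = example["prompt"] + "\n" + example["response"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=512)

train_dataset = data["train"].map(tokenize, remove_columns=["prompt", "response"])
args = TrainingArguments(
    output_dir="/valohai/outputs/model",
    per_device_train_batch_size=4,
    num_train_epochs=valohai.parameters("epochs").value,
    save_total_limit=1,
)
# The collator pads each batch and copies input_ids to labels for causal LM training.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset, data_collator=collator)
trainer.train()
trainer.save_model("/valohai/outputs/model")
tokenizer.save_pretrained("/valohai/outputs/model")
Evaluate the Finetuned Model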
After finetuning, add an evaluation step similar to the one described in Evaluating and Validating GenAI Applications.
- step:
    name: evaluate-model
    image: valohai/python:3.10
    command:
      - python evaluate.py
    inputs:
      - name: model
        default: model://finetune-llm/v5
      - name: eval_data
        default: dataset://finetune-data/v1
The evaluation script should compute relevant metrics (BLEU, ROUGE, BERTScore, factuality) and log them with:
import json
print(json.dumps({
    "rougeL": 0.42,
    "bleu": 0.37,
    "factuality": 0.83,
}))
# or using valohai-utils
import valohai
with valohai.metadata.logger() as logger:
    logger.log("rougeL", 0.42)
    logger.log("bleu", 0.37)
    logger.log("factuality", 0.83)
Use the same evaluation dataset for all model versions to make metrics comparable.
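To make the metric step concrete, here is a minimal sketch of a ROUGE-L style score (an F1 over the longest common subsequence of tokens), printed as a JSON line in the same way as the logging example. It is a simplification that assumes lowercased whitespace tokenization and is not the reference ROUGE implementation; the function names are hypothetical, and a maintained metrics library is likely the better choice for reported numbers.

```python
import json


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l_f1(reference: str, prediction: str) -> float:
    """Simplified ROUGE-L F1 on lowercased, whitespace-split tokens."""
    ref_tokens = reference.lower().split()
    pred_tokens = prediction.lower().split()
    if not ref_tokens or not pred_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, pred_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs):
    """Average the score over (reference, prediction) pairs."""
    scores = [rouge_l_f1(reference, prediction) for reference, prediction in pairs]
    mean = sum(scores) / len(scores) if scores else 0.0
    return {"rougeL": mean, "samples": len(scores)}


if __name__ == "__main__":
    pairs = [
        ("bonjour", "bonjour"),
        ("the cat sat on the mat", "the cat is on the mat"),
    ]
    # One JSON object per line, matching the print-based logging shown above.
    print(json.dumps(evaluate(pairs)))
```

In a real evaluate.py, the references would come from the response field of eval.jsonl and the predictions from the finetuned model's generations for the matching prompts.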
Save and Register the Model
When training completes, you can register the finetuned model directly from your code:
import json
metadata = {
    "model/": {
        "valohai.model-versions": ["model://domain-llm/v6"],
        "valohai.tags": ["genai", "finetuned"],
        "base_model": "mistral-7b",
        "epochs": 3
    }
}
with open("/valohai/outputs/valohai.metadata.jsonl", "w") as f:
    for file, meta in metadata.items():
        json.dump({"file": file, "metadata": meta}, f)
        f.write("\n")
This automatically creates a new model version in the Model Catalog and links it to the datasets and metrics from the pipeline.
Integrate with Retraining Pipelines
You can reuse the finetuning step inside a larger retraining pipeline. See an example on the "Retraining and Updating GenAI Models" page.
This keeps the whole improvement cycle (retraining, finetuning, and evaluation) captured under one Valohai lineage.
GenAI Considerations
Dataset structure: Use prompt–response JSONL files with clear splits for training and evaluation.
Model formats: Save full Hugging Face model folders (.safetensors, config.json, tokenizer.json).
Evaluation: Reuse the evaluation workflow described in the previous guide for consistency.
Storage: For large weights, link external URIs but keep metadata inside Valohai.
Reproducibility: Each dataset, base model, and pipeline version is automatically tracked in lineage.
