Retraining and Updating GenAI Models
Learn how to retrain, test, and promote GenAI models safely using Valohai pipelines and Model Catalog integrations.
In an AI Factory, retraining is continuous but promotion must be deliberate. Valohai lets you chain training, evaluation, and approval into one reproducible pipeline, ensuring every new GenAI model is validated before it’s released.
Structure Your Retraining Pipeline
A retraining pipeline typically includes:
Training step
Evaluation step
Conditional promotion logic
Human approval gate
Example pipeline:

- step:
    name: train-model
    image: pytorch/pytorch:2.9.0-cuda12.8-cudnn9-runtime
    command: python train.py
    inputs:
      - name: training_data
        default: dataset://domain-data/v5

- step:
    name: evaluate-model
    image: python:3.10
    command: python evaluate.py
    inputs:
      - name: model
        default: model://my-gen-ai-model/candidate

- step:
    name: promote-model
    image: python:3.10
    command: python promote.py
    inputs:
      - name: results

- pipeline:
    name: train-and-evaluate-model
    nodes:
      - name: train-model
        type: execution
        step: train-model
      - name: evaluate-model
        type: execution
        step: evaluate-model
      - name: promote-model
        type: execution
        step: promote-model
        actions:
          - when: node-starting
            then: require-approval
    edges:
      - [train-model.output.*, evaluate-model.input.model]
      - [evaluate-model.output.*, promote-model.input.results]

Each step is tracked automatically: datasets, parameters, outputs, and lineage are all recorded.
Automate Regression Checks
During the evaluation step, compare new metrics against the previous model’s baseline from the Model Catalog.
if new_bleu < baseline_bleu:
    raise Exception("Regression detected: BLEU score dropped")

Valohai will mark the execution as failed, preventing promotion. You can trigger notifications or pipeline conditions based on these checks.
In GenAI, regression may also mean higher variance, longer latency, or reduced factuality, not just lower numeric scores.
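A fuller check can therefore compare several metrics at once. Below is a minimal sketch, assuming the previous version's metrics arrive as a JSON file through a hypothetical baseline_metrics input and that the candidate's metrics were computed earlier in the same step; the metric names and the 10% latency tolerance are illustrative.

import json

# Baseline metrics exported from the previous model version
# (hypothetical input name: baseline_metrics).
with open("/valohai/inputs/baseline_metrics/metrics.json") as f:
    baseline = json.load(f)

# Candidate metrics computed earlier in this evaluation step (illustrative values).
candidate = {"bleu": 0.41, "rougeL": 0.38, "p95_latency_ms": 820.0}

# Treat quality drops and latency growth both as regressions.
regressions = []
if candidate["bleu"] < baseline["bleu"]:
    regressions.append("BLEU dropped")
if candidate["rougeL"] < baseline["rougeL"]:
    regressions.append("ROUGE-L dropped")
if candidate["p95_latency_ms"] > baseline["p95_latency_ms"] * 1.10:
    regressions.append("p95 latency regressed by more than 10%")

if regressions:
    raise Exception("Regression detected: " + "; ".join(regressions))

Raising here fails the evaluation node, so the downstream promotion node never starts.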
Add a Human Approval Step
Insert a pause for human approval after automated evaluation:
- name: promote-model
  type: execution
  step: promote-model
  actions:
    - when: node-starting
      then: require-approval

Use this gate to ensure governance before promotion:
Review evaluation results in Valohai or externally.
Add comments or qualitative scores to the model’s card in Model Catalog.
Approve to continue or reject to stop the pipeline.
The approval step acts as a lightweight, auditable checkpoint, perfect for subjective or business-critical GenAI evaluations.
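To give reviewers concrete numbers at the gate, the evaluation step can also print its metrics as JSON; Valohai collects JSON objects printed to stdout as execution metadata, so they appear alongside the execution being approved. The values below are illustrative:

import json

# One JSON object per line on stdout becomes Valohai execution metadata.
print(json.dumps({
    "rougeL": 0.38,
    "bleu": 0.41,
    "p95_latency_ms": 820.0,
}))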
Update and Promote in Model Catalog
After evaluation and approval, you can register a new model version directly from your Python code.
In a GenAI pipeline, the final promote-model step might save the trained or fine-tuned model folder to /valohai/outputs/model/ and generate the necessary metadata.
Example:
import os
import json

from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from evaluate import load  # optional Hugging Face evaluate library

# Load data and base model
dataset = load_dataset("cnn_dailymail", "3.0.0", split="test[:200]")
model = AutoModelForCausalLM.from_pretrained("/valohai/inputs/base-model")
tokenizer = AutoTokenizer.from_pretrained("/valohai/inputs/base-model")

# Evaluate or generate predictions
prompts = dataset["article"][:10]
references = dataset["highlights"][:10]
outputs = [
    tokenizer.decode(
        model.generate(tokenizer(p, return_tensors="pt").input_ids, max_new_tokens=100)[0],
        skip_special_tokens=True,
    )
    for p in prompts
]

# Example metric
metric = load("rouge")
scores = metric.compute(predictions=outputs, references=references)
rougeL = scores["rougeL"]
print(f"ROUGE-L: {rougeL:.4f}")

# Save model files
save_dir = "/valohai/outputs/model"
os.makedirs(save_dir, exist_ok=True)
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
print(f"Saved model to {save_dir}")

# Create Valohai metadata file
metadata = {
    "model/": {
        "valohai.model-versions": ["model://summarizer/v5"],
        "valohai.tags": ["genai", "summarization", "production-candidate"],
        "rougeL": rougeL,
        "training_dataset": "vh://dataset/domain-news:v2",
        "evaluation_dataset": "vh://dataset/evaluation-prompts:v3"
    }
}

metadata_path = "/valohai/outputs/valohai.metadata.jsonl"
with open(metadata_path, "w") as f:
    for file, meta in metadata.items():
        json.dump({"file": file, "metadata": meta}, f)
        f.write("\n")

print("Created model version in model://summarizer/v5")

When the step completes:
The model folder (/valohai/outputs/model/) is uploaded as the new model version.
The metadata file (valohai.metadata.jsonl) tells Valohai how to tag and track it.
The Model Catalog entry automatically links all relevant context:
Datasets and pipeline lineage
Metrics (ROUGE-L, BLEU, etc.)
Custom tags (genai, summarization, etc.)
Version references (model://summarizer/v5)
Use the same pattern to promote any GenAI artifact, from prompt-tuned adapters to fine-tuned instruction models, directly from your pipeline code.
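For instance, promoting a LoRA adapter instead of a full model only changes which files you save and how you tag them. A minimal sketch, assuming a PEFT checkpoint arrives through a hypothetical adapter-checkpoint input and is registered under a hypothetical model://summarizer-adapter/v1 version:

import json
import os

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model and attach the fine-tuned adapter
# (hypothetical input names: base-model, adapter-checkpoint).
base = AutoModelForCausalLM.from_pretrained("/valohai/inputs/base-model")
model = PeftModel.from_pretrained(base, "/valohai/inputs/adapter-checkpoint")

# save_pretrained() on a PeftModel writes only the adapter weights and config.
adapter_dir = "/valohai/outputs/adapter"
os.makedirs(adapter_dir, exist_ok=True)
model.save_pretrained(adapter_dir)

# Register the adapter folder as its own model version (hypothetical URI).
with open("/valohai/outputs/valohai.metadata.jsonl", "w") as f:
    json.dump({
        "file": "adapter/",
        "metadata": {
            "valohai.model-versions": ["model://summarizer-adapter/v1"],
            "valohai.tags": ["genai", "lora-adapter"],
        },
    }, f)
    f.write("\n")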
Maintain Reproducibility and Traceability
Every retraining run in Valohai automatically preserves:
Dataset lineage: which training and evaluation datasets were used
Parameter traceability: hyperparameters and environment details
Model lineage: which model version the new one replaced
Approval records: human validation history
This full chain of evidence makes it easy to answer questions like: “Which models were trained with the flawed evaluation dataset v2?”
GenAI Considerations
Continuous improvement: trigger retraining when new labeled data or human feedback arrives.
Baseline comparison: use the same evaluation dataset for old and new models to ensure fair comparisons.
Human-in-the-loop: require manual approval for subjective quality checks.
Automation balance: automate quantitative checks, but keep qualitative approvals human.
Lineage tracking: use Model Catalog to visualize model–dataset–approval chains.