# Retraining and Updating GenAI Models

> In an **AI Factory**, retraining is continuous but promotion must be deliberate.\
> Valohai lets you chain training, evaluation, and approval into one reproducible pipeline, ensuring every new GenAI model is validated before it's released.

### Structure Your Retraining Pipeline

A retraining pipeline typically includes:

* Training step
* Evaluation step
* Conditional promotion logic
* Human approval gate

Example pipeline:

```yaml
- step: train-model
  image: pytorch/pytorch:2.9.0-cuda12.8-cudnn9-runtime
  command: python train.py
  inputs:
    - name: training_data
      default: dataset://domain-data/v5

- step: evaluate-model
  image: python:3.10
  command: python evaluate.py
  inputs:
    - name: model
      default: model://my-gen-ai-model/candidate

- step: promote-model
  image: python:3.10
  command: python promote.py
  inputs:
    - name: results

- pipeline:
    name: train-and-evaluate-model
    nodes:
      - name: train-model
        type: execution
        step: train-model
      - name: evaluate-model
        type: execution
        step: evaluate-model
      - name: promote-model
        type: execution
        step: promote-model
        actions:
          - when: node-starting
            then: require-approval
    edges:
      - [train-model.output.*, evaluate-model.input.model]
      - [evaluate-model.output.*, promote-model.input.results]
```

Each step is tracked automatically: datasets, parameters, outputs, and lineage are all recorded.
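
The edges are what move files between nodes: whatever `evaluate-model` writes under `/valohai/outputs/` is delivered to `promote-model` as its `results` input, under `/valohai/inputs/results/`. Below is a minimal sketch of both sides of that hand-off. The `results.json` file name and the two helper functions are hypothetical; the directory arguments default to Valohai's paths so they can be pointed at a temporary folder when testing locally.

```python
import json
from pathlib import Path

# Valohai mounts inputs under /valohai/inputs/<input-name>/ and uploads
# whatever a step writes to /valohai/outputs/.
OUTPUTS_DIR = Path("/valohai/outputs")
RESULTS_INPUT_DIR = Path("/valohai/inputs/results")


def write_results(metrics: dict, outputs_dir: Path = OUTPUTS_DIR) -> Path:
    """evaluate.py side: save metrics as an output file for the next node."""
    outputs_dir.mkdir(parents=True, exist_ok=True)
    path = outputs_dir / "results.json"  # hypothetical file name
    path.write_text(json.dumps(metrics, indent=2))
    # A JSON object printed to stdout is also recorded as execution metadata.
    print(json.dumps(metrics))
    return path


def read_results(inputs_dir: Path = RESULTS_INPUT_DIR) -> dict:
    """promote.py side: load the metrics file delivered through the edge."""
    files = sorted(inputs_dir.glob("*.json"))
    if not files:
        raise FileNotFoundError(f"No results found in {inputs_dir}")
    return json.loads(files[0].read_text())
```

In `evaluate.py` you would call `write_results(...)` with the computed scores; in `promote.py`, `read_results()` returns the same dictionary.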

### Automate Regression Checks

During the evaluation step, compare new metrics against the previous model's baseline from the Model Catalog.

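```python
# new_bleu: score from this evaluation run
# baseline_bleu: score of the current model, taken from its Model Catalog metadata
if new_bleu < baseline_bleu:
    raise RuntimeError(f"Regression detected: BLEU dropped from {baseline_bleu:.3f} to {new_bleu:.3f}")
```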

An uncaught exception makes the script exit with an error, so Valohai marks the execution as failed and the pipeline stops before the promotion step runs.\
You can also trigger notifications or pipeline conditions based on these checks.

> In GenAI, regression may also mean higher variance, longer latency, or reduced factuality, not just lower numeric scores.
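
The check above compares a single score. A sketch that covers several metrics, including ones where lower is better such as latency, could look like this. The metric names, directions, and tolerances in `POLICY` are illustrative assumptions, and `baseline` stands for however you retrieve the production model's metrics (for example, from a metrics file passed in as an extra input).

```python
import sys

# Illustrative policy: direction of improvement and allowed slack per metric.
# "higher" means larger values are better; "lower" means smaller values are better.
POLICY = {
    "rougeL": ("higher", 0.01),         # may drop by at most 0.01
    "bleu": ("higher", 0.01),
    "p95_latency_ms": ("lower", 50.0),  # may grow by at most 50 ms
}


def find_regressions(new: dict, baseline: dict, policy: dict = POLICY) -> list[str]:
    """Return a human-readable message for every metric that regressed."""
    problems = []
    for name, (direction, tolerance) in policy.items():
        if name not in new or name not in baseline:
            problems.append(f"{name}: missing from new or baseline metrics")
            continue
        delta = new[name] - baseline[name]
        worse = delta < -tolerance if direction == "higher" else delta > tolerance
        if worse:
            problems.append(f"{name}: {baseline[name]} -> {new[name]}")
    return problems


def check_or_fail(new: dict, baseline: dict) -> None:
    """Exit non-zero (failing the execution) if any metric regressed."""
    problems = find_regressions(new, baseline)
    if problems:
        sys.exit("Regression detected:\n" + "\n".join(problems))
```

Exiting through `sys.exit` with a message has the same effect as raising an exception: a non-zero exit code, so Valohai marks the execution as failed. The tolerances keep small, noisy fluctuations from blocking every retrain.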

### Add a Human Approval Step

Insert a **pause for human approval** after automated evaluation:

```yaml
      - name: promote-model
        type: execution
        step: promote-model
        actions:
          - when: node-starting
            then: require-approval
```

Use this gate to ensure governance before promotion:

1. Review evaluation results in Valohai or externally.
2. Add comments or qualitative scores to the model's card in Model Catalog.
3. Approve to continue or reject to stop the pipeline.

> The approval step acts as a lightweight, auditable checkpoint, perfect for subjective or business-critical GenAI evaluations.
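
Reviewers can only judge what they can see. One way to support step 1 is to have the evaluation step write a short, human-readable report alongside its metrics so it appears among the execution's outputs. The sketch assumes a hypothetical `review.md` file name and a list of sample dictionaries with `prompt`, `output`, and optional `reference` keys that your evaluation script already builds.

```python
from pathlib import Path


def write_review_report(metrics: dict, samples: list[dict], out_dir: Path, max_samples: int = 5) -> Path:
    """Write a Markdown summary of metrics and sample generations for human reviewers."""
    lines = ["# Candidate model review", "", "## Metrics", ""]
    for name, value in sorted(metrics.items()):
        lines.append(f"- {name}: {value}")
    lines += ["", "## Sample outputs", ""]
    # Only a handful of samples: enough for a qualitative check, short enough to read.
    for i, sample in enumerate(samples[:max_samples], start=1):
        lines += [
            f"### Sample {i}",
            f"**Prompt:** {sample['prompt']}",
            "",
            f"**Model output:** {sample['output']}",
            "",
            f"**Reference:** {sample.get('reference', 'n/a')}",
            "",
        ]
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "review.md"  # hypothetical file name
    path.write_text("\n".join(lines))
    return path
```

Calling `write_review_report(scores, samples, Path("/valohai/outputs"))` at the end of `evaluate.py` would upload the report with the other outputs, ready for the reviewer to open before approving.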

***

### Update and Promote in Model Catalog

After evaluation and approval, you can register a new model version directly from your Python code.

In a GenAI pipeline, the final `promote-model` step might run a last evaluation pass, save the trained or fine-tuned model folder to `/valohai/outputs/model/`, and write the metadata that registers it as a new model version.

Example:

```python
import os
import json
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from evaluate import load  # Hugging Face `evaluate` library; ROUGE also needs `rouge_score`

# Load evaluation data and the candidate model
# (assumes the step has an input named `base-model` holding the model folder)
dataset = load_dataset("cnn_dailymail", "3.0.0", split="test[:200]")
model = AutoModelForCausalLM.from_pretrained("/valohai/inputs/base-model")
tokenizer = AutoTokenizer.from_pretrained("/valohai/inputs/base-model")

# Generate summaries for a small sample of articles
prompts = dataset["article"][:10]
references = dataset["highlights"][:10]
outputs = []
for p in prompts:
    # Truncate long articles (1024 tokens is an illustrative limit)
    inputs = tokenizer(p, return_tensors="pt", truncation=True, max_length=1024)
    generated = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=100,
    )
    # Causal LMs return prompt + continuation; decode only the new tokens
    new_tokens = generated[0][inputs["input_ids"].shape[1]:]
    outputs.append(tokenizer.decode(new_tokens, skip_special_tokens=True))

# Score the generated summaries against the reference highlights
metric = load("rouge")
scores = metric.compute(predictions=outputs, references=references)
rougeL = float(scores["rougeL"])

# Valohai records JSON printed to stdout as execution metadata
print(json.dumps({"rougeL": rougeL}))

# Save model files; everything under /valohai/outputs/ is uploaded when the step ends
save_dir = "/valohai/outputs/model"
os.makedirs(save_dir, exist_ok=True)
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)

print(f"Saved model to {save_dir}")

# Create Valohai metadata file
metadata = {
    "model/": {
        "valohai.model-versions": ["model://summarizer/v5"],
        "valohai.tags": ["genai", "summarization", "production-candidate"],
        "rougeL": rougeL,
        "training_dataset": "dataset://domain-news/v2",
        "evaluation_dataset": "dataset://evaluation-prompts/v3",
    },
}

metadata_path = "/valohai/outputs/valohai.metadata.jsonl"
with open(metadata_path, "w") as f:
    for file, meta in metadata.items():
        json.dump({"file": file, "metadata": meta}, f)
        f.write("\n")

# The version itself is created by Valohai after the outputs are uploaded
print("Wrote metadata for model://summarizer/v5")
```

When the step completes:

* The model folder (`/valohai/outputs/model/`) is uploaded as the new model version.
* The metadata file (`valohai.metadata.jsonl`) tells Valohai how to tag and track it.
* The Model Catalog entry automatically links all relevant context:
  * Datasets and pipeline lineage
  * Metrics (ROUGE-L, BLEU, etc.)
  * Custom tags (genai, summarization, etc.)
  * Version references (`model://summarizer/v5`)

> Use the same pattern to promote any GenAI artifact directly from your pipeline code, from prompt-tuned adapters to fine-tuned instruction models.
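
To reuse this pattern across workflows, the metadata-writing part of the script can be pulled into a small helper. This is a sketch, not a Valohai API: the `valohai.model-versions` and `valohai.tags` keys and the one-object-per-line `{"file": ..., "metadata": ...}` format come from the example above, while `write_model_metadata` and its parameters are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def write_model_metadata(
    metadata_path: Path,
    file_key: str,
    model_uri: str,
    tags: list[str],
    extra: Optional[dict] = None,
) -> dict:
    """Append one JSONL line that asks Valohai to register `file_key` under `model_uri`."""
    meta = {"valohai.model-versions": [model_uri], "valohai.tags": list(tags)}
    # Extra keys (metrics, dataset references) are stored next to the Valohai keys.
    meta.update(extra or {})
    entry = {"file": file_key, "metadata": meta}
    metadata_path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode: one call per artifact, one line per call.
    with metadata_path.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

The helper appends rather than overwrites, so a single step can register several artifacts (for example, a base model folder and an adapter folder) by calling it once per folder.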

### Maintain Reproducibility and Traceability

Every retraining run in Valohai automatically preserves:

* **Dataset lineage:** which training and evaluation datasets were used
* **Parameter traceability:** hyperparameters and environment details
* **Model lineage:** which model version the new one replaced
* **Approval records:** human validation history

> This full chain of evidence makes it easy to answer questions like:\
> "Which models were evaluated against the flawed evaluation dataset v2?"
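
Answering that kind of question comes down to filtering model versions by the dataset references stored in their metadata. The sketch below works on plain dictionaries shaped like the metadata written in the promotion example; how you export those records from Model Catalog is out of scope here, and the two sample records are made up for illustration.

```python
def models_using_dataset(versions: list[dict], dataset_uri: str, key: str = "evaluation_dataset") -> list[str]:
    """Return the model URIs whose metadata references `dataset_uri` under `key`."""
    hits = []
    for record in versions:
        meta = record.get("metadata", {})
        if meta.get(key) == dataset_uri:
            hits.extend(meta.get("valohai.model-versions", []))
    return hits


# Made-up records for illustration only.
versions = [
    {"metadata": {"valohai.model-versions": ["model://summarizer/v4"],
                  "evaluation_dataset": "dataset://evaluation-prompts/v2"}},
    {"metadata": {"valohai.model-versions": ["model://summarizer/v5"],
                  "evaluation_dataset": "dataset://evaluation-prompts/v3"}},
]
print(models_using_dataset(versions, "dataset://evaluation-prompts/v2"))
# prints ['model://summarizer/v4']
```

Passing `key="training_dataset"` asks the same question about training data instead.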

### GenAI Considerations

| Topic                      | Recommendation                                                                     |
| -------------------------- | ---------------------------------------------------------------------------------- |
| **Continuous improvement** | Trigger retraining when new labeled data or human feedback arrives.                |
| **Baseline comparison**    | Use the same evaluation dataset for old and new models to ensure fair comparisons. |
| **Human-in-the-loop**      | Require manual approval for subjective quality checks.                             |
| **Automation balance**     | Automate quantitative checks, but keep qualitative approvals human.                |
| **Lineage tracking**       | Use Model Catalog to visualize model–dataset–approval chains.                      |
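
The first two rows of the table can be turned into small decision functions: retrain once enough new labeled examples or negative feedback have accumulated, and compare a candidate with its baseline only when both were scored on the same evaluation dataset. The thresholds and the `DataSinceLastTraining` record are illustrative assumptions rather than Valohai settings, and the `evaluation_dataset` key mirrors the promotion example; the functions would run in whatever job decides to launch the retraining pipeline.

```python
from dataclasses import dataclass


@dataclass
class DataSinceLastTraining:
    """Hypothetical counters collected since the last retraining run."""
    new_labeled_examples: int
    negative_feedback: int


def should_retrain(stats: DataSinceLastTraining, min_examples: int = 5_000, min_feedback: int = 200) -> bool:
    """Retrain when either enough fresh labels or enough user complaints have piled up."""
    return stats.new_labeled_examples >= min_examples or stats.negative_feedback >= min_feedback


def comparable(candidate_meta: dict, baseline_meta: dict) -> bool:
    """Only compare scores produced on the same evaluation dataset version."""
    dataset = candidate_meta.get("evaluation_dataset")
    return dataset is not None and dataset == baseline_meta.get("evaluation_dataset")
```

Keeping the comparison guard separate from the metric check makes an apples-to-oranges comparison fail loudly instead of silently passing or blocking a good model.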

***

### Learn More

* [Managing Datasets in Valohai](https://docs.valohai.com/data/datasets)
