Models

Valohai Model Hub is a centralized registry for managing machine learning models throughout their lifecycle, from training to production deployment.

Screenshot: Centralized model documentation with version control. Customize fields and use markdown to document model details in the main view. The sidebar highlights the current approved version with download options and displays version history with approval status indicators (approved/pending/rejected).

Why Use Model Hub

The Problem: Scattered Model Artifacts

Without a model registry:

  • Models saved as random files in cloud storage

  • No clear answer to "which model is in production?"

  • Can't trace which data trained which model

  • Manual approval processes via Slack/email

  • No version comparison or rollback capability

  • Team members can't find the right model

Example chaos:

s3://ml-bucket/project-a/
├── model_final.pkl
├── model_final_v2.pkl
├── model_actually_final.pkl
├── prod_model_jan.h5
└── best_model_DO_NOT_DELETE.pth

The Solution: Centralized Model Registry

Screenshot: Centralized model inventory across all projects. Browse models with their latest version status (approved/pending), last modified date, associated projects, and ownership. Provides visibility into all models in the organization with filtering and search capabilities for easy discovery.

Model Hub provides:

  • Single source of truth for all models

  • Automatic versioning from training pipelines

  • Built-in approval workflow (pending → approved or rejected)

  • Complete lineage tracking (data → code → model)

  • Version comparison with metrics and artifacts

  • Access control for governance

  • model:// URIs for consistent references

Example organization:

model://customer-churn/v1 (approved, production)
model://customer-churn/v2 (pending review)
model://customer-churn/v3 (rejected, overfitting)

Model Hub vs. Saving Model Files

When to Use Model Hub

| Scenario | Use Model Hub | Just Save Files |
| --- | --- | --- |
| Production models | ✅ Yes | ❌ No |
| Need approval workflow | ✅ Yes | ❌ No |
| Multiple model versions | ✅ Yes | ❌ No |
| Team collaboration | ✅ Yes | ❌ No |
| Lineage tracking needed | ✅ Yes | ✅ Yes |
| Quick experiment checkpoint | ⚠️ Optional | ✅ Yes |
| Intermediate training artifacts | ❌ No | ✅ Yes |


Workflow Comparison

Without Model Hub:

# Training
model.save('/valohai/outputs/model.pkl')

# Later... where is the production model?
# Check Slack, find S3 path, download, hope it's the right one

With Model Hub:

# Training: Create a model version automatically by tagging the output
metadata = {
    "model.pkl": {
        "valohai.model-versions": ["model://customer-churn/"]
    }
}

# Deployment (valohai.yaml): Reference the current production model
inputs:
  - name: model
    default: model://customer-churn/v1  # Clear, versioned, approved
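The training-side metadata above can be sketched as a small helper. This assumes the sidecar convention where a JSON file named `<output>.metadata.json`, saved next to the output file, carries the `valohai.model-versions` tag; the helper name and the temporary directory standing in for `/valohai/outputs` are illustrative.

```python
import json
import tempfile
from pathlib import Path


def attach_to_model(output_name: str, model_uri: str, outputs_dir: str) -> Path:
    """Write a metadata sidecar that tags an output file with a model version.

    Assumes the sidecar convention: '<output>.metadata.json' next to the output.
    """
    sidecar = Path(outputs_dir) / f"{output_name}.metadata.json"
    sidecar.write_text(json.dumps({
        "valohai.model-versions": [model_uri]
    }))
    return sidecar


# Demo against a temporary directory standing in for /valohai/outputs
with tempfile.TemporaryDirectory() as outputs:
    (Path(outputs) / "model.pkl").write_bytes(b"...")  # the trained model file
    sidecar = attach_to_model("model.pkl", "model://customer-churn/", outputs)
    print(sidecar.name)  # model.pkl.metadata.json
```

The trailing slash in `model://customer-churn/` follows the document's example of tagging a model by name so a new version is created under it.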

Key Features

Automatic Lineage Tracking

Every model version automatically tracks:

  • Training data — Which dataset versions were used

  • Code version — Exact commit that trained the model

  • Hyperparameters — All parameters from training

  • Environment — Docker image, dependencies

  • Training metrics — Loss, accuracy, custom metrics

  • Artifacts — Model files, checkpoints, configs

Benefit: "Which data trained the production model?" → One click to see complete history.

Screenshot: Model lineage view showing end-to-end traceability. The model (1) was created by execution #1805, which received preprocessed data from job #1799 that used training datasets and images.zip. Each execution is clickable to view full details. The right side shows downstream usage: this model has been consumed by 3 jobs, with one successfully generating predictions (3).

Approval Workflow

Built-in state management for model lifecycle:

Training → model version created (Pending)

    Review metrics and lineage

    Approve → Production use
       OR
    Reject → Document why (overfitting, bias, etc.)

States:

  • Pending — Newly created, awaiting review

  • Approved — Validated for production use

  • Rejected — Not suitable for production

Benefit: Clear approval trail for compliance and governance.
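The lifecycle above is a simple state machine: a version starts Pending and is reviewed into exactly one terminal state. A minimal sketch (the names are illustrative, not a Valohai API):

```python
from enum import Enum


class ModelState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


# Valid transitions: Pending moves to one terminal state; terminal states stay put
TRANSITIONS = {
    ModelState.PENDING: {ModelState.APPROVED, ModelState.REJECTED},
    ModelState.APPROVED: set(),
    ModelState.REJECTED: set(),
}


def review(current: ModelState, decision: ModelState) -> ModelState:
    """Apply a review decision, rejecting transitions the workflow does not allow."""
    if decision not in TRANSITIONS[current]:
        raise ValueError(f"cannot move {current.value} -> {decision.value}")
    return decision


print(review(ModelState.PENDING, ModelState.APPROVED).value)  # approved
```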

Screenshot: Complete audit trail of all model activities. Track when the model was used in executions (1), approved by team members like Sofia (2), and modified by users like Drazen (3). Every edit, usage, and approval of the model or its versions is automatically logged with timestamp, user, and location details.

Version Comparison

Compare model versions side-by-side:

  • Training metrics (accuracy, loss, F1)

  • Hyperparameters

  • Training data

  • File sizes and artifacts

  • Training duration

Benefit: "Is v2 really better than v1?" → Compare metrics directly.

Screenshot: Compare model versions visually by plotting performance metrics. Select metrics for the horizontal axis (2) and vertical axis (3) to visualize tradeoffs across different model versions (1).

Unified Access

All models in one place:

  • Organization view → All models across projects

  • Project view → Models relevant to this project

  • Search and filter by tags, state, metrics

  • Download artifacts or use in pipelines

Benefit: No hunting through S3 buckets or file shares.


Use Cases

Production Model Management

Scenario: Deploy and monitor production models with approval gates.

Train weekly → Create model version (Pending)

         Review performance

    Better than current? → Approve

    Update deployment to model://churn-model/latest

Benefit: Controlled releases with audit trail.


A/B Testing

Scenario: Compare model variants in production.

# Deploy two versions simultaneously
- deployment-a:
    model: model://recommendation-engine/v5
    traffic: 90%

- deployment-b:
    model: model://recommendation-engine/v6
    traffic: 10%

Benefit: Safe model rollout with easy rollback.


Model Lineage & Compliance

Scenario: Audit which data trained production models.

Regulator: "Which customer data trained your credit model?"

Model Hub: model://credit-score/v3 (production)

Lineage: Trained on dataset://credit-data/2024-q1

Dataset: Contains data from Jan-Mar 2024

Benefit: Complete audit trail for compliance.


Team Collaboration

Scenario: Multiple data scientists training models, ML engineer deploying.

Data Scientist A: Creates model://fraud-detection/v10 (Pending)
                  "Improved recall by 5%"

Data Scientist B: Reviews metrics, approves

ML Engineer: Deploys model://fraud-detection/v10
             (Knows it's reviewed and approved)

Benefit: Clear handoff between roles.


Experiment Tracking

Scenario: Track dozens of training runs, pick best.

Training sweep: 50 model versions with different hyperparameters

Model Hub: Compare metrics, filter by accuracy > 0.9

Approve top 3 → Further testing
Reject others → Document findings

Benefit: Organized experimentation with clear winners.


model:// URI Format

Models are referenced using model:// URIs, similar to datum:// and dataset:// links.

Format

model://<model-name>/<version>

Examples:

model://customer-churn/v1        # Specific version
model://customer-churn/v2        # Another version
model://customer-churn/latest    # Latest approved version
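As a quick illustration of the format, here is a hypothetical parser that splits a model:// URI into its name and version parts (this function is illustrative, not part of any Valohai SDK):

```python
def parse_model_uri(uri: str) -> tuple[str, str]:
    """Split 'model://<model-name>/<version>' into (name, version)."""
    prefix = "model://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a model URI: {uri}")
    name, _, version = uri[len(prefix):].partition("/")
    if not name or not version:
        raise ValueError(f"expected model://<name>/<version>, got: {uri}")
    return name, version


print(parse_model_uri("model://customer-churn/v1"))      # ('customer-churn', 'v1')
print(parse_model_uri("model://customer-churn/latest"))  # ('customer-churn', 'latest')
```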

Using model:// URIs

In valohai.yaml:

- step:
    name: batch-inference
    image: python:3.9
    command: python predict.py
    inputs:
      - name: model
        default: model://customer-churn/v1
      - name: data
        default: dataset://inference-data/daily

In code:

# Model downloaded to /valohai/inputs/model/
import pickle

model_path = '/valohai/inputs/model/model.pkl'
with open(model_path, 'rb') as f:
    model = pickle.load(f)

Benefit: Consistent, versioned references across all workflows.


Model Hub vs. Other Registries

| Feature | Valohai Model Hub | MLflow | W&B | SageMaker Model Registry |
| --- | --- | --- | --- | --- |
| Built-in lineage | ✅ Automatic | ⚠️ Manual logging | ⚠️ Manual logging | ❌ Limited |
| Approval workflow | ✅ Built-in | ❌ No | ❌ No | ✅ Manual |
| Versioned inputs | ✅ model:// URIs | ⚠️ Manual paths | ⚠️ Manual paths | ✅ ARNs |
| Access control | ✅ Built-in | ⚠️ Enterprise only | ✅ Yes | ✅ Yes |
| Training integration | ✅ Automatic | ⚠️ Manual tracking | ⚠️ Manual tracking | ⚠️ Manual |
| Reproducibility | ✅ Full pipeline | ⚠️ Model only | ⚠️ Model only | ⚠️ Model only |

Valohai differentiator: Automatic lineage from full pipeline execution, not just model files.


Getting Started

Ready to use Model Hub? Follow these guides:

  1. Create and Manage Models — Create models, versions, approval workflow

  2. Model Artifacts & Versioning — Save models from training, use in deployment



Next Steps

  • Create your first model in Model Hub

  • Set up automatic model versioning from training

  • Configure approval workflow for your team

  • Deploy using versioned model:// URIs
