Create and Manage Models

Create models in Model Hub, generate versions from training executions, manage approval states, and use models in production pipelines.

Overview

Model Hub workflow:

Create model — Define model in registry (one-time setup)
Train model — Run training execution
Create version — Automatically add model version from outputs
Review & approve — Validate metrics, approve for production
Deploy — Use model:// URI in production workflows

Create a Model

Models are containers for versions. Create a model once, then add multiple versions over time.

Via Web UI

Navigate to Models (in project or organization view)
Click "Create Model"
Enter Model name (e.g., "flower")
Optionally associate with a project
Click "Create"

⚠️ Important: The model URI (flower) becomes model://flower/ and cannot be changed later. Choose carefully.

Project-Associated Models

Organization view: All models visible
Project view: Only associated models visible (cleaner organization)

To associate:

Creating from project view → Automatically associated
Creating from org view → Choose project during creation
Change later → Organization Settings → Models

Benefit: Organize models by project while still using them across projects.

Create Model Versions from Training

The recommended approach: automatically create model versions when training executions complete.

Step 1: Train and Save Model

train.py:

import pickle
import json
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import pandas as pd

# Load data
data = pd.read_csv('/valohai/inputs/training-data/data.csv')
X = data.drop('target', axis=1)
y = data['target']

# Train model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = RandomForestClassifier(n_estimators=100, max_depth=10)
model.fit(X_train, y_train)

# Evaluate
accuracy = model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.4f}")

# Save model file to outputs
model_path = '/valohai/outputs/model.pkl'
with open(model_path, 'wb') as f:
    pickle.dump(model, f)

print(f"Saved model to {model_path}")

Step 2: Create Model Version with Metadata

In train.py, add valohai.model-versions metadata to create a model version automatically:

# After saving model file, create metadata
metadata = {
    "model.pkl": {
        "valohai.model-versions": ["model://flower/"],
        "valohai.tags": ["computer-vision", "production-candidate"],
        "accuracy": accuracy,
        "training_samples": len(X_train),
        "test_samples": len(X_test)
    }
}

# Save metadata file
metadata_path = '/valohai/outputs/valohai.metadata.jsonl'
with open(metadata_path, 'w') as f:
    for filename, file_metadata in metadata.items():
        json.dump({"file": filename, "metadata": file_metadata}, f)
        f.write('\n')

print("Created model version in model://flower/")

What happens:

Execution saves model.pkl to /valohai/outputs/
Metadata file tells Valohai to add this to model://flower/
New model version created in Pending state
Version includes model.pkl and all metadata
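The metadata-writing step can be wrapped in a small reusable helper. The following is a minimal sketch, not part of any Valohai library: the function name `write_metadata_jsonl` and its `outputs_dir` parameter are assumptions made for illustration, so that the file can be written to a temporary directory during a local dry run.

```python
import json
import os
import tempfile


def write_metadata_jsonl(metadata, outputs_dir="/valohai/outputs"):
    """Write one JSON object per output file to valohai.metadata.jsonl.

    `metadata` maps an output filename to its metadata dict, in the same
    shape as the examples in this guide. Hypothetical helper, not a Valohai API.
    """
    path = os.path.join(outputs_dir, "valohai.metadata.jsonl")
    with open(path, "w") as f:
        for filename, file_metadata in metadata.items():
            f.write(json.dumps({"file": filename, "metadata": file_metadata}))
            f.write("\n")
    return path


# Local dry run: write to a temporary directory instead of /valohai/outputs
demo_dir = tempfile.mkdtemp()
demo_path = write_metadata_jsonl(
    {"model.pkl": {"valohai.model-versions": ["model://flower/"], "accuracy": 0.9}},
    outputs_dir=demo_dir,
)

# Read the file back to confirm each line is a standalone JSON object
with open(demo_path) as f:
    demo_lines = [json.loads(line) for line in f]
```

Inside an execution, calling the helper with its default `outputs_dir` writes to the same path used in the steps above.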

Step 3: Add Release Notes and Tags

Include additional metadata for the version:

metadata = {
    "model.pkl": {
        "valohai.model-versions": [{
            "model_uri": "model://flower/",
            "model_version_tags": ["improved-recall", "production-ready"],
            "model_release_note": "Improved recall by 8% using balanced class weights"
        }],
        "valohai.tags": ["computer-vision", "v2-architecture"],
        "accuracy": 0.94,
        "precision": 0.92,
        "recall": 0.89,
        "f1_score": 0.905
    }
}

metadata_path = '/valohai/outputs/valohai.metadata.jsonl'
with open(metadata_path, 'w') as f:
    for filename, file_metadata in metadata.items():
        json.dump({"file": filename, "metadata": file_metadata}, f)
        f.write('\n')

Metadata fields:

model_uri — Which model to add version to
model_version_tags — Tags specific to this version
model_release_note — Description of changes/improvements
Custom properties — Any metrics or context (accuracy, etc.)
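Both forms of a valohai.model-versions item appear in this guide: a plain URI string, and a dict with the fields listed above. As a rough sketch, a helper can build whichever form is needed. The function name `model_version_entry` is hypothetical and the URI check is only a convenience for catching typos early.

```python
def model_version_entry(model_uri, tags=None, release_note=None):
    """Build one item for the "valohai.model-versions" list.

    Returns the plain URI string when no extras are given, otherwise the
    dict form with model_uri, model_version_tags and model_release_note.
    Hypothetical helper, not a Valohai API.
    """
    if not model_uri.startswith("model://"):
        raise ValueError(f"Expected a model:// URI, got {model_uri!r}")
    if not tags and not release_note:
        return model_uri
    entry = {"model_uri": model_uri}
    if tags:
        entry["model_version_tags"] = list(tags)
    if release_note:
        entry["model_release_note"] = release_note
    return entry


# Metadata for a single output file, combining the entry with custom metrics
file_metadata = {
    "valohai.model-versions": [
        model_version_entry(
            "model://flower/",
            tags=["improved-recall"],
            release_note="Improved recall by 8% using balanced class weights",
        )
    ],
    "recall": 0.89,
}
```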

Complete Training Example

train.py:

import pickle
import json
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import pandas as pd

# Load training data
print("Loading training data...")
data = pd.read_csv('/valohai/inputs/training-data/data.csv')
X = data.drop('churn', axis=1)
y = data['churn']

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training set: {len(X_train)} samples")
print(f"Test set: {len(X_test)} samples")

# Train model
print("Training model...")
model = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=5,
    random_state=42
)

model.fit(X_train, y_train)

# Evaluate
print("Evaluating model...")
y_pred = model.predict(X_test)

metrics = {
    'accuracy': accuracy_score(y_test, y_pred),
    'precision': precision_score(y_test, y_pred),
    'recall': recall_score(y_test, y_pred),
    'f1_score': f1_score(y_test, y_pred),
    'training_samples': len(X_train),
    'test_samples': len(X_test)
}

print(f"Accuracy: {metrics['accuracy']:.4f}")
print(f"Precision: {metrics['precision']:.4f}")
print(f"Recall: {metrics['recall']:.4f}")
print(f"F1 Score: {metrics['f1_score']:.4f}")

# Save model
print("Saving model...")
model_path = '/valohai/outputs/model.pkl'
with open(model_path, 'wb') as f:
    pickle.dump(model, f)

# Save feature importance
feature_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)

feature_importance.to_csv('/valohai/outputs/feature_importance.csv', index=False)

# Create model version with complete metadata
metadata = {
    "model.pkl": {
        "valohai.model-versions": [{
            "model_uri": "model://flower/",
            "model_version_tags": ["improved-recall", "production-ready"],
            "model_release_note": "Improved recall by 8% using balanced class weights"
        }],
        "valohai.tags": ["computer-vision", "v2-architecture"],
        # Attach the metrics computed above rather than hardcoded values
        **metrics
    },
    # Add the feature importance file to the same model version
    "feature_importance.csv": {
        "valohai.model-versions": ["model://flower/"]
    }
}

# Save metadata
metadata_path = '/valohai/outputs/valohai.metadata.jsonl'
with open(metadata_path, 'w') as f:
    for filename, file_metadata in metadata.items():
        json.dump({"file": filename, "metadata": file_metadata}, f)
        f.write('\n')

print("Model version created successfully!")
print("Status: Pending (awaiting review)")

valohai.yaml:

- step:
    name: train-flower-model
    image: python:3.9
    command:
      - pip install scikit-learn pandas
      - python train.py
    inputs:
      - name: training-data
        default: dataset://customer-data/train-2024-q1

Result:

✅ Model version created in Model Hub
✅ State: Pending (awaiting approval)
✅ Files: model.pkl, feature_importance.csv
✅ Metadata: Metrics, tags, release notes
✅ Lineage: Linked to training execution and data

Model Version States

Every model version has a state in the approval workflow.

Pending (Initial State)

What it means: Newly created, awaiting review

When to use: All new model versions start here

Actions available:

Review metrics and lineage
Compare to previous versions
Approve or reject

Approved

What it means: Validated for production use

When to use: Model meets quality criteria and is ready for deployment

Actions available:

Use in production pipelines
Compare to other approved versions
Revert to pending if issues found

How to approve:

Open model version in UI
Review metrics and artifacts
Click "Approve" button
Add approval notes (optional)

Rejected

What it means: Not suitable for production

When to use: Model fails quality checks, shows bias, or has issues

Actions available:

Document rejection reason
Use rejection notes to inform next iteration
Cannot use in production (intentionally blocked)

How to reject:

Open model version in UI
Click "Reject" button
Required: Add rejection reason
Common reasons: "Overfitting on test set", "Bias detected in predictions", "Worse than baseline"
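The approval workflow described in this section can be summarized as a small state machine. The sketch below is purely illustrative and does not call any Valohai API: the `VersionState` enum and the `transition` function are hypothetical names, and only the transitions mentioned above are modeled (the platform may allow others).

```python
from enum import Enum


class VersionState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


# Only the transitions described in this section; anything else is
# treated as not allowed in this sketch.
ALLOWED_TRANSITIONS = {
    VersionState.PENDING: {VersionState.APPROVED, VersionState.REJECTED},
    VersionState.APPROVED: {VersionState.PENDING},
    VersionState.REJECTED: set(),
}


def transition(current, target, note=""):
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    # Rejection requires a reason; approval notes are optional
    if target is VersionState.REJECTED and not note.strip():
        raise ValueError("A rejection reason is required")
    return target


state = transition(VersionState.PENDING, VersionState.APPROVED)
state = transition(state, VersionState.PENDING)  # revert after finding an issue
state = transition(state, VersionState.REJECTED, note="Worse than baseline")
```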

Model Version Numbers

Versions are automatically numbered sequentially:

model://customer-churn/v1   # First version
model://customer-churn/v2   # Second version
model://customer-churn/v3   # Third version

Special aliases:

model://customer-churn/latest   # Latest approved version

💡 Tip: Use the latest alias for production deployments that should automatically pick up the newest approved version.
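How the alias resolves can be pictured with a few lines of Python. This is only an illustrative sketch: the `resolve_latest` function and the `versions` mapping are hypothetical, and it assumes that latest means the highest-numbered version in the Approved state, as the alias is described above.

```python
def resolve_latest(versions):
    """Return the alias target, e.g. "v3", or None if nothing is approved.

    `versions` maps a version number (int) to its state string.
    Hypothetical helper; the real resolution happens inside the platform.
    """
    approved = [number for number, state in versions.items() if state == "approved"]
    if not approved:
        return None
    return f"v{max(approved)}"


versions = {1: "approved", 2: "rejected", 3: "approved", 4: "pending"}
latest = resolve_latest(versions)  # v4 is skipped because it is still pending
```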

Use Models in Workflows

As Execution Input

valohai.yaml:

- step:
    name: batch-inference
    image: python:3.9
    command:
      - pip install scikit-learn pandas
      - python predict.py
    inputs:
      - name: model
        default: model://customer-churn/v1
      - name: inference-data
        default: dataset://customer-data/inference-batch

predict.py:

import pickle
import pandas as pd

# Load model from model:// input
model_path = '/valohai/inputs/model/model.pkl'
with open(model_path, 'rb') as f:
    model = pickle.load(f)

# Load inference data
data = pd.read_csv('/valohai/inputs/inference-data/data.csv')

# Make predictions
predictions = model.predict(data)
probabilities = model.predict_proba(data)

# Save predictions
results = pd.DataFrame({
    'customer_id': data['customer_id'],
    'churn_prediction': predictions,
    'churn_probability': probabilities[:, 1]
})

results.to_csv('/valohai/outputs/predictions.csv', index=False)
print(f"Generated predictions for {len(results)} customers")

Using Latest Approved Version

inputs:
  - name: model
    default: model://customer-churn/latest  # Always uses latest approved

Benefit: Once a new model version is approved, every pipeline that references latest uses it automatically on its next run.

Create Model Version via UI

For existing model files not from training executions:

Navigate to your model
Click "Create Version"
Search for files in data library
Select model file(s)
Add version tags and release notes
Click "Create"

Use cases:

Import externally trained models
Promote experiment checkpoints to model registry
Add models trained outside Valohai

Manage Multiple Files per Version

A model version can contain multiple files:

metadata = {
    "model.pkl": {
        "valohai.model-versions": ["model://recommendation/"]
    },
    "preprocessor.pkl": {
        "valohai.model-versions": ["model://recommendation/"]
    },
    "feature_config.json": {
        "valohai.model-versions": ["model://recommendation/"]
    }
}

Result: All three files included in the version:

model://recommendation/v1
├── model.pkl
├── preprocessor.pkl
└── feature_config.json

Access in inference:

model = pickle.load(open('/valohai/inputs/model/model.pkl', 'rb'))
preprocessor = pickle.load(open('/valohai/inputs/model/preprocessor.pkl', 'rb'))
config = json.load(open('/valohai/inputs/model/feature_config.json'))
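When an inference step depends on several files, it can help to fail early if one of them is missing. The sketch below assumes the three filenames from the example above; `load_model_bundle` and `REQUIRED_FILES` are hypothetical names, and the dry run uses stand-in objects in a temporary directory rather than real model files.

```python
import json
import os
import pickle
import tempfile

# Filenames taken from the example version above
REQUIRED_FILES = ["model.pkl", "preprocessor.pkl", "feature_config.json"]


def load_model_bundle(input_dir="/valohai/inputs/model"):
    """Load all three files of a version, failing early if any is missing."""
    missing = [
        name for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(input_dir, name))
    ]
    if missing:
        raise FileNotFoundError(f"Missing in {input_dir}: {', '.join(missing)}")

    with open(os.path.join(input_dir, "model.pkl"), "rb") as f:
        model = pickle.load(f)
    with open(os.path.join(input_dir, "preprocessor.pkl"), "rb") as f:
        preprocessor = pickle.load(f)
    with open(os.path.join(input_dir, "feature_config.json")) as f:
        config = json.load(f)
    return model, preprocessor, config


# Local dry run with stand-in objects in a temporary directory
demo_dir = tempfile.mkdtemp()
for name, obj in [("model.pkl", {"kind": "model"}), ("preprocessor.pkl", {"kind": "prep"})]:
    with open(os.path.join(demo_dir, name), "wb") as f:
        pickle.dump(obj, f)
with open(os.path.join(demo_dir, "feature_config.json"), "w") as f:
    json.dump({"features": ["a", "b"]}, f)

model, preprocessor, config = load_model_bundle(demo_dir)
```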

Legacy Approach: Sidecar Metadata Files

The older approach used individual .metadata.json files:

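import json
import pickle

# Save model (assumes `model` is the trained estimator from the earlier steps)
with open('/valohai/outputs/model.pkl', 'wb') as f:
    pickle.dump(model, f)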

# Create sidecar metadata file
metadata = {
    "valohai.model-versions": ["model://customer-churn/"]
}

with open('/valohai/outputs/model.pkl.metadata.json', 'w') as f:
    json.dump(metadata, f)

This still works, but JSONL format is recommended for:

Consolidating metadata for multiple files
Cleaner outputs directory
Consistency with dataset versioning
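For scripts that still write sidecar files, one possible migration path is to merge them into a single JSONL file at the end of the run. The following is a sketch under assumptions: `sidecars_to_jsonl` is a hypothetical helper, it leaves the original sidecar files in place, and how the platform treats an output that has both kinds of metadata is not covered here.

```python
import json
import os
import tempfile

SIDECAR_SUFFIX = ".metadata.json"


def sidecars_to_jsonl(outputs_dir):
    """Merge every <file>.metadata.json in outputs_dir into valohai.metadata.jsonl."""
    jsonl_path = os.path.join(outputs_dir, "valohai.metadata.jsonl")
    # Collect sidecar names before creating the JSONL file
    sidecars = sorted(
        name for name in os.listdir(outputs_dir) if name.endswith(SIDECAR_SUFFIX)
    )
    with open(jsonl_path, "w") as out:
        for name in sidecars:
            with open(os.path.join(outputs_dir, name)) as f:
                file_metadata = json.load(f)
            # "model.pkl.metadata.json" describes the output file "model.pkl"
            target = name[: -len(SIDECAR_SUFFIX)]
            out.write(json.dumps({"file": target, "metadata": file_metadata}) + "\n")
    return jsonl_path


# Local dry run in a temporary directory
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "model.pkl.metadata.json"), "w") as f:
    json.dump({"valohai.model-versions": ["model://customer-churn/"]}, f)

jsonl_path = sidecars_to_jsonl(demo_dir)
with open(jsonl_path) as f:
    records = [json.loads(line) for line in f]
```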

Find Model URI

In Model Hub UI:

Navigate to your model
Select a version
Copy model URI from version details panel

Format:

model://<model-name>/<version>
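To work with these URIs in scripts, for example to log which version an execution received, the format can be split with a regular expression. This is a minimal sketch: `parse_model_uri` is a hypothetical helper, and the pattern only reflects the URI shapes shown in this guide, not an official grammar.

```python
import re

# model://<model-name>/<version>, where the version part may be empty or absent
MODEL_URI_RE = re.compile(r"^model://(?P<name>[^/]+)(?:/(?P<version>[^/]*))?$")


def parse_model_uri(uri):
    """Split a model URI into (name, version); version is None when absent."""
    match = MODEL_URI_RE.match(uri)
    if match is None:
        raise ValueError(f"Not a model URI: {uri!r}")
    return match.group("name"), match.group("version") or None


pinned = parse_model_uri("model://customer-churn/v3")
alias = parse_model_uri("model://customer-churn/latest")
bare = parse_model_uri("model://flower/")
```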

Common Workflow: Train → Approve → Deploy

Step 1: Train Model

vh execution run train-churn-model

Result: New version created in Pending state

Step 2: Review & Approve

Open Model Hub → Find model
View new version (Pending)
Review:
- Training metrics
- Lineage (which data was used)
- Compare to previous versions
Click "Approve"
Add approval notes: "Approved for staging deployment - 3% improvement in recall"

Result: Version state changed to Approved

Step 3: Deploy

Option A: Update deployment to use new version:

- step:
    name: production-inference
    inputs:
      - name: model
        default: model://customer-churn/v5  # Updated from v4

Option B: Use latest alias (automatic):

- step:
    name: production-inference
    inputs:
      - name: model
        default: model://customer-churn/latest  # Auto-updates to v5

Result: Production system now uses approved model v5. Model can be used for inference inside Valohai executions or deployed outside Valohai to your serving infrastructure.

Best Practices

Descriptive Model Names

✅ Good:
model://customer-churn
model://fraud-detection-transactions
model://recommendation-engine-products

❌ Avoid:
model://model1
model://my-model
model://test

Use Release Notes

# ✅ Good: Informative release notes
{"model_release_note": "Improved recall by 8% using class weights. Addressed bias in age feature."}

# ❌ Avoid: Vague notes
{"model_release_note": "New model"}

Tag Strategically

# ✅ Good: Meaningful tags
"model_version_tags": ["gradient-boosting", "production-ready", "q1-2024"]

# ❌ Avoid: Generic tags
"model_version_tags": ["model", "new", "good"]

Include Key Metrics

# ✅ Good: Complete metrics
metadata = {
    "model.pkl": {
        "valohai.model-versions": ["model://churn/"],
        "accuracy": 0.94,
        "precision": 0.92,
        "recall": 0.89,
        "f1_score": 0.905,
        "auc_roc": 0.96,
        "training_samples": 50000,
        "test_samples": 12500
    }
}

Review Before Approving

Checklist:

✅ Metrics better than baseline
✅ No overfitting (train vs test performance similar)
✅ Lineage verified (correct training data)
✅ Fairness/bias checked
✅ Comparison to previous version documented
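Parts of this checklist can be automated before a human review. The sketch below is one possible approach, not a Valohai feature: `review_candidate` is a hypothetical function, the metric names follow the examples in this guide, and the `max_gap` threshold of 0.05 is an arbitrary illustration that should be tuned per project.

```python
def review_candidate(candidate, baseline, train_accuracy, max_gap=0.05):
    """Return a list of problems; an empty list means the checks passed.

    candidate / baseline: dicts of test-set metrics, e.g. {"accuracy": 0.94}.
    train_accuracy: accuracy on the training set, used as a rough
    overfitting signal.
    """
    problems = []
    # Check 1: every metric present in the baseline must not regress
    for name, baseline_value in baseline.items():
        value = candidate.get(name)
        if value is None:
            problems.append(f"{name} missing from candidate metrics")
        elif value < baseline_value:
            problems.append(f"{name} below baseline ({value} < {baseline_value})")
    # Check 2: a large train/test gap suggests overfitting
    gap = train_accuracy - candidate.get("accuracy", 0.0)
    if gap > max_gap:
        problems.append(f"train/test accuracy gap too large ({gap:.3f})")
    return problems


problems = review_candidate(
    candidate={"accuracy": 0.94, "recall": 0.89},
    baseline={"accuracy": 0.92, "recall": 0.91},
    train_accuracy=0.96,
)
```

Here `problems` contains a single entry for recall, since accuracy improved and the train/test gap is small. Lineage and fairness checks still need a manual review.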

Related Pages

Models Overview — Why use Model Hub
Model Artifacts & Versioning — Advanced versioning patterns
Add Context to Your Data Files — Metadata system details

Next Steps

Create your first model in Model Hub
Train a model and create a version automatically
Set up approval workflow with your team
Deploy using model:// URIs
Explore automated deployment workflows
