Model Artifacts & Versioning

Advanced model versioning patterns, automated deployment workflows, and integration with production systems.


Overview

This guide covers:

  • Versioning strategies for different ML workflows

  • Automated deployment on approval

  • Model artifact management

  • Production deployment patterns

  • Governance through Model Hub


Versioning Strategies

Development vs. Production Versions

Use tags and states to separate development from production:

# Development/experiment version
metadata = {
    "model.pkl": {
        "valohai.model-versions": [{
            "model_uri": "model://churn-model/",
            "model_version_tags": ["experiment", "feature-test", "dev"],
            "model_release_note": "Testing new feature engineering"
        }],
        "experiment_id": "exp-042",
        "status": "experimental"
    }
}

# Production candidate version
metadata = {
    "model.pkl": {
        "valohai.model-versions": [{
            "model_uri": "model://churn-model/",
            "model_version_tags": ["production-candidate", "validated"],
            "model_release_note": "Ready for staging deployment - passed all quality gates"
        }],
        "validation_passed": True,
        "quality_score": 0.95
    }
}
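
These metadata dictionaries only take effect once they are written alongside the outputs. A minimal sketch, assuming Valohai's sidecar-file convention where metadata for an output is saved as <filename>.metadata.json in /valohai/outputs/:

import json

def save_output_metadata(metadata):
    # Assumes the sidecar convention: metadata for an output file is
    # written as <filename>.metadata.json next to the file itself.
    for filename, file_metadata in metadata.items():
        sidecar_path = f'/valohai/outputs/{filename}.metadata.json'
        with open(sidecar_path, 'w') as f:
            json.dump(file_metadata, f)

save_output_metadata(metadata)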

Workflow:

  1. Create development versions with the experiment tag (they stay in Pending)

  2. Retag the best experiment as production-candidate

  3. Validate → Approve

  4. Deploy to production


Semantic Versioning Pattern

Organize versions with semantic meaning:

# Major version: Model architecture change
metadata = {
    "model.pkl": {
        "valohai.model-versions": [{
            "model_uri": "model://recommendation/",
            "model_version_tags": ["v2.0.0", "architecture-change", "transformer"],
            "model_release_note": "Major update: Switch from collaborative filtering to transformer-based model"
        }],
        "architecture": "transformer",
        "major_version": 2
    }
}

# Minor version: Feature improvements
metadata = {
    "model.pkl": {
        "valohai.model-versions": [{
            "model_uri": "model://recommendation/",
            "model_version_tags": ["v2.1.0", "feature-update"],
            "model_release_note": "Added user engagement features, improved accuracy by 3%"
        }],
        "architecture": "transformer",
        "major_version": 2,
        "minor_version": 1
    }
}

# Patch version: Bug fixes or retraining
metadata = {
    "model.pkl": {
        "valohai.model-versions": [{
            "model_uri": "model://recommendation/",
            "model_version_tags": ["v2.1.1", "retrain", "patch"],
            "model_release_note": "Retrained on updated data - same architecture"
        }],
        "architecture": "transformer",
        "major_version": 2,
        "minor_version": 1,
        "patch_version": 1
    }
}
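
To keep the version tags and the numeric metadata fields in sync, a small helper can derive both from a single version string. A sketch; build_version_metadata and its arguments are illustrative, not part of the Valohai API:

def build_version_metadata(model_uri, version, note, extra_tags=(), **extra_fields):
    # version is a 'major.minor.patch' string, e.g. '2.1.1'
    major, minor, patch = (int(part) for part in version.split('.'))
    return {
        "valohai.model-versions": [{
            "model_uri": model_uri,
            "model_version_tags": [f"v{version}", *extra_tags],
            "model_release_note": note,
        }],
        "major_version": major,
        "minor_version": minor,
        "patch_version": patch,
        **extra_fields,
    }

metadata = {
    "model.pkl": build_version_metadata(
        "model://recommendation/", "2.1.1",
        "Retrained on updated data - same architecture",
        extra_tags=["retrain", "patch"], architecture="transformer",
    )
}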

Environment-Specific Versions

Maintain separate version tracks for different environments:

# Staging version
metadata = {
    "model.pkl": {
        "valohai.model-versions": [{
            "model_uri": "model://fraud-detection/",
            "model_version_tags": ["staging", "2024-q1"],
            "model_release_note": "Deployed to staging for validation"
        }],
        "environment": "staging",
        "deployment_date": "2024-01-15"
    }
}

# Production version (after staging validation)
metadata = {
    "model.pkl": {
        "valohai.model-versions": [{
            "model_uri": "model://fraud-detection/",
            "model_version_tags": ["production", "2024-q1", "validated"],
            "model_release_note": "Promoted from staging after 2 weeks validation"
        }],
        "environment": "production",
        "deployment_date": "2024-01-29",
        "staging_metrics": {"false_positive_rate": 0.02}
    }
}
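
Promotion from staging to production is a natural place for an automated quality gate. A minimal sketch; the threshold and the source of staging_metrics are assumptions:

# Hypothetical quality gate: promote only if staging metrics clear a threshold.
staging_metrics = {"false_positive_rate": 0.02}  # assumed to be collected during staging
MAX_FALSE_POSITIVE_RATE = 0.05  # illustrative threshold

if staging_metrics["false_positive_rate"] > MAX_FALSE_POSITIVE_RATE:
    raise SystemExit("Staging metrics failed the quality gate - not promoting")

# Gate passed: write the production metadata shown above
save_output_metadata(metadata)  # helper from the sidecar sketch earlier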

Model Artifact Management

Multi-File Model Packages

Package models with all required artifacts:

import pickle
import json

# Save model components
model.save('/valohai/outputs/model.h5')

with open('/valohai/outputs/tokenizer.pkl', 'wb') as f:
    pickle.dump(tokenizer, f)

with open('/valohai/outputs/label_encoder.pkl', 'wb') as f:
    pickle.dump(label_encoder, f)

with open('/valohai/outputs/config.json', 'w') as f:
    json.dump({
        'vocab_size': 10000,
        'embedding_dim': 128,
        'max_sequence_length': 512
    }, f)

# Add all files to model version
metadata = {
    "model.h5": {
        "valohai.model-versions": ["model://text-classifier/"]
    },
    "tokenizer.pkl": {
        "valohai.model-versions": ["model://text-classifier/"]
    },
    "label_encoder.pkl": {
        "valohai.model-versions": ["model://text-classifier/"]
    },
    "config.json": {
        "valohai.model-versions": ["model://text-classifier/"]
    }
}

Deployment: all files attached to the version download together:

import json
import pickle

from tensorflow.keras.models import load_model

# Load the complete model package from the input directory
model = load_model('/valohai/inputs/model/model.h5')
tokenizer = pickle.load(open('/valohai/inputs/model/tokenizer.pkl', 'rb'))
label_encoder = pickle.load(open('/valohai/inputs/model/label_encoder.pkl', 'rb'))
config = json.load(open('/valohai/inputs/model/config.json'))

Framework-Specific Artifacts

TensorFlow/Keras:

import tensorflow as tf

# Save the full model (architecture + weights) in HDF5 format
model.save('/valohai/outputs/model.h5')

# Save weights only (the architecture must be rebuilt before loading)
model.save_weights('/valohai/outputs/model_weights.h5')

# Add all to model version
metadata = {
    "model.h5": {"valohai.model-versions": ["model://image-classifier/"]},
    "model_weights.h5": {"valohai.model-versions": ["model://image-classifier/"]}
}
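
The weights-only file does not carry the architecture, so loading it requires rebuilding the model in code first. A sketch, where build_model() is a hypothetical function that recreates the training-time architecture:

from tensorflow.keras.models import load_model

# The full HDF5 file restores architecture and weights in one call
model = load_model('/valohai/inputs/model/model.h5')

# The weights-only file needs the architecture rebuilt first
model = build_model()  # hypothetical: must recreate the training-time architecture
model.load_weights('/valohai/inputs/model/model_weights.h5')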

PyTorch:

import torch

# Save the complete model (pickles the class; breaks if the code changes)
torch.save(model, '/valohai/outputs/model_full.pth')

# Save the state dict (recommended: portable across code refactors)
torch.save(model.state_dict(), '/valohai/outputs/model_state.pth')

# Save checkpoint with optimizer state
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
}, '/valohai/outputs/checkpoint.pth')

metadata = {
    "model_state.pth": {
        "valohai.model-versions": ["model://object-detection/"],
        "framework": "pytorch",
        "torch_version": torch.__version__
    }
}
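
The state-dict file likewise needs the model class available at load time. A sketch, where ObjectDetector stands in for your actual model class:

import torch

model = ObjectDetector()  # hypothetical: same class and arguments as at training time
model.load_state_dict(torch.load('/valohai/inputs/model/model_state.pth'))
model.eval()  # switch layers like dropout/batch norm to inference mode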

ONNX Export for Deployment

Export to ONNX for cross-framework deployment:

import torch
import torch.onnx

# Export PyTorch model to ONNX
model.eval()
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    '/valohai/outputs/model.onnx',
    export_params=True,
    opset_version=11,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}
)

# Add to model version
metadata = {
    "model.onnx": {
        "valohai.model-versions": [{
            "model_uri": "model://image-classifier/",
            "model_version_tags": ["onnx", "optimized", "production"],
            "model_release_note": "ONNX export for cross-framework deployment"
        }],
        "format": "onnx",
        "input_shape": [1, 3, 224, 224],
        "output_shape": [1, 1000]
    }
}
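
Before shipping the export, it is worth checking that the ONNX graph actually runs and produces outputs of the expected shape. A minimal sketch using onnxruntime, assuming it is installed in the environment:

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession('/valohai/outputs/model.onnx')
dummy = np.random.randn(1, 3, 224, 224).astype(np.float32)

# The names match the input_names/output_names passed to torch.onnx.export above
(onnx_output,) = session.run(['output'], {'input': dummy})
print('ONNX output shape:', onnx_output.shape)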

Production Deployment Patterns

Batch Inference

Use approved models for scheduled batch predictions:

valohai.yaml:

- step:
    name: daily-batch-inference
    image: python:3.9
    command:
      - pip install scikit-learn pandas
      - python batch_predict.py
    inputs:
      - name: model
        default: model://churn-prediction/latest  # Always uses latest approved
      - name: customers
        default: dataset://customer-data/daily-snapshot

Schedule: run daily at 2 AM to generate predictions for the customer success team.
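
A sketch of what batch_predict.py might look like; the file names inside the inputs and the column names are assumptions:

import pickle
import pandas as pd

# Valohai downloads inputs to /valohai/inputs/<input-name>/
with open('/valohai/inputs/model/model.pkl', 'rb') as f:  # assumed file name inside the version
    model = pickle.load(f)

customers = pd.read_csv('/valohai/inputs/customers/daily-snapshot.csv')  # assumed CSV snapshot

# Score each customer and save the predictions as an execution output
features = customers.drop(columns=['customer_id'])  # assumed ID column
customers['churn_probability'] = model.predict_proba(features)[:, 1]
customers[['customer_id', 'churn_probability']].to_csv('/valohai/outputs/predictions.csv', index=False)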


Real-Time Serving (External)

Export model for external serving platform:

# Training: Create model version
metadata = {
    "model.pkl": {
        "valohai.model-versions": [{
            "model_uri": "model://recommendation/",
            "model_version_tags": ["serving-ready", "optimized"],
            "model_release_note": "Optimized for low-latency serving"
        }],
        "inference_latency_ms": 45,
        "model_size_mb": 120,
        "serving_framework": "tensorflow-serving"
    }
}

# Deployment: Download from Valohai (API), deploy to serving platform
# (Run outside Valohai or via deployment execution)
# Example platforms: SageMaker, Vertex AI, KServe, Seldon, custom API
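
A sketch of such a hand-off script; the MODEL_URL environment variable and the model file name are assumptions, with the download link resolved from Valohai (UI or API) beforehand:

import os
import requests

# Assumed: MODEL_URL holds a pre-resolved download link for the approved
# model version (obtained from the Valohai UI or API beforehand).
model_url = os.environ['MODEL_URL']

response = requests.get(model_url, timeout=300)
response.raise_for_status()

with open('model.pkl', 'wb') as f:
    f.write(response.content)

# Hand the artifact to the serving platform from here, e.g. upload to S3
# for SageMaker, register it in Vertex AI, or bake it into a serving image.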
