Migrate Your ML Jobs
Your existing ML code can run on Valohai with minimal changes. This guide walks you through migrating your workflows in under 5 hours, keeping your code intact while gaining versioning, reproducibility, and scalability. This page gives an overview of the steps; more detailed instructions for each step can be found in the other sections.
Migration Timeline
Step 1: Define dependencies
Step 2: Create valohai.yaml (30 minutes)
Step 3: Add parameters and metrics (1-2 hours)
Step 4: Configure outputs (30 minutes)
Step 5: Update data access (1-2 hours)
Step 1: Define Your Dependencies
Identify the Python packages your code needs. You have two options:
Option A: Install at runtime
pip install -r requirements.txt
conda install pandas=0.13.1

Option B: Use a Docker image with pre-installed dependencies
image: tensorflow/tensorflow:2.6.0

💡 Tip: Include version numbers to ensure reproducible environments across all executions.
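For Option A, pinning exact versions in your requirements.txt keeps every execution reproducible. A minimal sketch (the package versions below are illustrative, not prescriptive):

# requirements.txt: pin exact versions for reproducibility
tensorflow==2.6.0
pandas==1.3.3
scikit-learn==0.24.2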
Step 2: Write Your valohai.yaml (30 minutes)
Create a valohai.yaml file in your repository root. Start simple—your existing code runs as-is:
- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command:
      - pip install -r requirements.txt
      - python train_model.py

That's it. Your job now runs on Valohai without touching your Python code.
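Commit the file and link your repository to a Valohai project, and you can launch the step from the web UI or, assuming you have the Valohai command-line client (valohai-cli) installed and the project linked, from your terminal:

vh execution run train-model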
Step 3: Add Parameters and Metrics (1-2 hours)
Parameters
If your code uses argparse or similar, this takes minutes. Define parameters in valohai.yaml:
- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command:
      - python train_model.py {parameters}
    parameters:
      - name: iterations
        type: integer
        default: 10
      - name: learningrate
        type: float
        default: 0.01
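At runtime, Valohai expands {parameters} into command-line flags such as --iterations=10 --learningrate=0.01, so a standard argparse setup picks them up. A minimal sketch matching the parameters defined above:

import argparse

# Valohai expands {parameters} into --iterations=10 --learningrate=0.01
parser = argparse.ArgumentParser()
parser.add_argument('--iterations', type=int, default=10)
parser.add_argument('--learningrate', type=float, default=0.01)
args = parser.parse_args()

print(f'Training for {args.iterations} iterations at lr {args.learningrate}')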
Metrics

Log metrics by printing JSON from your Python code, for example:
import json

print(json.dumps({
    "precision": 0.8125,
    "recall": 0.8667,
    "f1_score": 0.8387
}))

Valohai automatically captures and visualizes these metrics.
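Because each printed JSON line becomes a metadata point, printing once per epoch gives you live training curves. A minimal sketch with an illustrative loss value standing in for your real training loop:

import json

for epoch in range(10):
    loss = 1.0 / (epoch + 1)  # stand-in for your real training loss
    print(json.dumps({"epoch": epoch, "loss": round(loss, 4)}))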
Step 4: Save Output Artifacts (30 minutes)
Save models, CSVs, or any other outputs to the /valohai/outputs/ directory:
# Before: local save
model.save('model.h5')

# After: Valohai versioned output
model.save('/valohai/outputs/model.h5')

Valohai automatically versions and uploads all outputs to your cloud storage.
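If the same script should also run outside Valohai, you can resolve the output directory at runtime instead of hardcoding it. A sketch, assuming the VH_OUTPUTS_DIR environment variable that Valohai sets inside executions; locally it falls back to the working directory:

import os
from pathlib import Path

# On Valohai this resolves to /valohai/outputs; locally to the current directory
out_dir = Path(os.getenv('VH_OUTPUTS_DIR', '.'))
out_dir.mkdir(parents=True, exist_ok=True)

model.save(out_dir / 'model.h5')  # 'model' comes from your own training code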
Step 5: Update Data Access (1-2 hours)
Valohai handles all the complexity of cloud storage—authentication, access control, downloading, and caching. Your code just reads from local paths while Valohai manages everything behind the scenes.
Define Your Data Sources
Specify inputs in your YAML configuration:
- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command:
      - python train_model.py
    inputs:
      - name: images
        keep-directories: suffix
        default:
          - s3://mybucket/factories/images/*.png
          - azure://myblobstorage/factories/images/*.png
          - gs://mybucket/factories/images/*.png

Simplify Your Code
Remove all cloud authentication and data management code:
# Before: Complex cloud operations
# (download_from_s3 and handle_caching_logic are illustrative custom helpers)
s3_client = boto3.client('s3',
                         aws_access_key_id=KEY,
                         aws_secret_access_key=SECRET)
download_from_s3('mybucket/factories/images/')
handle_caching_logic()

# After: Just read local files
images = '/valohai/inputs/images/'
# All files are already there, downloaded and cached by Valohai

Valohai automatically:
Authenticates with your cloud storage
Downloads files to the execution environment
Caches the input data for faster access
Works identically across AWS, Azure, GCP, and on-premises storage
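Reading the inputs is then plain filesystem code. A sketch using the images input defined above (the file handling is illustrative):

from pathlib import Path

# Valohai downloads the 'images' input here before your code starts
input_dir = Path('/valohai/inputs/images')

for image_path in sorted(input_dir.glob('**/*.png')):
    print(f'{image_path.name}: {image_path.stat().st_size} bytes')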
💡 Advanced data management: Use Valohai datasets and aliases to version your data without hardcoding storage paths. Reference data as dataset://my-training-data or models as model://cats-v2 for better tracking and reproducibility.
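For example, an input default can point at a dataset version instead of a raw bucket path. A sketch of an input definition with a hypothetical dataset name and version; check the actual version identifiers in your project:

inputs:
  - name: images
    default: dataset://my-training-data/latest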
You're Done! 🎉
Your ML jobs now run on Valohai with:
Automatic versioning of code, data, and outputs
Experiment tracking and comparison
Scalability across cloud and on-premises infrastructure
No vendor lock-in—your code remains portable