Aliases
Aliases are mutable pointers to immutable files. They let you reference "the latest validated model" or "the current training file" without hardcoding specific file IDs.
Aliases vs. Direct datum:// Links
Understanding when to use each:
Use Alias When:
├─ Reference should update over time
├─ Multiple people need same reference
├─ Managing production/staging versions
└─ "Latest" or "current" semantics needed
Use Direct datum:// Link When:
├─ Need exact reproducibility
├─ Auditing specific file version
├─ Archiving experiment results
└─ Immutable reference required
Key difference:
Aliases = Update to point to new files (e.g., model-prod → new model each week)
Direct links = Always point to same file forever (e.g., datum://abc123 → immutable)
💡 Reproducibility!
When you create an execution, Valohai resolves the alias to a specific datum:// link at that moment. If the alias changes later, your old execution still references the original file!
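The pinning behavior described above can be illustrated with a small in-memory model. This is only a sketch: `AliasRegistry` and `create_execution` are hypothetical names invented for illustration, not part of any Valohai API, and `def456` is a made-up file ID.

```python
class AliasRegistry:
    """Toy model of aliases: mutable names pointing at immutable datum IDs."""

    def __init__(self):
        self._targets = {}

    def set(self, alias, datum_id):
        # Re-pointing an alias replaces its target; the old file is untouched.
        self._targets[alias] = datum_id

    def resolve(self, alias):
        # Return the concrete datum:// link the alias points to right now.
        return f"datum://{self._targets[alias]}"


def create_execution(registry, inputs):
    """Resolve every alias once, at creation time, and store the result."""
    return {name: registry.resolve(alias) for name, alias in inputs.items()}


registry = AliasRegistry()
registry.set("model-prod", "abc123")
old_execution = create_execution(registry, {"model": "model-prod"})

registry.set("model-prod", "def456")  # the alias moves to a newer file
new_execution = create_execution(registry, {"model": "model-prod"})

print(old_execution["model"])  # datum://abc123
print(new_execution["model"])  # datum://def456
```

The older execution keeps its resolved link because it stored the result of the lookup, not the alias name.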
How to Add Aliases
Aliases are created through metadata using the reserved key valohai.alias.
During Execution
import json

# Alias must be a single string (not a list)
metadata = {
    "valohai.alias": "model-prod"
}

# Save your output file
save_path = '/valohai/outputs/model.pkl'
model.save(save_path)

# Save metadata with alias
metadata_path = '/valohai/outputs/model.pkl.metadata.json'
with open(metadata_path, 'w') as f:
    json.dump(metadata, f)

When this execution completes, the alias model-prod will point to this output. If model-prod already exists, it updates to point to the new file.
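If several scripts attach aliases, the sidecar-writing step can be wrapped in a small helper that also enforces the single-string rule. The sketch below assumes the same `<output file>.metadata.json` naming used in the example above. `write_alias_metadata` is a hypothetical helper name, not a Valohai function.

```python
import json
from pathlib import Path


def write_alias_metadata(output_path, alias, extra=None):
    """Write the sidecar metadata file for an output, setting its alias."""
    if not isinstance(alias, str):
        # The alias must be one string; a list of aliases is rejected here.
        raise TypeError("valohai.alias must be a single string")
    metadata = dict(extra or {})
    metadata["valohai.alias"] = alias
    sidecar = Path(f"{output_path}.metadata.json")
    with open(sidecar, "w") as f:
        json.dump(metadata, f)
    return sidecar
```

Inside an execution this could be called as `write_alias_metadata('/valohai/outputs/model.pkl', 'model-prod')` right after the model file has been saved.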
Combine with Tags and Properties
import json

metadata = {
    # Alias (single string)
    "valohai.alias": "model-staging",
    # Tags (list of strings)
    "valohai.tags": ["validated", "resnet50", "2024-Q1"],
    # Custom properties
    "accuracy": 0.95,
    "approved_by": "ml-team"
}

save_path = '/valohai/outputs/model.pkl'
model.save(save_path)

metadata_path = '/valohai/outputs/model.pkl.metadata.json'
with open(metadata_path, 'w') as f:
    json.dump(metadata, f)

💡 For complete details on metadata methods, see Add Context to Your Data Files.
Use Aliases as Inputs
In valohai.yaml
Set aliases as default inputs for your steps:
- step:
    name: batch-inference
    image: python:3.9
    command: python predict.py
    inputs:
      - name: model
        default: datum://model-prod
      - name: data
        default: datum://inference-data-latest

Every execution automatically uses the current files referenced by these aliases.
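On the script side, the step never needs to know the alias name. It only reads whatever file was delivered for the named input. The sketch below assumes inputs are placed under `/valohai/inputs/<input name>/` and that a `VH_INPUTS_DIR` environment variable may override that root. Both are assumptions to verify against your environment, and `find_input_file` is a hypothetical helper.

```python
import os
from pathlib import Path


def find_input_file(input_name, inputs_root=None):
    """Return the first file found for a named input, or None if absent."""
    # Assumption: inputs live under <root>/<input name>/, with the root
    # defaulting to /valohai/inputs unless VH_INPUTS_DIR is set.
    root = Path(inputs_root or os.environ.get("VH_INPUTS_DIR", "/valohai/inputs"))
    input_dir = root / input_name
    if not input_dir.is_dir():
        return None
    files = sorted(p for p in input_dir.iterdir() if p.is_file())
    return files[0] if files else None
```

A script such as `predict.py` might call `find_input_file("model")` and load the returned path, whichever file `model-prod` resolved to when the execution was created.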
In Web Application
When creating an execution, search for your alias name in the input file browser.
The UI shows both the alias name and the current file it points to.
Change Tracking and History
Valohai tracks every time an alias is updated.
What's tracked:
When the alias was changed
What file it pointed to before
What file it points to now
Who made the change (if via UI)
Why this matters:
Debugging — "When did model-prod start failing? Let's check when it was last updated."
Auditing — "Which model version was in production on March 15th?"
Rollback — "The new model is worse, point the alias back to the previous version."
View alias history in the Data → Aliases tab of your project.
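To make the auditing and rollback questions concrete, here is a toy model of an alias change log. It is not how Valohai stores history, and it is not an API. `AliasHistory`, the timestamps, and the ID `def456` are all invented for illustration.

```python
from bisect import bisect_right
from datetime import datetime


class AliasHistory:
    """Toy change log for one alias: timestamps and datum IDs in time order."""

    def __init__(self):
        self._times = []
        self._targets = []

    def record(self, when, datum_id):
        # Assumes changes are recorded in chronological order.
        self._times.append(when)
        self._targets.append(datum_id)

    def target_at(self, when):
        """Return the datum ID the alias pointed to at `when`, or None."""
        index = bisect_right(self._times, when)
        return self._targets[index - 1] if index else None

    def previous_target(self):
        """Return the target before the latest change (the rollback candidate)."""
        return self._targets[-2] if len(self._targets) >= 2 else None


history = AliasHistory()
history.record(datetime(2024, 3, 1), "abc123")
history.record(datetime(2024, 3, 20), "def456")

print(history.target_at(datetime(2024, 3, 15)))  # abc123
print(history.previous_target())                 # abc123
```

The first lookup answers the auditing question ("what was in production on March 15th?"), and the second identifies the file to point the alias back to during a rollback.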
Common Use Cases
Production Model References
Point production systems to the current approved model:
# Inference pipeline
inputs:
  - name: model
    default: datum://model-prod

# Update when promoting new model
metadata = {
    "valohai.alias": "model-prod",
    "valohai.tags": ["validated", "production"],
    "promoted_date": "2024-01-15",
    "previous_model": "datum://abc123"
}

Workflow:
Train new model → tag as validated
Test in staging → update model-staging alias
Approve for production → update model-prod alias
Production automatically uses new model on next run
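One way to encode this workflow is a function that emits alias metadata only when a model clears a quality bar. The sketch below is an assumption about how a team might gate promotion. The `min_accuracy` threshold, the stage names, and `promotion_metadata` itself are hypothetical.

```python
def promotion_metadata(stage, accuracy, min_accuracy=0.9):
    """Build output metadata for one stage of the promotion workflow."""
    aliases = {"staging": "model-staging", "production": "model-prod"}
    if stage not in aliases:
        raise ValueError(f"unknown stage: {stage!r}")
    if accuracy < min_accuracy:
        # Below the bar: record the score but leave every alias untouched.
        return {"accuracy": accuracy}
    return {
        "valohai.alias": aliases[stage],
        "valohai.tags": ["validated", stage],
        "accuracy": accuracy,
    }


print(promotion_metadata("staging", 0.95))
print(promotion_metadata("production", 0.80))
```

Because a model that fails the check produces metadata without `valohai.alias`, the existing alias keeps pointing at the previously approved file.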
Environment-Specific Datasets
Use different aliases for each environment:
# Development
inputs:
  - name: training-data
    default: datum://train-data-dev

# Staging
inputs:
  - name: training-data
    default: datum://train-data-staging

# Production
inputs:
  - name: training-data
    default: datum://train-data-prod

Same pipeline code works across all environments by changing which alias is used.
Rolling Datasets
Always train on the most recent data:
inputs:
  - name: daily-data
    default: datum://latest-processed-batch

# After daily preprocessing job
metadata = {
    "valohai.alias": "latest-processed-batch",
    "valohai.tags": ["daily-batch", "2024-01-15"],
    "row_count": 50000,
    "processing_date": "2024-01-15"
}

Training jobs automatically pick up the latest data without manual updates.
A/B Testing
Compare model versions side-by-side:
- step:
    name: ab-test
    image: python:3.9
    command: python compare.py
    inputs:
      - name: model-a
        default: datum://model-candidate-a
      - name: model-b
        default: datum://model-candidate-b
      - name: test-data
        default: datum://test-set-fixed

Update aliases to test different model combinations without changing pipeline code.
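The source does not show what `compare.py` contains. As a rough idea, the comparison logic could look like the sketch below, which scores two sets of predictions against the same fixed labels. The function names and the sample data are hypothetical, and loading the models and test set is left out.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def compare_models(predictions_a, predictions_b, labels):
    """Score both candidates on the same fixed test set and name the winner."""
    score_a = accuracy(predictions_a, labels)
    score_b = accuracy(predictions_b, labels)
    if score_a == score_b:
        winner = "tie"
    else:
        winner = "model-a" if score_a > score_b else "model-b"
    return {"model-a": score_a, "model-b": score_b, "winner": winner}


labels = [1, 0, 1, 1]
result = compare_models([1, 0, 0, 1], [1, 0, 1, 1], labels)
print(result)  # {'model-a': 0.75, 'model-b': 1.0, 'winner': 'model-b'}
```

Since the test set is referenced through a fixed alias, both candidates are always scored on identical data, which keeps the comparison fair as the candidate aliases change.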
Canary Deployments
Gradually roll out new models:
# 90% traffic
inputs:
  - name: model-stable
    default: datum://model-prod

# 10% traffic
inputs:
  - name: model-canary
    default: datum://model-canary

Monitor canary performance before promoting to full production by updating model-prod alias.
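The page does not say how the 90/10 split is enforced. One common approach, shown here purely as an assumption, is to hash a request identifier into a bucket so that the same request always reaches the same model. `choose_model_input` and the `request_id` parameter are hypothetical.

```python
import hashlib


def choose_model_input(request_id, canary_percent=10):
    """Deterministically route a request to the stable or canary model."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first four bytes of the hash to a bucket in the range 0..99.
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "model-canary" if bucket < canary_percent else "model-stable"


# Roughly canary_percent of a large set of IDs should land on the canary.
ids = [f"request-{i}" for i in range(10000)]
canary_share = sum(choose_model_input(i) == "model-canary" for i in ids) / len(ids)
print(f"canary share: {canary_share:.3f}")
```

Raising `canary_percent` step by step widens the rollout. Once the canary is trusted, updating the model-prod alias completes the promotion.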
Managing Aliases
Via Web Application
Open your project
Navigate to Data → Aliases tab
Click "Create new datum alias" or select existing alias
Choose the file the alias should point to
View change history for each alias
Via Code
Update aliases programmatically when saving outputs:
# Promote staging to production
metadata = {
    "valohai.alias": "model-prod",  # Updates existing alias
    "valohai.tags": ["promoted-from-staging"],
    "promoted_at": "2024-01-15T10:30:00Z",
    "previous_version": "datum://xyz789"
}

Best Practices
Use Descriptive Names
# Good: Clear purpose
"model-prod"
"model-staging"
"train-data-latest"
"validation-set-fixed"

# Avoid: Ambiguous
"model1"
"data"
"latest"
"temp"

Version Your Aliases
For complex environments:
# Production versions
"model-prod-v1"
"model-prod-v2"

# Regional variants
"model-prod-us"
"model-prod-eu"

# Use case specific
"model-prod-batch-inference"
"model-prod-real-time-api"

Document Alias Updates
Include context in metadata:
metadata = {
    "valohai.alias": "model-prod",
    "valohai.tags": ["production", "promoted"],
    "promoted_by": "ml-team",
    "promoted_reason": "10% accuracy improvement",
    "validation_score": 0.95,
    "previous_score": 0.85
}

Related Pages
Add Context to Your Data Files — Overview of metadata system
Organize Files with Tags — Label files before creating aliases
Track Custom Metadata — Store rich data alongside aliases
Load Data in Jobs — Use aliases as execution inputs
Versioning and Lineage — Track file dependencies
Next Steps
Learn how to use tags to organize files before aliasing
Explore custom properties for tracking alias metadata
Set up datasets using aliased files
