Aliases

Aliases are mutable pointers to immutable files. They let you reference "the latest validated model" or "the current training file" without hardcoding specific file IDs.


When to use an alias versus a direct datum:// link:

Use Alias When:
├─ Reference should update over time
├─ Multiple people need same reference
├─ Managing production/staging versions
└─ "Latest" or "current" semantics needed

Use Direct datum:// Link When:
├─ Need exact reproducibility
├─ Auditing specific file version
├─ Archiving experiment results
└─ Immutable reference required

Key difference:

  • Aliases = Can be updated to point to new files (e.g., model-prod → new model each week)

  • Direct links = Always point to the same file (e.g., datum://abc123 → immutable)

💡 Reproducibility

When you create an execution, Valohai resolves the alias to a specific datum:// link at that moment. If the alias changes later, your old execution still references the original file!
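The pinning behavior described above can be pictured with a small toy model. This is only a conceptual sketch, not Valohai's implementation: the `aliases` table, the `create_execution` function, and the datum IDs are all hypothetical names used for illustration.

```python
# Toy model of alias resolution; NOT Valohai's actual implementation.
PREFIX = "datum://"

# Hypothetical alias table: alias name -> direct datum link.
aliases = {"model-prod": "datum://abc123"}


def create_execution(inputs, alias_table):
    """Resolve every input at creation time and return the pinned links."""
    resolved = {}
    for name, uri in inputs.items():
        key = uri[len(PREFIX):] if uri.startswith(PREFIX) else uri
        # An alias name is replaced by the link it points to right now;
        # anything else is assumed to already be a direct link.
        resolved[name] = alias_table.get(key, uri)
    return resolved


execution = create_execution({"model": "datum://model-prod"}, aliases)

# The alias moves to a newer file after the execution was created.
aliases["model-prod"] = "datum://def456"

print(execution["model"])  # datum://abc123
```

The execution keeps the link it was created with, while any execution created after the update would resolve `model-prod` to the newer file.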


How to Add Aliases

Aliases are created through metadata using the reserved key valohai.alias.

During Execution

import json

# Alias must be a single string (not a list)
metadata = {
    "valohai.alias": "model-prod"
}

# Save your output file (`model` is your trained model object)
save_path = '/valohai/outputs/model.pkl'
model.save(save_path)

# Save metadata with alias
metadata_path = '/valohai/outputs/model.pkl.metadata.json'
with open(metadata_path, 'w') as f:
    json.dump(metadata, f)

When this execution completes, the alias model-prod will point to this output. If model-prod already exists, it updates to point to the new file.
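If several steps write aliases, the sidecar logic can be wrapped in a small helper. The following is a minimal sketch; the function name `write_alias_metadata` and its `extra` parameter are hypothetical and not part of any Valohai library. It only relies on the `<output file>.metadata.json` convention and the `valohai.alias` key shown above.

```python
import json
import os


def write_alias_metadata(output_path, alias, extra=None):
    """Write `<output_path>.metadata.json` containing a single-string alias.

    `extra` is an optional dict of additional metadata to store alongside it.
    """
    # The alias must be a single string, so reject lists and other types early.
    if not isinstance(alias, str):
        raise TypeError("valohai.alias must be a single string, not a list")
    metadata = dict(extra or {})
    metadata["valohai.alias"] = alias
    metadata_path = output_path + ".metadata.json"
    with open(metadata_path, "w") as f:
        json.dump(metadata, f)
    return metadata_path
```

Inside an execution, you would call it after saving the file, for example `write_alias_metadata('/valohai/outputs/model.pkl', 'model-prod')`.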

Combine with Tags and Properties

import json

metadata = {
    # Alias (single string)
    "valohai.alias": "model-staging",
    
    # Tags (list of strings)
    "valohai.tags": ["validated", "resnet50", "2024-Q1"],
    
    # Custom properties
    "accuracy": 0.95,
    "approved_by": "ml-team"
}

save_path = '/valohai/outputs/model.pkl'
model.save(save_path)

metadata_path = '/valohai/outputs/model.pkl.metadata.json'
with open(metadata_path, 'w') as f:
    json.dump(metadata, f)

💡 For complete details on metadata methods, see Add Context to Your Data Files.


Use Aliases as Inputs

In valohai.yaml

Set aliases as default inputs for your steps:

- step:
    name: batch-inference
    image: python:3.9
    command: python predict.py
    inputs:
      - name: model
        default: datum://model-prod
      - name: data
        default: datum://inference-data-latest

Every execution automatically uses the current files referenced by these aliases.
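Inside the step, the script does not need to know which file the alias resolved to. Here is a hedged sketch of how predict.py might locate the downloaded file. It assumes inputs are placed under `/valohai/inputs/<input-name>/` and that the `VH_INPUTS_DIR` environment variable, if set, overrides that root. The helper name `first_input_file` is hypothetical.

```python
import os


def first_input_file(input_name, inputs_dir=None):
    """Return the path of the first file found for a named input."""
    # Assumption: each input is downloaded to <inputs_dir>/<input_name>/.
    inputs_dir = inputs_dir or os.environ.get("VH_INPUTS_DIR", "/valohai/inputs")
    input_dir = os.path.join(inputs_dir, input_name)
    files = sorted(
        name for name in os.listdir(input_dir)
        if os.path.isfile(os.path.join(input_dir, name))
    )
    if not files:
        raise FileNotFoundError(f"no files found for input {input_name!r}")
    return os.path.join(input_dir, files[0])
```

With the step above, `first_input_file("model")` would return the path of whatever file `model-prod` pointed to when the execution was created.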

In Web Application

When creating an execution, search for your alias name in the input file browser.

The UI shows both the alias name and the current file it points to.


Change Tracking and History

Valohai tracks every time an alias is updated.

What's tracked:

  • When the alias was changed

  • What file it pointed to before

  • What file it points to now

  • Who made the change (if via UI)

Why this matters:

  • Debugging — "When did model-prod start failing? Let's check when it was last updated."

  • Auditing — "Which model version was in production on March 15th?"

  • Rollback — "The new model is worse, point the alias back to the previous version."

View alias history in the Data → Aliases tab of your project.
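To make the auditing and rollback questions concrete, here is an illustrative sketch that answers them from a change log. The `history` records are hypothetical sample values typed in by hand, not output from Valohai, and the functions are not part of any Valohai API. In practice you would read the history from the Aliases tab.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical history, oldest first: (time the alias changed, new target).
history = [
    (datetime(2024, 1, 15, 10, 30), "datum://abc123"),
    (datetime(2024, 3, 1, 9, 0), "datum://def456"),
    (datetime(2024, 3, 20, 14, 0), "datum://xyz789"),
]


def target_at(history, moment):
    """Return the file the alias pointed to at `moment`, or None if unset."""
    times = [changed_at for changed_at, _ in history]
    index = bisect_right(times, moment)
    return history[index - 1][1] if index else None


def rollback_target(history):
    """Return the file the alias pointed to before its latest change."""
    return history[-2][1] if len(history) >= 2 else None


print(target_at(history, datetime(2024, 3, 15)))  # datum://def456
print(rollback_target(history))                   # datum://def456
```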


Common Use Cases

Production Model References

Point production systems to the current approved model:

# Inference pipeline (valohai.yaml)
inputs:
  - name: model
    default: datum://model-prod

# Update when promoting new model (Python metadata)
metadata = {
    "valohai.alias": "model-prod",
    "valohai.tags": ["validated", "production"],
    "promoted_date": "2024-01-15",
    "previous_model": "datum://abc123"
}

Workflow:

  1. Train new model → tag as validated

  2. Test in staging → update model-staging alias

  3. Approve for production → update model-prod alias

  4. Production automatically uses new model on next run


Environment-Specific Datasets

Use different aliases for each environment:

# Development
inputs:
  - name: training-data
    default: datum://train-data-dev

# Staging
inputs:
  - name: training-data
    default: datum://train-data-staging

# Production
inputs:
  - name: training-data
    default: datum://train-data-prod

The same pipeline code works across all environments; only the alias used for the input changes.


Rolling Datasets

Always train on the most recent data:

# Training step input (valohai.yaml)
inputs:
  - name: daily-data
    default: datum://latest-processed-batch

# After daily preprocessing job (Python metadata)
metadata = {
    "valohai.alias": "latest-processed-batch",
    "valohai.tags": ["daily-batch", "2024-01-15"],
    "row_count": 50000,
    "processing_date": "2024-01-15"
}

Training jobs automatically pick up the latest data without manual updates.
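In a job that runs every day, the date in the metadata would normally be computed rather than hardcoded. A minimal sketch, assuming the same alias and property names as above; the function name `daily_batch_metadata` is hypothetical:

```python
from datetime import date


def daily_batch_metadata(row_count, processing_date=None):
    """Build the metadata dict for a daily batch, dated today by default."""
    day = (processing_date or date.today()).isoformat()  # e.g. "2024-01-15"
    return {
        "valohai.alias": "latest-processed-batch",
        "valohai.tags": ["daily-batch", day],
        "row_count": row_count,
        "processing_date": day,
    }
```

The returned dict is then written to the output file's `.metadata.json` sidecar, as in the earlier examples.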


A/B Testing

Compare model versions side-by-side:

- step:
    name: ab-test
    image: python:3.9
    command: python compare.py
    inputs:
      - name: model-a
        default: datum://model-candidate-a
      - name: model-b
        default: datum://model-candidate-b
      - name: test-data
        default: datum://test-set-fixed

Update aliases to test different model combinations without changing pipeline code.


Canary Deployments

Gradually roll out new models:

# 90% traffic
inputs:
  - name: model-stable
    default: datum://model-prod

# 10% traffic
inputs:
  - name: model-canary
    default: datum://model-canary
Monitor canary performance first, then promote to full production by updating the model-prod alias.


Managing Aliases

Via Web Application

  1. Open your project

  2. Navigate to Data → Aliases tab

  3. Click "Create new datum alias" or select existing alias

  4. Choose the file the alias should point to

  5. View change history for each alias

Via Code

Update aliases programmatically when saving outputs:

# Promote staging to production
metadata = {
    "valohai.alias": "model-prod",  # Updates existing alias
    "valohai.tags": ["promoted-from-staging"],
    "promoted_at": "2024-01-15T10:30:00Z",
    "previous_version": "datum://xyz789"
}
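
The promotion metadata above can also be generated with a current UTC timestamp instead of a hand-written one. This is a sketch under the assumption that you already know the previous version's datum link; the function name `promotion_metadata` is hypothetical, and `promoted_at` and `previous_version` are custom properties, not reserved keys.

```python
from datetime import datetime, timezone


def promotion_metadata(alias, previous_version, promoted_at=None):
    """Build promotion metadata with an ISO 8601 UTC timestamp."""
    # `promoted_at` should be a timezone-aware datetime; defaults to now.
    promoted_at = promoted_at or datetime.now(timezone.utc)
    stamp = promoted_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "valohai.alias": alias,  # Updates the alias if it already exists
        "valohai.tags": ["promoted-from-staging"],
        "promoted_at": stamp,
        "previous_version": previous_version,
    }
```

Recording the previous version next to the timestamp keeps the rollback target visible in the file's own metadata.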

Best Practices

Use Descriptive Names

# Good: Clear purpose
"model-prod"
"model-staging"
"train-data-latest"
"validation-set-fixed"

# Avoid: Ambiguous
"model1"
"data"
"latest"
"temp"

Version Your Aliases

For complex environments:

# Production versions
"model-prod-v1"
"model-prod-v2"

# Regional variants
"model-prod-us"
"model-prod-eu"

# Use case specific
"model-prod-batch-inference"
"model-prod-real-time-api"

Document Alias Updates

Include context in metadata:

metadata = {
    "valohai.alias": "model-prod",
    "valohai.tags": ["production", "promoted"],
    "promoted_by": "ml-team",
    "promoted_reason": "10% accuracy improvement",
    "validation_score": 0.95,
    "previous_score": 0.85
}

