Custom Execution Status

Set custom execution statuses for better monitoring, team visibility, and external integrations during long-running ML jobs.

Track progress and communicate execution state beyond Valohai's default statuses (queued, started, completed, error). Custom statuses help ML teams monitor long-running jobs, debug failures faster, and integrate with external monitoring systems.

Why Use Custom Status

For ML Engineers: Get real-time visibility into multi-hour training jobs without SSH access or log diving.

For Teams: Share meaningful progress updates like "Loading 50GB dataset" or "Epoch 45/100" that stakeholders actually understand.

For Integration: Feed execution state into monitoring dashboards, Slack notifications, or CI/CD pipelines via API.

Set Status with Direct API

This method works without additional dependencies and gives you full control over status updates.

import json
import requests

# Read Valohai execution config
with open('/valohai/config/api.json', 'r') as json_file:
    data = json.load(json_file)
    
    headers = data["set_status_detail"]["headers"]
    
    # Set your custom status
    response = requests.post(
        data["set_status_detail"]["url"], 
        headers=headers, 
        json={"status_detail": "Processing epoch 23/100"}
    )
    print(f"Status updated: {response.status_code}")

💡 The status overwrites previous custom statuses. Set it at key milestones for best visibility.

Set Status with valohai-utils

If you're already using valohai-utils in your project, this one-liner handles the API call:

import valohai

valohai.set_status_detail("Processing epoch 23/100")

Remember to add valohai-utils to your requirements.txt.

Common Status Patterns

Milestone Updates

Set status at major workflow transitions:

# Data loading phase
requests.post(url, headers=headers, json={"status_detail": "Loading training data..."})

# Training phase  
requests.post(url, headers=headers, json={"status_detail": "Training model - epoch 1/50"})

# Validation phase
requests.post(url, headers=headers, json={"status_detail": "Running validation..."})

Progress Tracking

Update status during long-running operations:

for epoch in range(total_epochs):
    if epoch % 5 == 0:  # Update every 5 epochs
        status = f"Training: epoch {epoch}/{total_epochs}"
        requests.post(url, headers=headers, json={"status_detail": status})

Error Context

Provide debugging context when things go wrong:

try:
    load_model()
except Exception as e:
    error_status = f"Failed loading model: {str(e)[:100]}"
    requests.post(url, headers=headers, json={"status_detail": error_status})
    raise

Rich Status Content

Add visual elements like progress bars and charts to your status updates. Each rich element must be valid JSON on a single line.

Text with Color

status_data = '{"type": "text", "text": "Model converged!", "color": "good"}'
requests.post(url, headers=headers, json={"status_detail": status_data})

Available colors: good, bad, warn, or any CSS color value.

Progress Gauge

# Simple percentage
gauge = '{"type": "gauge", "value": 0.75, "label": "Training Progress"}'

# With custom range
gauge = '{"type": "gauge", "value": 45, "min": 0, "max": 100, "label": "Epoch 45/100", "color": "good"}'

requests.post(url, headers=headers, json={"status_detail": gauge})

Sparkline Charts

# Loss values over time
loss_chart = '{"type": "sparkline", "values": [0.8, 0.6, 0.4, 0.3, 0.25], "color": "good"}'
requests.post(url, headers=headers, json={"status_detail": loss_chart})

Access Status via API

Retrieve custom status from external services or local scripts:

import requests
import os

# Use your Valohai API token
auth_token = os.environ['VH_API_TOKEN']
headers = {'Authorization': f'Token {auth_token}'}

# Get execution details
url = 'https://app.valohai.com/api/v0/executions/{execution_id}/'
response = requests.get(url, headers=headers)

# Extract custom status
custom_status = response.json()["status_detail"]
print(f"Current status: {custom_status}")

🔐 Keep your API token secure and out of version control. Use environment variables or secret management.

Best Practices

Update Strategically: Don't spam status updates. Set them at meaningful milestones or every N iterations.

Keep It Readable: Status shows in the UI and API responses. Make it human-friendly.

Handle Failures Gracefully: Wrap status updates in try/catch blocks so they don't crash your execution.

Use Rich Content Sparingly: Gauges and charts are great for long processes, but plain text is often clearer for distinct operations.

Last updated

Was this helpful?