Custom Execution Status
Set custom execution statuses for better monitoring, team visibility, and external integrations during long-running ML jobs.
Track progress and communicate execution state beyond Valohai's default statuses (queued, started, completed, error). Custom statuses help ML teams monitor long-running jobs, debug failures faster, and integrate with external monitoring systems.
Why Use Custom Status
For ML Engineers: Get real-time visibility into multi-hour training jobs without SSH access or log diving.
For Teams: Share meaningful progress updates like "Loading 50GB dataset" or "Epoch 45/100" that stakeholders actually understand.
For Integration: Feed execution state into monitoring dashboards, Slack notifications, or CI/CD pipelines via API.
Set Status with Direct API
This method works without additional dependencies and gives you full control over status updates.
import json
import requests
# Read Valohai execution config
with open('/valohai/config/api.json', 'r') as json_file:
data = json.load(json_file)
headers = data["set_status_detail"]["headers"]
# Set your custom status
response = requests.post(
data["set_status_detail"]["url"],
headers=headers,
json={"status_detail": "Processing epoch 23/100"}
)
print(f"Status updated: {response.status_code}")💡 The status overwrites previous custom statuses. Set it at key milestones for best visibility.
Set Status with valohai-utils
If you're already using valohai-utils in your project, this one-liner handles the API call:
import valohai
valohai.set_status_detail("Processing epoch 23/100")Remember to add valohai-utils to your requirements.txt.
Common Status Patterns
Milestone Updates
Set status at major workflow transitions:
# Data loading phase
requests.post(url, headers=headers, json={"status_detail": "Loading training data..."})
# Training phase
requests.post(url, headers=headers, json={"status_detail": "Training model - epoch 1/50"})
# Validation phase
requests.post(url, headers=headers, json={"status_detail": "Running validation..."})Progress Tracking
Update status during long-running operations:
for epoch in range(total_epochs):
if epoch % 5 == 0: # Update every 5 epochs
status = f"Training: epoch {epoch}/{total_epochs}"
requests.post(url, headers=headers, json={"status_detail": status})Error Context
Provide debugging context when things go wrong:
try:
load_model()
except Exception as e:
error_status = f"Failed loading model: {str(e)[:100]}"
requests.post(url, headers=headers, json={"status_detail": error_status})
raiseRich Status Content
Add visual elements like progress bars and charts to your status updates. Each rich element must be valid JSON on a single line.
Text with Color
status_data = '{"type": "text", "text": "Model converged!", "color": "good"}'
requests.post(url, headers=headers, json={"status_detail": status_data})Available colors: good, bad, warn, or any CSS color value.
Progress Gauge
# Simple percentage
gauge = '{"type": "gauge", "value": 0.75, "label": "Training Progress"}'
# With custom range
gauge = '{"type": "gauge", "value": 45, "min": 0, "max": 100, "label": "Epoch 45/100", "color": "good"}'
requests.post(url, headers=headers, json={"status_detail": gauge})Sparkline Charts
# Loss values over time
loss_chart = '{"type": "sparkline", "values": [0.8, 0.6, 0.4, 0.3, 0.25], "color": "good"}'
requests.post(url, headers=headers, json={"status_detail": loss_chart})Access Status via API
Retrieve custom status from external services or local scripts:
import requests
import os
# Use your Valohai API token
auth_token = os.environ['VH_API_TOKEN']
headers = {'Authorization': f'Token {auth_token}'}
# Get execution details
url = 'https://app.valohai.com/api/v0/executions/{execution_id}/'
response = requests.get(url, headers=headers)
# Extract custom status
custom_status = response.json()["status_detail"]
print(f"Current status: {custom_status}")🔐 Keep your API token secure and out of version control. Use environment variables or secret management.
Best Practices
Update Strategically: Don't spam status updates. Set them at meaningful milestones or every N iterations.
Keep It Readable: Status shows in the UI and API responses. Make it human-friendly.
Handle Failures Gracefully: Wrap status updates in try/catch blocks so they don't crash your execution.
Use Rich Content Sparingly: Gauges and charts are great for long processes, but plain text is often clearer for distinct operations.
Last updated
Was this helpful?
