Fetch Failed Executions

Query your failed executions programmatically to build automated retry workflows, monitoring dashboards, or failure analysis tools.

This example shows how to filter executions by status and extract failure details, which is useful for building ML pipelines that automatically recover from transient errors.

Prerequisites

A Valohai project with some executions (successful or failed)
Python 3.8+ and the requests library installed

Generate a Valohai API Token

Create an API token to authenticate your requests:

Click "Hi, <username>!" in the top-right corner
Go to My Profile → Authentication
Click Manage Tokens and scroll to the bottom
Click Generate New Token
Copy the token immediately — it's shown only once

Store your token as an environment variable:

# Linux/Mac
export VH_API_TOKEN=your_token_value

# Windows (Command Prompt)
set VH_API_TOKEN=your_token_value

# Windows (PowerShell)
$env:VH_API_TOKEN = "your_token_value"

Get Your Project ID

Find your project ID in Project Settings → General

Fetch Failed Executions

Create fetch_failed_executions.py:

import requests
import json
import os

# Authenticate with your API token
auth_token = os.environ["VH_API_TOKEN"]
headers = {"Authorization": f"Token {auth_token}"}

# Replace with your project ID
project_id = "your-project-id-here"

# Fetch only failed executions
url = f"https://app.valohai.com/api/v0/executions/?project={project_id}&status=error"
resp = requests.get(url, headers=headers)
resp.raise_for_status()

# Print the response
print("Failed Executions:\n")
print(json.dumps(resp.json(), indent=4))

Run the script:

python fetch_failed_executions.py

Understanding the Response

The API returns a paginated list of executions. Each failed execution appears as an entry in results:

{
  "count": 3,
  "next": null,
  "previous": null,
  "results": [
    {
      "id": "exec-id-here",
      "counter": 42,
      "status": "error",
      "step": "train-model",
      "project": "project-id-here",
      "commit": "abc123...",
      "ctime": "2025-10-29T10:30:00Z",
      "duration": 125,
      "error_text": "Connection timeout during data download",
      "urls": {
        "display": "https://app.valohai.com/p/owner/project/execution/exec-id/"
      }
    }
  ]
}
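The next and previous fields indicate that larger result sets are split across pages. The following is a minimal sketch for walking every page, assuming next holds the full URL of the following page and is null on the last one. The names iter_results and fetch_page are hypothetical helpers for illustration, not part of the Valohai API.

```python
from typing import Callable, Iterator, Optional


def iter_results(first_url: str, fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every entry in "results", following "next" until it is empty.

    fetch_page is any callable that takes a URL and returns the parsed
    JSON body of that page as a dict.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("results", [])
        # Assumption: "next" is an absolute URL, or None on the last page.
        url = page.get("next")


# Example wiring with requests (performs real HTTP calls, so it is left
# commented out here; `url` and `headers` come from the script above):
#
# def fetch_page(page_url):
#     resp = requests.get(page_url, headers=headers, timeout=30)
#     resp.raise_for_status()
#     return resp.json()
#
# for execution in iter_results(url, fetch_page):
#     print(execution["id"], execution["status"])
```

Passing the page fetcher in as a function keeps the loop easy to test with canned responses.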

Key fields:

status: Always "error" when filtered by status=error
error_text: Human-readable failure reason
step: Which step failed
duration: How long it ran before failing (in seconds)
urls.display: Direct link to the execution in Valohai
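As an illustration of how these fields might be used together, the sketch below turns one entry from results into a single log line. The function name summarize_failure and the fallback strings are assumptions made for this example. The field names are the ones shown in the sample response above.

```python
def summarize_failure(execution: dict) -> str:
    """Build a one-line summary from the key fields of a failed execution."""
    counter = execution.get("counter", "?")
    step = execution.get("step", "unknown step")
    duration = execution.get("duration")
    took = f"{duration}s" if duration is not None else "unknown duration"
    # error_text may be missing or empty, so fall back to a placeholder.
    error_text = execution.get("error_text") or "no error text"
    link = execution.get("urls", {}).get("display", "")
    return f"#{counter} {step} failed after {took}: {error_text} ({link})"


# Usage with a parsed response body, e.g. data = resp.json():
# for execution in data["results"]:
#     print(summarize_failure(execution))
```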

Status Filters

Query executions by different statuses:

# All failed executions
status=error

# All successful executions
status=completed

# Manually stopped executions
status=stopped

# Currently running executions
status=started
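To switch between these filters without editing the URL string by hand, the query can be built from a dictionary. This is a small sketch using only the standard library. The helper name executions_url is hypothetical. The endpoint and the project and status parameters are the ones used in the script above.

```python
from urllib.parse import urlencode

EXECUTIONS_ENDPOINT = "https://app.valohai.com/api/v0/executions/"


def executions_url(project_id: str, status: str) -> str:
    """Return the executions URL filtered by project and status."""
    # urlencode escapes the values, so unusual characters cannot break the URL.
    query = urlencode({"project": project_id, "status": status})
    return f"{EXECUTIONS_ENDPOINT}?{query}"


# Usage:
# url = executions_url("your-project-id-here", "stopped")
# resp = requests.get(url, headers=headers)
```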

Next Steps

Now that you can fetch failed executions, you can:

Build automatic retry logic for transient failures
Parse error_text to categorize failure types
Send alerts when critical executions fail
Generate daily failure reports
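As a starting point for the first two ideas, the sketch below sorts failures into likely-transient and other by looking for keywords in error_text. The keyword list is purely an assumption for illustration and is not a list of messages that Valohai is known to produce, so adjust it to the errors you actually see. The function names are hypothetical as well.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical markers of transient failures; tune these for your own workloads.
TRANSIENT_MARKERS = ("timeout", "connection", "temporarily unavailable")


def is_transient(error_text: Optional[str]) -> bool:
    """Return True if error_text contains any transient marker (case-insensitive)."""
    text = (error_text or "").lower()
    return any(marker in text for marker in TRANSIENT_MARKERS)


def split_failures(executions: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Split executions into (retry candidates, everything else)."""
    retryable, other = [], []
    for execution in executions:
        if is_transient(execution.get("error_text")):
            retryable.append(execution)
        else:
            other.append(execution)
    return retryable, other
```

Keyword matching is deliberately simple. It will misclassify some errors, so review the retry candidates before wiring this into automatic retries.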

Troubleshooting

Empty results:

Verify your project ID is correct
Check that you have executions with status=error
Try removing the status filter to see all executions
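When checking an unfiltered response, a quick tally of the status field shows whether any failed executions exist at all. A minimal sketch, where count_by_status is a hypothetical helper that takes the results list from a parsed response:

```python
from collections import Counter
from typing import Iterable


def count_by_status(executions: Iterable[dict]) -> Counter:
    """Count executions per status value."""
    return Counter(e.get("status", "unknown") for e in executions)


# Usage with an unfiltered query (no status parameter in the URL):
# data = resp.json()
# print(count_by_status(data["results"]))
```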

Authentication errors:

Confirm VH_API_TOKEN environment variable is set
Verify your token is valid in My Profile → Authentication
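One way to catch a missing or empty token before any request is sent is to validate it up front. The sketch below assumes the same VH_API_TOKEN variable and Token header format as the script above. The helper names load_token and auth_headers are hypothetical.

```python
import os
from typing import Mapping


def load_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the API token, or raise a clear error if it is missing or blank."""
    token = env.get("VH_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "VH_API_TOKEN is not set. Export it in the shell that runs this script."
        )
    return token


def auth_headers(token: str) -> dict:
    """Build the Authorization header used by the script above."""
    return {"Authorization": f"Token {token}"}


# Usage:
# headers = auth_headers(load_token())
```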

Self-hosted Valohai: Replace https://app.valohai.com with your installation URL
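To avoid hard-coding the host, the base URL can be read from the environment. In this sketch the variable name VH_BASE_URL is an assumption made for this example, not a setting that Valohai defines. It also assumes a self-hosted installation serves the API under the same /api/v0 path.

```python
import os
from typing import Mapping

DEFAULT_BASE_URL = "https://app.valohai.com"


def api_root(env: Mapping[str, str] = os.environ) -> str:
    """Return the API root, using VH_BASE_URL (hypothetical) when it is set."""
    base = env.get("VH_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    return f"{base}/api/v0"


# Usage:
# url = f"{api_root()}/executions/?project={project_id}&status=error"
```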

