Collect Metrics

Print JSON from your code and Valohai automatically captures it as metrics. No logging libraries to configure, no tracking servers to set up—just print structured data and get instant tracking.


The Basics

Any JSON printed to stdout becomes a metadata entry. Valohai parses it automatically and makes it available for visualization and comparison.

💡 Tip: The examples here use mostly numeric metrics, but you can also record strings if needed.

Python

import json

# Each JSON object printed to stdout becomes one metadata entry.
print(
    json.dumps(
        {
            "epoch": 1,
            "loss": 0.5,
            "accuracy": 0.82,
        },
    ),
)

Python with valohai-utils helper tool
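With valohai-utils (covered in more detail below), the same entry can be logged like this:

import valohai

# The logger context manager batches these calls and prints them
# as a single JSON object when the block exits.
with valohai.logger() as logger:
    logger.log("epoch", 1)
    logger.log("loss", 0.5)
    logger.log("accuracy", 0.82)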

R

The same pattern works in R, or in any other language that can write a JSON object to standard output.


When to Log Metrics

During Training (Progressive Logging)

Log metrics after each epoch or batch to watch training progress in real-time:
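A minimal sketch, with placeholder values standing in for your real training step:

import json

for epoch in range(5):
    # Placeholder values; substitute the loss/accuracy from your training step.
    loss = 1.0 / (epoch + 1)
    accuracy = 0.5 + 0.08 * epoch
    print(
        json.dumps({"epoch": epoch, "loss": loss, "accuracy": accuracy}),
        flush=True,  # flush so metrics show up in real-time
    )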

Benefits:

  • Monitor convergence in real-time

  • Detect training issues early

  • Stop runs that aren't improving


After Training (Final Results)

Log summary metrics when training completes:
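For example, with illustrative results:

import json

print(
    json.dumps(
        {
            "final_accuracy": 0.91,
            "final_loss": 0.23,
            "best_epoch": 42,
        }
    )
)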

Benefits:

  • Compare final performance across experiments

  • Sort executions by best result

  • Track high-level experiment outcomes


What to Log

Training Metrics

Track how your model learns:
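For example (illustrative values):

import json

print(
    json.dumps(
        {
            "epoch": 10,
            "train_loss": 0.31,
            "val_loss": 0.35,
            "learning_rate": 0.001,
        }
    )
)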


Performance Metrics

Track multiple evaluation metrics:
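For instance, several evaluation metrics in one entry:

import json

print(
    json.dumps(
        {
            "accuracy": 0.92,
            "precision": 0.89,
            "recall": 0.94,
            "f1": 0.91,
            "auc": 0.96,
        }
    )
)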


Data Statistics

Track dataset characteristics:
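For example, with made-up numbers:

import json

print(
    json.dumps(
        {
            "train_samples": 45000,
            "validation_samples": 5000,
            "positive_class_ratio": 0.48,
        }
    )
)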


Using valohai-utils

The valohai-utils library provides a clean interface for logging metrics.

Install
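Install the package from PyPI:

pip install valohai-utils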

Basic Usage
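A minimal example. Everything logged inside one with block is printed together as a single JSON object when the block exits:

import valohai

with valohai.logger() as logger:
    logger.log("step", 100)
    logger.log("loss", 0.37)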

In Training Loop
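Open a fresh logger context on every epoch so each iteration becomes its own metadata entry (placeholder loss values for illustration):

import valohai

for epoch in range(5):
    loss = 1.0 / (epoch + 1)  # placeholder; use your real training loss
    with valohai.logger() as logger:
        logger.log("epoch", epoch)
        logger.log("loss", loss)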


Quick Framework Examples

PyTorch
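A self-contained sketch with a toy model and random tensors standing in for your DataLoader:

import json

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(10):
    inputs = torch.randn(32, 10)  # stand-in batch; replace with your data
    targets = torch.randn(32, 1)
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    # One JSON line per epoch becomes one metadata entry.
    print(json.dumps({"epoch": epoch, "loss": loss.item()}))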


Scikit-learn
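A runnable sketch on synthetic data; swap in your own dataset and model:

import json

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
preds = model.predict(X_test)

print(
    json.dumps(
        {
            "accuracy": accuracy_score(y_test, preds),
            "f1": f1_score(y_test, preds),
        }
    )
)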


XGBoost
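A sketch on synthetic data that replays XGBoost's per-round eval results as metadata entries:

import json

import numpy as np
import xgboost as xgb

X = np.random.rand(500, 10)
y = (X[:, 0] > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)

evals_result = {}
booster = xgb.train(
    {"objective": "binary:logistic", "eval_metric": "logloss"},
    dtrain,
    num_boost_round=20,
    evals=[(dtrain, "train")],
    evals_result=evals_result,
    verbose_eval=False,
)

# One JSON line per boosting round gives Valohai a graphable series.
for i, logloss in enumerate(evals_result["train"]["logloss"]):
    print(json.dumps({"round": i, "train_logloss": logloss}))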


Framework-Specific Guides

For detailed integration examples with popular frameworks:

PyTorch Lightning → Use lifecycle hooks like on_train_epoch_end for automatic logging

TensorFlow/Keras → Create custom callbacks that log metrics after each epoch

YOLOv5 & Output File Watchers → Watch output files and stream them to Valohai metadata

Other frameworks? The core pattern works everywhere: print JSON from your code, and Valohai captures it. Apply the same callback/hook approach shown above.


Metadata Format

Each metadata entry is captured as a JSON object with an automatic timestamp.

Example Entry
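For example, printing {"epoch": 1, "loss": 0.5} is stored as something like (timestamp illustrative):

{
  "epoch": 1,
  "loss": 0.5,
  "_time": "2024-05-28T12:34:56.000000Z"
}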

The _time field is added automatically as a UTC timestamp.


Supported Data Types

Scalars (Most Common)
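Numbers and strings all work as values:

import json

print(
    json.dumps(
        {
            "epoch": 7,             # integer
            "loss": 0.42,           # float
            "phase": "validation",  # string
        }
    )
)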

Lists

Note that you can record lists as metadata, but for graph visualizations in the Valohai UI you will need to print out the values separately, as shown below.
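For example:

import json

# Stored as metadata, but not graphable as-is:
print(json.dumps({"per_class_f1": [0.91, 0.84, 0.88]}))

# For graphs, print each value separately:
for class_id, f1 in enumerate([0.91, 0.84, 0.88]):
    print(json.dumps({"class": class_id, "f1": f1}))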


Best Practices

Use Consistent Naming

Keep metric names identical across experiments for easy comparison:
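For example:

import json

# Good: the same key in every experiment
print(json.dumps({"val_accuracy": 0.91}))

# Avoid variants of the same metric across runs,
# e.g. "validation_acc", "valAccuracy", "accuracy_val".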


Log Progressively

Print metrics throughout training, not just at the end:
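For example, with placeholder losses:

import json

losses = [0.9, 0.6, 0.4]  # placeholders for your per-epoch values

# Good: one entry per epoch builds a training curve
for epoch, loss in enumerate(losses):
    print(json.dumps({"epoch": epoch, "loss": loss}))

# Less useful on its own: a single value with no history
print(json.dumps({"final_loss": losses[-1]}))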


Include Step/Epoch Counter

Always include a step or epoch counter for time-series visualization:
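For example:

import json

# Without a counter there is nothing to plot on the x-axis:
# print(json.dumps({"loss": 0.4}))

# With a counter, Valohai can draw the metric over time:
print(json.dumps({"epoch": 3, "loss": 0.4}))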


Common Issues

Metrics Not Appearing

Symptom: JSON printed but no metrics in UI

Causes & Fixes:

  • Invalid JSON format → Validate JSON syntax

  • Missing newline after JSON → Each entry must end with a newline; print() adds one by default, sys.stdout.write() does not

  • Buffered stdout → Flush output: print(..., flush=True)

Test your JSON:
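One quick check is a round-trip through json.loads, which raises an error on malformed input:

import json

line = '{"epoch": 1, "loss": 0.5}'
json.loads(line)  # raises json.JSONDecodeError if the string is not valid JSON
print(line, flush=True)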


Metrics Mixed with Logs

Symptom: Hard to distinguish metrics from debug logs

Solution: Print only JSON for metrics, use stderr for debug logs:
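For example:

import json
import sys

print("starting epoch 1...", file=sys.stderr)  # debug log: stderr is not parsed for metrics
print(json.dumps({"epoch": 1, "loss": 0.5}))   # metrics: clean JSON on stdout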


Duplicate Keys

Symptom: Metrics overwriting each other

Solution: Use unique keys or include step counter:
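For example:

import json

# Problem: the same key printed repeatedly, with nothing to tell entries apart
# print(json.dumps({"loss": 0.5}))
# print(json.dumps({"loss": 0.4}))

# Fix: include the step so every entry is a distinct point in the series
for step, loss in enumerate([0.5, 0.4]):
    print(json.dumps({"step": step, "loss": loss}))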


Where Metrics Appear

Once logged, metrics are available in:

Execution Metadata Tab

View all metrics for a single execution with interactive graphs.

Executions Table

See the latest value of each metric in the table. Sort by any column to find top performers.

Comparison View

Select multiple executions and compare their metrics side-by-side.

Export from the UI

Download metrics as CSV or JSON for custom analysis.

Export with API

Use the /api/v0/executions/{exec-id}/metadata/ API endpoint to fetch an execution's metadata programmatically.

