Experiment Tracking & Visualizations
Track every metric, visualize training progress, and compare experiments without writing custom logging code. Valohai's metadata system turns any JSON you print into searchable, sortable, comparable experiment data.
No MLflow setup, no TensorBoard configuration, no database management: just print JSON and get instant visualizations.
How It Works
Print JSON, Get Metrics
Any JSON your code prints becomes metadata:
import json
print(json.dumps({
    "epoch": 10,
    "loss": 0.023,
    "accuracy": 0.95,
    "learning_rate": 0.001
}))
That's it. Valohai captures it automatically.
💡 Tip: In Python, you can use, for example, json.dumps() or the valohai-utils helper tool to print the metrics. See the Collect Metrics section for more information.
Visualize in Real-Time
Watch your metrics update as training runs. No need to wait for the job to finish—graphs appear as soon as the first JSON is printed.
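A minimal sketch of emitting one JSON line per step so that each metric is written as soon as it is produced. It assumes stdout may be buffered in your environment; `flush=True` is standard Python, while the `log_metrics` helper and the loss values are hypothetical and not part of Valohai.

```python
import json


def log_metrics(**metrics):
    """Hypothetical helper: print one JSON object per line and flush immediately."""
    line = json.dumps(metrics)
    print(line, flush=True)  # flush so the line is not held back in a buffer
    return line


# Stand-in for a real training loop; the loss values are made up.
for step in range(3):
    log_metrics(step=step, loss=round(1.0 / (step + 1), 4))
```

Whether explicit flushing is needed depends on how your job runs Python, so treat it as a harmless precaution rather than a requirement.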
Compare Across Runs
Select multiple executions and compare their metrics side-by-side. Sort by accuracy, filter by loss, find your best model in seconds.
What You Can Track
Training Metrics
Log loss, accuracy, precision, recall—anything you can measure:
import valohai
with valohai.metadata.logger() as logger:
    for epoch in range(epochs):
        # Training loop...
        logger.log("epoch", epoch)
        logger.log("train_loss", train_loss)
        logger.log("val_loss", val_loss)
        logger.log("accuracy", accuracy)
Custom Metrics
Log anything relevant to your experiment:
print(json.dumps({
    "training_time_minutes": 145,
    "gpu_utilization": 0.92,
    "dataset_size": 10000,
    "model_parameters": 25000000
}))
💡 Tip: Metrics are not limited to numeric values; you can log anything you can print from your jobs.
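As an illustration of non-numeric metadata, the sketch below prints strings, a boolean, and a null value in a single JSON object. All key names and values are hypothetical, and how each type is displayed in the UI is not covered here.

```python
import json

# Illustrative values only; the key names are hypothetical.
run_info = {
    "model_name": "resnet50",
    "optimizer": "adam",
    "early_stopped": False,
    "best_checkpoint": "checkpoint_epoch_42.pt",
    "notes": None,
}

# json.dumps converts Python False/None to JSON false/null.
line = json.dumps(run_info)
print(line)
```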
Visualizations
Time Series (Default)
Plot metrics over time: epochs, steps, or timestamps. Watch loss decrease and accuracy improve as training progresses.
Perfect for:
Monitoring convergence
Detecting overfitting
Spotting training instability
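One way to make overfitting easier to spot on a time-series chart is to log the gap between validation and training loss as its own metric. This is a minimal sketch with made-up loss values; the `generalization_gap` key is a hypothetical name, not a built-in metric.

```python
import json

# Made-up loss curves for illustration: training loss keeps falling
# while validation loss starts rising after epoch 2.
train_losses = [0.90, 0.60, 0.40, 0.30, 0.20]
val_losses = [0.95, 0.70, 0.55, 0.60, 0.70]

gaps = []
for epoch, (train_loss, val_loss) in enumerate(zip(train_losses, val_losses)):
    gap = round(val_loss - train_loss, 4)
    gaps.append(gap)
    print(json.dumps({
        "epoch": epoch,
        "train_loss": train_loss,
        "val_loss": val_loss,
        "generalization_gap": gap,  # hypothetical key name
    }))
```

A steadily widening gap while training loss keeps dropping is a common sign that the model has started to overfit.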

Confusion Matrices
Visualize classification performance with interactive confusion matrices. See where your model excels and where it struggles.
Perfect for:
Multi-class classification
Error analysis
Model debugging

Image Comparison
Stack output images from different runs and toggle between them. Use blend modes, side-by-side sliders, and color overlays to spot differences.
Perfect for:
Computer vision experiments
Quality control testing
Before/after comparisons
Comparing Experiments
Side-by-Side Comparison
Select multiple executions and view their metrics in a comparison table. Sort by any metric to find your best performer.
Use cases:
Hyperparameter tuning: which learning rate worked best?
Architecture comparison: ResNet vs. EfficientNet
Data: how does training data size affect accuracy?

Sortable Execution Table
The Executions table displays the latest value of each metric. Click any column header to sort by that metric.
Find:
Highest accuracy runs
Fastest training times
Most efficient models (accuracy per parameter)
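Since the table sorts on logged values, a derived figure such as accuracy per parameter presumably has to be logged as its own key. A minimal sketch, with example numbers and a hypothetical key name:

```python
import json

accuracy = 0.95                # example value from evaluation
model_parameters = 25_000_000  # example parameter count

# Hypothetical derived metric: accuracy per million parameters.
accuracy_per_million_params = round(accuracy / (model_parameters / 1_000_000), 4)

print(json.dumps({
    "accuracy": accuracy,
    "model_parameters": model_parameters,
    "accuracy_per_million_params": accuracy_per_million_params,
}))
```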
Download for Analysis
Export metadata as CSV or JSON for deeper analysis in pandas, Excel, or your tool of choice.
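A rough sketch of what offline analysis could look like, using only the Python standard library. The CSV content below is a stand-in: the column names and rows are hypothetical, and the layout of a real export may differ.

```python
import csv
import io

# Stand-in for an exported file; the column names and rows are hypothetical.
exported_csv = """execution,accuracy,training_time_minutes
1,0.91,120
2,0.95,145
3,0.93,98
"""

rows = list(csv.DictReader(io.StringIO(exported_csv)))

# CSV values are read as strings, so convert before comparing.
best = max(rows, key=lambda row: float(row["accuracy"]))
print(f"Best run: #{best['execution']} with accuracy {best['accuracy']}")
```

For a real export, replace `io.StringIO(exported_csv)` with `open(path, newline="")`, where `path` points to the downloaded file.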
Why This Matters
No Instrumentation Overhead
With other tools:
# Set up MLflow
import mlflow
mlflow.set_tracking_uri("...")
mlflow.start_run()
mlflow.log_param("lr", 0.001)
mlflow.log_metric("loss", loss, step=epoch)
mlflow.end_run()
With Valohai:
# Just print JSON
print(json.dumps({"epoch": epoch, "loss": loss}))
Automatic Versioning
Every execution's metrics are linked to:
The exact code (Git commit)
The input data
The hyperparameters used
The output artifacts produced
No manual tracking, no forgotten runs, no "which model was this again?"
Built for ML Workflows
Metrics aren't isolated—they're connected to your entire ML pipeline:
Sort executions by accuracy to pick the best for deployment
Use metric thresholds in pipelines to gate production releases
Compare image outputs from different preprocessing strategies
Common Patterns
Monitor Training Progress
import valohai
with valohai.metadata.logger() as logger:
    for epoch in range(epochs):
        train_loss = train_epoch(model, train_loader)
        val_loss = validate(model, val_loader)
        logger.log("epoch", epoch)
        logger.log("train_loss", train_loss)
        logger.log("val_loss", val_loss)
        logger.log("learning_rate", optimizer.param_groups[0]['lr'])
Track Experiment Results
import json
# After training completes
results = {
    "final_accuracy": 0.95,
    "best_val_loss": 0.023,
    "epochs_trained": 100,
    "early_stopped": True,
    "training_time_minutes": 145
}
print(json.dumps(results))
Log Confusion Matrix
from sklearn.metrics import confusion_matrix
import numpy as np
import json
y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]
matrix = confusion_matrix(y_actu, y_pred)
# Convert to list
result = matrix.tolist()
print(json.dumps({"data": result}))
# {"data": [[3, 0, 0], [0, 1, 2], [2, 1, 3]]}
Best Practices
Log Incrementally
Print metrics throughout training, not just at the end:
# Good: Log each epoch
for epoch in range(100):
    loss = train_epoch()
    print(json.dumps({"epoch": epoch, "loss": loss}))
# Avoid: Only log final result
# (You lose visibility into training progress)
Use Consistent Keys
Keep metric names consistent across experiments:
# Good: Consistent naming
"accuracy"
"val_accuracy"
"test_accuracy"
# Avoid: Inconsistent naming
"acc"
"validation_accuracy"
"test_acc"
Next Steps
Get Started:
Collect metrics from your training code
Visualize metrics in real-time
Compare executions to find your best model
Advanced:
Use metrics in pipeline conditions to automate decisions
Set up early stopping based on metric thresholds
