Experiment Tracking & Visualizations

Track every metric, visualize training progress, and compare experiments without writing custom logging code. Valohai's metadata system turns any JSON you print into searchable, sortable, comparable experiment data.

No MLflow setup, no TensorBoard configuration, no database management: just print JSON and get instant visualizations.


How It Works

Any JSON your code prints becomes metadata:

import json

print(
    json.dumps(
        {
            "epoch": 10,
            "loss": 0.023,
            "accuracy": 0.95,
            "learning_rate": 0.001,
        },
    ),
)

That's it. Valohai captures it automatically.

💡Tip: In Python, you can print metrics with json.dumps() or with the valohai-utils helper library, for example. See the Collect Metrics section for more information.

Visualize in Real-Time

Watch your metrics update as training runs. No need to wait for the job to finish—graphs appear as soon as the first JSON is printed.

Compare Across Runs

Select multiple executions and compare their metrics side-by-side. Sort by accuracy, filter by loss, find your best model in seconds.


What You Can Track

Training Metrics

Log loss, accuracy, precision, recall—anything you can measure:
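For example (a minimal sketch; the metric values are placeholders):

import json

print(
    json.dumps(
        {
            "epoch": 5,
            "loss": 0.041,
            "accuracy": 0.92,
            "precision": 0.89,
            "recall": 0.94,
        },
    ),
)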

Custom Metrics

Log anything relevant to your experiment:
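For example (illustrative keys and values; log whatever describes your run):

import json

print(
    json.dumps(
        {
            "gpu_memory_gb": 14.2,
            "samples_per_second": 1250,
            "dataset_version": "2024-03-v2",
            "augmentations": "random_crop+flip",
        },
    ),
)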

💡Tip: Metrics aren't limited to numeric values; you can log anything you can print from your jobs.


Visualizations

Time Series (Default)

Plot metrics over time: epochs, steps, or timestamps. Watch loss decrease and accuracy improve as training progresses.

Perfect for:

  • Monitoring convergence

  • Detecting overfitting

  • Spotting training instability

Learn more →

Confusion Matrices

Visualize classification performance with interactive confusion matrices. See where your model excels and where it struggles.

Perfect for:

  • Multi-class classification

  • Error analysis

  • Model debugging

Learn more →

Image Comparison

Stack output images from different runs and toggle between them. Use blend modes, side-by-side sliders, and color overlays to spot differences.

Perfect for:

  • Computer vision experiments

  • Quality control testing

  • Before/after comparisons

Learn more →

Custom Plots and Images

If you need specific types of plots for your metadata, you can always do the plotting inside your executions and save the results as outputs.
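For example, you could render a plot with matplotlib and save it under /valohai/outputs, where Valohai picks files up as execution outputs (a sketch; assumes matplotlib is available in your environment, and the loss values are placeholders):

import matplotlib.pyplot as plt

losses = [0.9, 0.5, 0.3, 0.2, 0.15]  # placeholder metric history

plt.plot(losses)
plt.xlabel("epoch")
plt.ylabel("loss")
plt.title("Training loss")

# Files written to /valohai/outputs are saved as outputs of the execution.
plt.savefig("/valohai/outputs/training_loss.png")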


Comparing Experiments

Side-by-Side Comparison

Select multiple executions and view their metrics in a comparison table. Sort by any metric to find your best performer.

Use cases:

  • Hyperparameter tuning: which learning rate worked best?

  • Architecture comparison: ResNet vs. EfficientNet

  • Dataset size: how does the amount of training data affect accuracy?

Learn more →

Sortable Execution Table

The Executions table displays the latest value of each metric. Click any column header to sort by that metric.

Find:

  • Highest accuracy runs

  • Fastest training times

  • Most efficient models (accuracy per parameter)

Download for Analysis

Export metadata as CSV or JSON for deeper analysis in pandas, Excel, or your tool of choice.
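For example, a quick look at an exported CSV with pandas (illustrative; the filename and column names depend on your export):

import pandas as pd

df = pd.read_csv("metadata.csv")  # hypothetical export filename

# Rank runs by their accuracy column, best first.
print(df.sort_values("accuracy", ascending=False).head())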


Why This Matters

No Instrumentation Overhead

With other tools:
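A typical setup with a dedicated tracking library such as MLflow involves a server, a run context, and explicit logging calls (an illustrative sketch; the server URL and metric values are placeholders):

import mlflow

# Requires a running tracking server to send metrics to.
mlflow.set_tracking_uri("http://tracking-server:5000")  # hypothetical URL

with mlflow.start_run(run_name="baseline"):
    for epoch in range(10):
        loss = 1.0 / (epoch + 1)  # placeholder metric
        mlflow.log_metric("loss", loss, step=epoch)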

With Valohai:
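The same metrics are simply printed (same placeholder values as above):

import json

for epoch in range(10):
    loss = 1.0 / (epoch + 1)  # placeholder metric
    print(json.dumps({"epoch": epoch, "loss": loss}))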

Automatic Versioning

Every execution's metrics are linked to:

  • The exact code (Git commit)

  • The input data

  • The hyperparameters used

  • The output artifacts produced

No manual tracking, no forgotten runs, no "which model was this again?"

Built for ML Workflows

Metrics aren't isolated—they're connected to your entire ML pipeline:

  • Sort executions by accuracy to pick the best for deployment

  • Use metric thresholds in pipelines to gate production releases

  • Compare image outputs from different preprocessing strategies


Common Patterns

Monitor Training Progress
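A minimal sketch of per-epoch logging inside a training loop (the metric values are placeholders for whatever your loop computes):

import json

num_epochs = 20

for epoch in range(num_epochs):
    # Placeholder values; in real code these come from training and validation.
    train_loss = 1.0 / (epoch + 1)
    val_accuracy = 1.0 - 0.5 / (epoch + 1)
    print(
        json.dumps(
            {
                "epoch": epoch,
                "train_loss": train_loss,
                "val_accuracy": val_accuracy,
            },
        ),
    )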

Track Experiment Results
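Print a one-off summary when the run finishes (hypothetical final values):

import json

print(
    json.dumps(
        {
            "final_accuracy": 0.953,
            "final_loss": 0.021,
            "training_time_seconds": 1834,
            "best_epoch": 17,
        },
    ),
)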

Log Confusion Matrix
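A sketch of printing a confusion matrix as JSON (the labels, counts, and key names are made up; check the confusion matrix documentation linked above for the exact format Valohai's visualization expects):

import json

labels = ["cat", "dog", "bird"]
matrix = [
    [50, 2, 1],   # true cat
    [3, 45, 2],   # true dog
    [0, 4, 48],   # true bird
]

# "confusion_matrix" is a hypothetical key name for this sketch.
print(json.dumps({"confusion_matrix": {"labels": labels, "matrix": matrix}}))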


Best Practices

Log Incrementally

Print metrics throughout training, not just at the end:
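A sketch, with a placeholder loss:

import json

for epoch in range(100):
    loss = 1.0 / (epoch + 1)  # placeholder metric
    # Printing inside the loop means graphs update live while training runs.
    print(json.dumps({"epoch": epoch, "loss": loss}))

Printing only a single summary after training finishes loses both the real-time view and the metric history.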

Use Consistent Keys

Keep metric names consistent across experiments:
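For example (illustrative key names):

import json

# Good: the same key in every experiment, so runs line up in comparisons.
print(json.dumps({"val_accuracy": 0.95}))

# Avoid: renaming the same metric between runs, e.g. "accuracy_val" in one
# experiment and "validation_acc" in another; comparison tables won't align.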


Next Steps

Get Started:

  1. Collect metrics from your training code

  2. Visualize metrics in real-time

  3. Compare executions to find your best model

Advanced:
