Compare Executions

Compare metrics from multiple experiments side-by-side. See all runs together on the same graphs and in a comparison table to quickly identify your best-performing models.


Quick Start

1. Select Executions to Compare

From your project's Executions tab:

  1. Use the checkboxes at the left of each row to select executions

  2. Select 2 or more executions you want to compare

  3. Click the Compare button above the table

The comparison view opens.


2. View Comparison Graphs

All selected executions appear on the same visualization:

  • Time series graphs: Each execution's metrics are plotted together

  • Same controls: Horizontal axis, smoothing, and vertical axes all work the same as in single-execution views

  • Color-coded: Each execution gets a unique color (or enable "One Color per Execution" to group all metrics from one execution)


3. View Comparison Table

Scroll down below the graphs to see the comparison table.

The table shows:

  • One row per execution

  • One column per metric

  • Latest value of each metric

  • Execution metadata (name, parameters, environment, etc.)

Sort by any column to quickly find:

  • Highest accuracy

  • Lowest loss

  • Fastest training time

  • Best F1 score


Comparison Workflow

1. Run Multiple Experiments

Train models with different settings:

# Experiment 1: learning_rate=0.001
# Experiment 2: learning_rate=0.01
# Experiment 3: learning_rate=0.1

Each logs metrics the same way:

import json

print(json.dumps({
    "epoch": epoch,
    "train_loss": train_loss,
    "val_loss": val_loss,
    "val_accuracy": val_acc
}))
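
Putting it together, a minimal sketch of one experiment script is shown below. The training math is a simulated stand-in for your real model code; the important part is that every experiment prints the same JSON keys once per epoch.

import json
import random

def run_experiment(learning_rate, epochs=100):
    # Placeholder "training" loop; swap the simulated updates for real model code.
    train_loss, val_loss, val_acc = 1.0, 1.0, 0.5
    for epoch in range(1, epochs + 1):
        train_loss *= 0.97
        val_loss *= 0.975
        val_acc = min(0.99, val_acc + random.uniform(0.0, 0.005))
        # One JSON object per line, with identical keys in every experiment
        print(json.dumps({
            "epoch": epoch,
            "learning_rate": learning_rate,
            "train_loss": round(train_loss, 4),
            "val_loss": round(val_loss, 4),
            "val_accuracy": round(val_acc, 4)
        }))

if __name__ == "__main__":
    run_experiment(learning_rate=0.001)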

2. Compare Visually

Select two or more executions and click Compare.

What you'll see:

  • All training curves overlaid on one graph

  • Which experiments converge faster

  • Which achieve better final performance

  • Which settings cause instability


3. Analyze the Table

Click column headers to sort:

  • Sort by val_accuracy descending → find the best model

  • Sort by train_loss ascending → see which converged best

  • Sort by epoch → see which finished training

Find patterns:

  • Do higher learning rates train faster but plateau lower?

  • Do larger batch sizes improve stability?

  • Does dropout improve generalization (a lower gap between train and val loss)? A quick way to check this on exported data is sketched below.
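
As an example of that check, here is a sketch that computes the train/val gap per execution from an exported comparison CSV. The file and column names are assumptions; adjust them to match your own export and metric names.

import pandas as pd

# Assumes a comparison CSV exported via "Download raw data";
# the column names below are assumptions, adjust to your own metric names.
df = pd.read_csv("comparison_data.csv")

# A large positive gap between validation and training loss hints at overfitting;
# if dropout is helping generalization, this gap should shrink.
df["generalization_gap"] = df["val_loss"] - df["train_loss"]
print(df.sort_values("generalization_gap"))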


Comparison Features

Overlay Metrics on Graphs

All executions appear on the same graph with different colors:

Example: Comparing 3 learning rates

  • Blue line: LR=0.001 (slow but steady)

  • Green line: LR=0.01 (faster convergence)

  • Red line: LR=0.1 (unstable, diverges)

You can instantly see which learning rate works best.


Use "One Color per Execution"

Enable this option in Chart Options to:

  • Give all metrics from one execution the same color

  • Make it easier to track which line belongs to which execution

  • Reduce visual clutter when comparing many executions

Use when: Comparing 3+ executions with multiple metrics each


Filter with Smoothing

Apply smoothing to noisy metrics to see trends more clearly:

  1. Select a metric in Vertical Axes

  2. Adjust the Smoothing slider

  3. Compare smoothed trends across executions

Especially useful when comparing runs with batch-level logging.
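
The Smoothing slider only affects the graphs, but if you want a comparable effect on exported data for your own plots, an exponential moving average is a reasonable stand-in. This is a sketch, not necessarily the exact smoothing the UI applies, and the file and column names are assumptions.

import pandas as pd

# Assumed: a per-step export with a noisy batch-level metric column.
df = pd.read_csv("execution_metrics.csv")

# Exponential moving average; a larger span means heavier smoothing,
# similar in spirit to moving the Smoothing slider to the right.
df["train_loss_smoothed"] = df["train_loss"].ewm(span=20).mean()
print(df[["train_loss", "train_loss_smoothed"]].tail())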


Create Multiple Comparison Views

Just like single executions, you can create multiple visualization tabs:

  1. Click the + button

  2. Name your tab (e.g., "Loss Comparison", "Accuracy Only")

  3. Add different metrics to each tab

Use cases:

  • One tab for loss curves

  • One tab for accuracy metrics

  • One tab for learning rate comparison


Comparison Table

Understanding the Table

  • Rows: One per execution

  • Columns: One per unique metric across all executions

  • Values: Latest logged value of each metric

Example:

| Execution | learning_rate | val_accuracy | val_loss | final_epoch |
|-----------|---------------|--------------|----------|-------------|
| #142      | 0.001         | 0.95         | 0.23     | 100         |
| #143      | 0.01          | 0.93         | 0.28     | 100         |
| #144      | 0.1           | 0.78         | 0.65     | 50          |

Insight: Execution #142 has the best accuracy, but #143 might be acceptable and trains at similar speed.


Sort to Find Best

Click any column header to sort:

Sort by val_accuracy (descending): Immediately see which execution achieved the highest validation accuracy.

Sort by train_loss (ascending): See which execution had the best training convergence.

Sort by execution number: View in chronological order to see how recent changes affected performance.


Missing Values

If an execution didn't log a particular metric, the cell appears empty.

Example:

  • Execution #142 logs val_f1_score

  • Execution #143 does not log val_f1_score

  • Table shows value for #142, empty for #143
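
The same gaps carry over when you export the comparison data: metrics an execution never logged arrive as empty cells, which pandas reads as NaN. A short sketch, with illustrative file and column names:

import pandas as pd

# Illustrative names; executions that never logged val_f1_score read as NaN.
df = pd.read_csv("comparison_data.csv")
print(df["val_f1_score"].isna().sum(), "executions did not log val_f1_score")

# Drop executions without the metric before ranking on it
ranked = df.dropna(subset=["val_f1_score"]).sort_values("val_f1_score", ascending=False)
print(ranked.head())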


Common Comparison Scenarios

Hyperparameter Tuning

Goal: Find the best learning rate

Steps:

  1. Run executions with learning_rate = [0.0001, 0.001, 0.01, 0.1]

  2. Compare all executions

  3. Sort table by val_accuracy (descending)

  4. Look at graphs to see convergence speed

  5. Choose learning rate that balances accuracy and training time
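
For step 1 above, it also helps if every execution logs the hyperparameter it was run with, so the value appears as its own sortable column in the comparison table. A minimal sketch, assuming the learning rate is passed in as a command-line argument:

import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float, default=0.001)
args = parser.parse_args()

# Logging the hyperparameter once makes it a sortable column alongside the metrics
print(json.dumps({"learning_rate": args.learning_rate}))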


Architecture Comparison

Goal: Compare ResNet50 vs. EfficientNet

Steps:

  1. Train both architectures with the same settings

  2. Log model_name as a metric or parameter

  3. Compare executions

  4. Sort by val_accuracy and training_time_minutes

  5. Evaluate tradeoff between accuracy and speed

import json

# Log the model architecture alongside the metrics
print(json.dumps({
    "model_name": "resnet50",  # or "efficientnet"
    "epoch": epoch,
    "val_accuracy": val_acc
}))

Optimizer Comparison

Goal: SGD vs. Adam vs. AdamW

Steps:

  1. Train with each optimizer

  2. Log optimizer name

  3. Compare convergence speed and final accuracy

  4. Consider stability (look for spikes in loss); a quick offline check is sketched below
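
For step 4, a large jump in loss from one epoch to the next is a simple proxy for instability, and it can be checked offline on exported per-epoch data. A sketch, with assumed file and column names:

import pandas as pd

# Assumed: a per-epoch export for a single execution with a train_loss column.
df = pd.read_csv("execution_metrics.csv")

# Largest epoch-to-epoch increase in training loss; big values hint at
# an unstable optimizer / learning-rate combination.
max_spike = df["train_loss"].diff().max()
print(f"Largest single-epoch loss increase: {max_spike:.4f}")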


Best Practices

Use Consistent Metric Names

Keep metric names identical across all executions:

# Good: All experiments use same names
"val_accuracy"
"val_loss"
"train_accuracy"

# Avoid: Different names per experiment
"validation_accuracy"  # Experiment 1
"val_acc"             # Experiment 2

If metrics have different names, they appear as separate columns in the comparison table.
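
One low-tech way to keep names identical is to route all logging through a single helper that owns the key names and is shared by every experiment script. A minimal sketch (the helper name is arbitrary):

import json

# Single source of truth for metric names
def log_metrics(epoch, train_loss, val_loss, val_accuracy):
    print(json.dumps({
        "epoch": epoch,
        "train_loss": train_loss,
        "val_loss": val_loss,
        "val_accuracy": val_accuracy
    }))

log_metrics(epoch=1, train_loss=0.82, val_loss=0.91, val_accuracy=0.64)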


Use Descriptive Execution Names

Name your executions descriptively in the Valohai UI:

  • Good: "ResNet50-LR0.001-BS32"

  • Avoid: "Execution #142"

Descriptive names make it easier to identify executions in the comparison view.


Compare Small Batches First

Don't compare 50 executions at once. Start small:

  1. Compare 2-5 related executions

  2. Identify patterns

  3. Drill down with more comparisons as needed

Too many executions create cluttered graphs and slow loading.


Export Comparison Data

Download the comparison table for external analysis:

  1. Click Download raw data (top right)

  2. Choose CSV or JSON

  3. Get all metrics from all selected executions

Use exported data for:

  • Statistical analysis (e.g., significance tests)

  • Custom visualizations

  • Reporting to stakeholders

  • Creating summary tables

Example: Analysis in Python

import pandas as pd
import matplotlib.pyplot as plt

# Load comparison data
df = pd.read_csv('comparison_data.csv')

# Group by hyperparameter
grouped = df.groupby('learning_rate')['val_accuracy'].mean()

# Plot
grouped.plot(kind='bar')
plt.title('Average Val Accuracy by Learning Rate')
plt.xlabel('Learning Rate')
plt.ylabel('Val Accuracy')
plt.savefig('comparison_analysis.png')
