# Compare Executions

Compare metrics from multiple experiments side-by-side. See all runs together on the same graphs and in a comparison table to quickly identify your best-performing models.

***

### Quick Start

#### 1. Select Executions to Compare

From your project's **Executions** tab:

1. Use the checkboxes at the left of each row to select two or more executions
2. Click the **Compare** button above the table

<figure><img src="/files/mhEfjgrOg9zkGzy2KTA8" alt=""><figcaption></figcaption></figure>

Once you enter the **Compare Executions** view, a sidebar appears on the left.

This sidebar lets you control exactly which executions are included in the comparison.

You can:

* Add executions to the comparison
* Remove executions
* Search and filter runs
* Quickly switch between different combinations

You don’t need to go back to the main Executions tab. Everything can be managed directly from this view.

Any changes you make update the graphs instantly, while keeping your current visualization settings (selected view, axes, filters, etc.).

<figure><img src="/files/3MMeXih1VAPLCf7ghiNy" alt=""><figcaption></figcaption></figure>

***

#### 2. View Comparison Graphs

All selected executions appear together in the same visualization.

Depending on the selected view:

* [**Time Series**](/experiment-tracking/visualize-metrics/time-series.md): Metrics from all executions are plotted over time (epochs, steps, or iterations).
* [**Scatter Plot**](/experiment-tracking/visualize-metrics/scatter-plot.md): Executions appear as points, allowing comparison across two or three numeric metrics.
* [**Grouped Plot**](/experiment-tracking/visualize-metrics/grouped-metadata-plot.md): Executions are grouped by categorical metadata and displayed as box plots with statistical summaries.
* [**Confusion Matrix**](/experiment-tracking/visualize-metrics/confusion-matrix.md): Compare classification performance across executions to analyze differences in prediction behavior.
* [**Image Comparison**](/experiment-tracking/compare-images.md): Visually inspect and compare output images from multiple runs side-by-side.

***

#### 3. View Comparison Table

Scroll down below the graphs to see the **comparison table**.

The table shows:

* One row per execution
* One column per metric
* Latest value of each metric
* Execution metadata (name, parameters, environment, etc.)

**Sort by any column** to quickly find:

* Highest accuracy
* Lowest loss
* Fastest training time
* Best F1 score

***

### Comparison Workflow

#### 1. Run Multiple Experiments

Train models with different settings:

```python
# Experiment 1: learning_rate=0.001
# Experiment 2: learning_rate=0.01
# Experiment 3: learning_rate=0.1
```

Each logs metrics the same way:

```python
import json

# Print one JSON object per epoch so Valohai can collect the metrics
print(
    json.dumps(
        {
            "epoch": epoch,
            "train_loss": train_loss,
            "val_loss": val_loss,
            "val_accuracy": val_acc,
        },
    ),
)
```
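As a minimal sketch of how one script can serve all three experiments, the example below takes the learning rate as an argument and logs the same keys every epoch. The `run_experiment` and `log_metrics` helpers are hypothetical, and the loss values are synthetic placeholders rather than real training results; swap in your own training step.

```python
import json


def log_metrics(epoch, train_loss, val_loss, val_acc):
    """Print one JSON object per epoch, using the same keys in every experiment."""
    line = json.dumps(
        {
            "epoch": epoch,
            "train_loss": train_loss,
            "val_loss": val_loss,
            "val_accuracy": val_acc,
        },
    )
    print(line)
    return line


def run_experiment(learning_rate, epochs=3):
    """Hypothetical training loop; the numbers below are synthetic stand-ins."""
    lines = []
    train_loss = 1.0
    for epoch in range(1, epochs + 1):
        # Placeholder for a real training step: replace with your own code.
        train_loss = train_loss * (1.0 - min(learning_rate * 10, 0.9))
        val_loss = train_loss * 1.1
        val_acc = max(0.0, 1.0 - val_loss)
        lines.append(log_metrics(epoch, train_loss, val_loss, val_acc))
    return lines


if __name__ == "__main__":
    run_experiment(learning_rate=0.01)
```

Because every run emits identical metric names, the runs line up on the same graphs and in the same table columns.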

***

#### 2. Compare Visually

**Select executions and click Compare**

**What you'll see:**

* All training curves overlaid on one graph
* Which experiments converge faster
* Which achieve better final performance
* Which settings cause instability

***

#### 3. Analyze the Table

**Click column headers to sort:**

Sort by `val_accuracy` descending → Find the best model\
Sort by `train_loss` ascending → See which executions converged best\
Sort by `epoch` → See which executions finished training

**Find patterns:**

* Do higher learning rates train faster but plateau lower?
* Do larger batch sizes improve stability?
* Does dropout improve generalization (lower gap between train/val)?

***

### Comparison Features

#### Overlay Metrics on Graphs

All executions appear on the same graph with different colors:

**Example:** Comparing 3 learning rates

* Blue line: LR=0.001 (slow but steady)
* Green line: LR=0.01 (faster convergence)
* Red line: LR=0.1 (unstable, diverges)

You can instantly see which learning rate works best.

***

#### Use "One Color per Execution"

Enable this option in **Chart Options** to:

* Give all metrics from one execution the same color
* Make it easier to track which line belongs to which execution
* Reduce visual clutter when comparing many executions

**Use when:** Comparing 3+ executions with multiple metrics each

***

#### Filter with Smoothing

Apply smoothing to noisy metrics to see trends more clearly:

1. Select a metric in **Vertical Axes**
2. Adjust the **Smoothing** slider
3. Compare smoothed trends across executions

Smoothing is especially useful when comparing runs with batch-level logging.
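To build intuition for what a smoothing slider does, here is a sketch of an exponential moving average. This page does not state which smoothing algorithm Valohai uses, so the EMA, the `smooth` function, and its `weight` parameter are assumptions for illustration only.

```python
def smooth(values, weight=0.6):
    """Exponential moving average; a higher weight gives a smoother curve."""
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    smoothed = []
    previous = None
    for value in values:
        if previous is None:
            previous = value  # seed the average with the first point
        else:
            previous = weight * previous + (1.0 - weight) * value
        smoothed.append(previous)
    return smoothed


# Illustrative, made-up batch-level losses
noisy_loss = [1.0, 0.4, 0.9, 0.3, 0.7, 0.2]
print(smooth(noisy_loss, weight=0.8))
```

A weight of 0 returns the raw values, while values close to 1 flatten short-lived spikes and leave the overall trend.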

***

#### Create Multiple Comparison Views

Just like single executions, you can create multiple visualization tabs:

1. Click the **+** button
2. Name your tab (e.g., "Loss Comparison", "Accuracy Only")
3. Add different metrics to each tab

**Use cases:**

* One tab for loss curves
* One tab for accuracy metrics
* One tab for learning rate comparison

***

### Comparison Table

#### Understanding the Table

**Rows:** One per execution\
**Columns:** One per unique metric across all executions\
**Values:** Latest logged value of each metric

**Example:**

| Execution | learning\_rate | val\_accuracy | val\_loss | final\_epoch |
| --------- | -------------- | ------------- | --------- | ------------ |
| #142      | 0.001          | 0.95          | 0.23      | 100          |
| #143      | 0.01           | 0.93          | 0.28      | 100          |
| #144      | 0.1            | 0.78          | 0.65      | 50           |

**Insight:** Execution #142 has the best accuracy and the lowest loss. #143 is close behind after the same 100 epochs and might be acceptable, while #144 lags well behind and stopped at epoch 50.

***

#### Sort to Find Best

Click any column header to sort:

**Sort by val\_accuracy (descending):**\
Immediately see which execution achieved the highest validation accuracy.

**Sort by train\_loss (ascending):**\
See which execution had the best training convergence.

**Sort by execution number:**\
View in chronological order to see how recent changes affected performance.
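The same sorts can be reproduced offline. The sketch below uses the three rows from the example table above as plain dictionaries; the `rows` structure is an assumption for illustration, not the format Valohai exports.

```python
# Rows copied from the example table above
rows = [
    {"execution": 142, "learning_rate": 0.001, "val_accuracy": 0.95, "val_loss": 0.23},
    {"execution": 143, "learning_rate": 0.01, "val_accuracy": 0.93, "val_loss": 0.28},
    {"execution": 144, "learning_rate": 0.1, "val_accuracy": 0.78, "val_loss": 0.65},
]

# Descending accuracy: best model first
by_accuracy = sorted(rows, key=lambda r: r["val_accuracy"], reverse=True)

# Ascending loss: lowest loss first
by_loss = sorted(rows, key=lambda r: r["val_loss"])

best = by_accuracy[0]
print(f"Best execution: #{best['execution']}")
```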

***

#### Missing Values

If an execution didn't log a particular metric, the cell appears empty.

**Example:**

* Execution #142 logs `val_f1_score`
* Execution #143 does not log `val_f1_score`
* Table shows value for #142, empty for #143
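A rough sketch of how such a table can be assembled: take the union of metric names across executions and leave a gap wherever an execution never logged that metric. The `build_table` helper and the metric values are hypothetical and only illustrate the behavior described above.

```python
def build_table(executions):
    """executions: mapping of execution id -> dict of latest metric values."""
    # One column per unique metric name across all executions
    columns = sorted({name for metrics in executions.values() for name in metrics})
    table = {}
    for execution_id, metrics in executions.items():
        # .get() returns None for metrics this execution never logged
        table[execution_id] = {name: metrics.get(name) for name in columns}
    return columns, table


# Made-up values: only #142 logs val_f1_score
columns, table = build_table(
    {
        142: {"val_accuracy": 0.95, "val_f1_score": 0.9},
        143: {"val_accuracy": 0.93},
    },
)
print(columns)
print(table[143])
```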

***

### Common Comparison Scenarios

#### Hyperparameter Tuning

**Goal:** Find the best learning rate

**Steps:**

1. Run executions with learning\_rate = \[0.0001, 0.001, 0.01, 0.1]
2. Compare all executions
3. Sort table by `val_accuracy` (descending)
4. Look at graphs to see convergence speed
5. Choose learning rate that balances accuracy and training time
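One possible way to make the last step concrete is to keep every run whose accuracy is within a small tolerance of the best, then pick the fastest of those. The `pick_run` helper, the `tolerance` parameter, and the numbers are assumptions for illustration; `training_time_minutes` is assumed to be a metric you log yourself.

```python
def pick_run(runs, tolerance=0.01):
    """Return the fastest run whose val_accuracy is within `tolerance` of the best."""
    best_accuracy = max(run["val_accuracy"] for run in runs)
    candidates = [
        run for run in runs if best_accuracy - run["val_accuracy"] <= tolerance
    ]
    return min(candidates, key=lambda run: run["training_time_minutes"])


# Made-up results for three learning rates
runs = [
    {"learning_rate": 0.001, "val_accuracy": 0.95, "training_time_minutes": 60},
    {"learning_rate": 0.01, "val_accuracy": 0.945, "training_time_minutes": 30},
    {"learning_rate": 0.1, "val_accuracy": 0.80, "training_time_minutes": 10},
]
chosen = pick_run(runs)
print(chosen["learning_rate"])
```

Tightening the tolerance favors raw accuracy, and loosening it favors shorter training time.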

***

#### Architecture Comparison

**Goal:** Compare ResNet50 vs. EfficientNet

**Steps:**

1. Train both architectures with the same settings
2. Log `model_name` as a metric or parameter
3. Compare executions
4. Sort by `val_accuracy` and `training_time_minutes`
5. Evaluate tradeoff between accuracy and speed

```python
import json

# Log the model architecture alongside the metrics
print(
    json.dumps(
        {
            "model_name": "resnet50",  # or "efficientnet"
            "epoch": epoch,
            "val_accuracy": val_acc,
        },
    ),
)
```

***

#### Optimizer Comparison

**Goal:** Compare SGD vs. Adam vs. AdamW

**Steps:**

1. Train with each optimizer
2. Log optimizer name
3. Compare convergence speed and final accuracy
4. Consider stability (look for spikes in loss)
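Stability can also be checked numerically once you have the loss values. The sketch below counts steps where the loss jumps by more than a relative threshold; `count_loss_spikes`, the 50% default threshold, and the sample values are hypothetical choices, not a Valohai feature.

```python
def count_loss_spikes(losses, threshold=0.5):
    """Count steps where loss rises by more than `threshold` (relative) over the previous step."""
    spikes = 0
    for previous, current in zip(losses, losses[1:]):
        if previous > 0 and (current - previous) / previous > threshold:
            spikes += 1
    return spikes


# Made-up loss curves for two optimizers
stable = [1.0, 0.8, 0.6, 0.5]
unstable = [1.0, 0.5, 1.0, 0.4]
print(count_loss_spikes(stable), count_loss_spikes(unstable))
```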

***

### Best Practices

#### Use Consistent Metric Names

Keep metric names identical across all executions:

```python
# Good: All experiments use same names
"val_accuracy"
"val_loss"
"train_accuracy"

# Avoid: Different names per experiment
"validation_accuracy"  # Experiment 1
"val_acc"  # Experiment 2
```

If metrics have different names, they appear as separate columns in the comparison table.
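One way to keep names consistent is to route all logging through a small shared helper that rejects unexpected keys. This is a sketch under assumptions: `ALLOWED_METRICS` and `log_metrics` are hypothetical names, and the allowed set should match whatever your project actually logs.

```python
import json

# Hypothetical allow-list shared by every experiment script
ALLOWED_METRICS = {"epoch", "train_loss", "val_loss", "train_accuracy", "val_accuracy"}


def log_metrics(**metrics):
    """Print metrics as one JSON line, failing fast on unknown names."""
    unknown = set(metrics) - ALLOWED_METRICS
    if unknown:
        raise KeyError(f"Unexpected metric names: {sorted(unknown)}")
    line = json.dumps(metrics)
    print(line)
    return line


log_metrics(epoch=1, val_accuracy=0.9)  # accepted
# log_metrics(val_acc=0.9) would raise KeyError
```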

***

#### Use Descriptive Execution Names

Name your executions descriptively in the Valohai UI:

* **Good:** "ResNet50-LR0.001-BS32"
* **Avoid:** "Execution #142"

Descriptive names make it easier to identify executions in the comparison view.

***

#### Compare Small Batches First

Don't compare 50 executions at once. Start small:

1. Compare 2-5 related executions
2. Identify patterns
3. Drill down with more comparisons as needed

Too many executions create cluttered graphs and slow loading.

***

### Export Comparison Data

Download the comparison table for external analysis:

1. Click **Download raw data** (top right)
2. Choose **CSV** or **JSON**
3. Get all metrics from all selected executions

**Use exported data for:**

* Statistical analysis (e.g., significance tests)
* Custom visualizations
* Reporting to stakeholders
* Creating summary tables

**Example: Analysis in Python**

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load comparison data
df = pd.read_csv("comparison_data.csv")

# Group by hyperparameter
grouped = df.groupby("learning_rate")["val_accuracy"].mean()

# Plot
grouped.plot(kind="bar")
plt.title("Average Val Accuracy by Learning Rate")
plt.xlabel("Learning Rate")
plt.ylabel("Val Accuracy")
plt.savefig("comparison_analysis.png")
```

***

### Next Steps

* [Visualize metrics](/experiment-tracking/visualize-metrics.md) for individual executions
* [Create confusion matrices](/experiment-tracking/visualize-metrics/confusion-matrix.md) to compare classification performance
* [Compare output images](/experiment-tracking/compare-images.md) across different runs
* Use comparison results to inform your next experiments
* Back to [Experiment Tracking overview](/experiment-tracking.md)

