Early Stopping

Early stopping automatically stops executions or Tasks when they meet specific metadata conditions.

This saves time and compute costs by stopping work that's no longer useful, like a model that's already hit your accuracy target or an experiment that's clearly failing.

How it works

You define a stopping rule based on metadata that your training code logs (e.g., accuracy, val_loss, epoch).

Valohai monitors the metadata during execution. When the condition is met, the execution stops immediately.

For Tasks, you can also configure what happens to other executions when one meets the stop condition.

Stop condition

Early stopping works for both Tasks and single executions.

Early stopping halts that execution or Task when the condition is met.

Set up early stopping

  1. Create a new execution or Task

  2. Scroll to the Runtime section

  3. Check the box Set stop condition

  4. Define your rule:

    • Metadata key: The metric to monitor (e.g., accuracy, val_loss)

    • Operator: >, <, =, >=, <=

    • Value: The threshold (e.g., 0.95)

  5. Start your execution or Task

Example: Stop when accuracy exceeds 0.95

Metadata key: accuracy Operator: > Value: 0.95

Valohai monitors the accuracy metadata. Once it exceeds 0.95, the execution stops.

Example: Stop when loss drops below threshold

Metadata key: val_loss Operator: < Value: 0.1

The execution stops as soon as validation loss falls below 0.1.

Use cases

Cost optimization: Stop training once your model hits the target metric—no need to run all planned epochs.

Fast failure: Halt executions that show poor performance early (e.g., loss is increasing instead of decreasing).

Hyperparameter search: In a Grid Search or Bayesian Task, stop the entire Task once one execution finds the optimal configuration.

Example: Stopping a Grid Search Task

You're running a Grid Search with 50 executions to find the best learning rate.

Goal: Stop the Task when any execution achieves accuracy > 0.98.

Setup:

  1. Create the Grid Search Task

  2. Set stop condition: accuracy > 0.98

  3. Choose: Stop all executions

Once one execution hits 0.98 accuracy, Valohai stops the entire Task—saving compute on the remaining executions.

Next steps

Last updated

Was this helpful?