Tasks are sets of interconnected executions, designed for running the same step with varying parameters and input data values.
One of the most common use cases for tasks is hyperparameter optimization: you run a single step multiple times, each time with a different parameter configuration, to find the best setup for your neural network, such as its architecture and training hyperparameters.
Tasks can be generated for any step that has defined parameters.
How does a Task work?
When you launch a Task, Valohai will initiate a series of executions to conduct the search/optimization. Each execution will have a distinct combination of parameter values, determined by the ranges and types you provided.
Valohai efficiently manages the queueing of these executions, spawns virtual machines for job execution, and scales down the machines when they are no longer needed.
Keep in mind that a Task is essentially a collection of related executions, and you can find all the individual executions on the “Executions” tab of your project.
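Each execution in the Task runs the same step command, with its own combination of parameter values typically passed in as command-line arguments. A minimal sketch of how a step script might read them (the parameter names `learning-rate` and `epochs` are illustrative, not from a real step definition):

```python
import argparse

def parse_step_params(argv):
    """Parse the parameter flags a single execution receives on its command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--epochs", type=int, default=10)
    return parser.parse_args(argv)

# Each execution in a Task gets a distinct combination of flag values,
# e.g. one execution might be launched with:
params = parse_step_params(["--learning-rate", "0.01", "--epochs", "20"])
print(params.learning_rate, params.epochs)  # 0.01 20
```

In a real step you would call `parse_args()` with no arguments so it reads `sys.argv`; the explicit list here just simulates one execution's command line.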
Use cases
Tasks serve several common purposes:
Parameter Sweeps (Grid, Random, Manual)
Tasks let you run parameter sweeps using Grid or Random search: define the parameter values you want to test, and Valohai handles scheduling and running the parameterized jobs. You can also use Manual search to specify the exact parameter combinations to run.
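A Grid search runs one execution per element of the Cartesian product of the values you list. A quick sketch of what that expansion looks like (the parameter names and values are illustrative):

```python
from itertools import product

# Illustrative parameter values for a grid sweep.
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64],
}

# Grid search launches one execution per combination: 3 * 2 = 6 jobs here.
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]
for combo in combinations:
    print(combo)

print(len(combinations))  # 6
```

Random search instead samples a fixed number of combinations from the same value ranges, which is often cheaper when the full grid would be large.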
Bayesian Optimization
Valohai offers native support for Bayesian optimization through tools like Optuna and HyperOpt. To utilize Bayesian optimization:
- Indicate the metric you intend to optimize.
- Define the target value.
- Specify parameter details.
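These three inputs drive an iterative suggest-run-report loop: the optimizer proposes parameter values, an execution runs and reports the metric, and the loop stops once the target is reached. The sketch below shows only that loop shape, with random sampling standing in for the real Bayesian surrogate model that Optuna or HyperOpt would use; the toy loss function, target value, and parameter range are all invented for illustration:

```python
import random

random.seed(0)

def run_execution(learning_rate):
    """Stand-in for a training execution that reports a metric (here: a toy loss)."""
    return (learning_rate - 0.01) ** 2

# The three inputs: a metric to optimize, a target value, and parameter details.
target_loss = 1e-6
lr_low, lr_high = 1e-4, 1e-1

best = None
for trial in range(50):
    # A real Bayesian optimizer would pick the next value based on all
    # previous results; we sample uniformly at random for illustration only.
    lr = random.uniform(lr_low, lr_high)
    loss = run_execution(lr)
    if best is None or loss < best[1]:
        best = (lr, loss)
    if best[1] <= target_loss:
        break  # target reached: stop launching new executions

print(best)
```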
Training Multiple Models on Different Datasets
- Suppose you need to train 100 models, each on data from one of 100 different merchant stores, where each store keeps its data in S3 (e.g., `s3://bucket/data/stores/store-335/data/*`, where `store-335` is a store's ID).
- You can define a parameter, e.g., `store-id`, and create a Task where you provide 100 different values for `store-id`. Set your input data URL as `s3://bucket/data/stores/{parameter:store-id}/data/*`.
- Each execution within the Task independently downloads its corresponding data from S3 and trains its own model, with all executions running concurrently.
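The `{parameter:store-id}` placeholder in the input URL is resolved per execution using that execution's own parameter value. A minimal sketch of the substitution, using the bucket path from the example above (the specific store IDs are invented for illustration):

```python
# Input URL template with a parameter placeholder, as in the example above.
template = "s3://bucket/data/stores/{parameter:store-id}/data/*"

# Hypothetical Task: one execution per store ID.
store_ids = [f"store-{n}" for n in (101, 212, 335)]

# Each execution resolves the template with its own store-id value.
urls = [template.replace("{parameter:store-id}", sid) for sid in store_ids]
print(urls[2])  # s3://bucket/data/stores/store-335/data/*
```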