Setup Requirements
Using distributed tasks requires:
- A private worker or full private installation of Valohai
- The infrastructure security should freely allow communication between workers in the subnet
- Then distributed features can be enabled by Valohai staff
Prepare Your Project Work Distributedly
Running your steps as distributed tasks doesn’t automatically turn your code to work in a distributed way. You should prepare your project to support the distributed mindset by following the conventions provided by the frameworks you use and/or checking out our distributed examples.
The publicly available valohai-utils
Python package contain distributed tooling for that.
To get started, you might want to split your initialization code something on the lines:
import valohai
if valohai.distributed.is_distributed_task():
distributed_init()
else:
normal_init()
The rest of the preparation largely depends on the tools used, but somewhere down the line you probably want to get connection information of all workers in the distributed task.
members = valohai.distributed.members()
member_local_ips = ",".join([m.primary_local_ip for m in members])
Or, alternatively, connection information for one of the workers that the rest use to communicate through.
master = valohai.distributed.master()
master.primary_local_ip
Some distributed tool chains also require you to run a separate program to wrap the job code like OpenMPI. In these cases, we recommend wrapping that call to a Python script to better control the parameters inserted. Here is one such example.
Distributed Examples
If you want further direction or tips regarding specific distributed frameworks or technologies, be sure to check out our distributed examples.
Run a Distributed Task
After your code supports distributed processing, you can create a distributed task:
- Click on your Project’s Task tab
- Click the Create task button
- Select the Source and Runtime values you want to use
- Under Configuration, select Distributed as your Task type
- Edit the Execution count below to define how many total executions will be in the task
- Optionally use Environment variables to customize network configuration:
VH_DOCKER_NETWORK=host
can be used to expose all ports to the worker subnetVH_EXPOSE_PORTS=1234
can be used to expose certain ports or port ranges- These values can also be predefined in the
valohai.yaml
throughstep.environment-variables
like most of our distributed examples do.