If your on-premises server has more than one GPU, you can configure Valohai either to use all of them for a single job or to run several jobs in parallel. In the latter case, each job has access to one GPU and takes up as much memory and CPU as it needs. The two options are mutually exclusive: a server can be set up for only one of them at a time.
Follow the steps below to set up multiple Valohai agents on your on-premises server. You must first install the agent, either with the Ubuntu installer or manually.
- Stop the original Valohai agent with `sudo systemctl stop peon` and `sudo systemctl disable peon`.
- Rename the `/etc/systemd/system/peon.service` file to `/etc/systemd/system/peon@.service`.
- Open the service file and add a new line in the `[Service]` section: `Environment='EXTRA_ENVIRONMENT_VARIABLES={"NVIDIA_VISIBLE_DEVICES": "%I"}'`. You can place this, for example, after the `EnvironmentFile=-/etc/peon.config` line.
- Also add the following line in the `[Service]` section: `Environment="IDENTITY=UUID.%i"`. Replace `UUID` with a UUID you have generated (for example, with an online UUID generator).
- Run `sudo systemctl daemon-reload` to read the updated service file.
- Enable multiple Valohai agents by running the following commands. The example below is for a server with 4 GPUs; adjust the number of instances to match your own server.

      sudo systemctl enable --now peon@0
      sudo systemctl enable --now peon@1
      sudo systemctl enable --now peon@2
      sudo systemctl enable --now peon@3
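
On servers with a different number of GPUs, the enable commands above can be generated rather than typed by hand. The following is a minimal sketch, not part of the Valohai tooling: the function names `peon_instances` and `print_enable_commands` are hypothetical, and the GPU count is passed in by the caller instead of being detected. The commands are only printed, so they can be reviewed before anything is changed.

```shell
# Print one systemd instance name per GPU index: peon@0, peon@1, ...
# The GPU count is supplied by the caller; nothing here inspects the hardware.
peon_instances() {
  gpu_count="$1"
  i=0
  while [ "$i" -lt "$gpu_count" ]; do
    printf 'peon@%s\n' "$i"
    i=$((i + 1))
  done
}

# Print (rather than run) the enable command for each instance,
# so the output can be checked first.
print_enable_commands() {
  peon_instances "$1" | while read -r unit; do
    printf 'sudo systemctl enable --now %s\n' "$unit"
  done
}
```

Calling `print_enable_commands 4` prints the four commands listed above. Once the output looks right, it can be piped to `sh` to run them.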
Remember to disable the original Valohai agent
Disabling the original Valohai agent is important; otherwise too many peons compete for resources: the original agent tries to take all the GPUs, while each of the others takes one GPU.
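
One way to confirm that the original agent is no longer running is to look for the plain `peon.service` unit among the active units. The sketch below is an illustration under stated assumptions rather than a Valohai-provided check: `original_agent_absent` is a hypothetical helper that reads unit names from standard input, one per line, and succeeds only if `peon.service` is not among them.

```shell
# Reads unit names (one per line) on stdin.
# Succeeds if the original, non-templated peon.service is NOT listed;
# fails if it is. Instances such as peon@0.service do not match.
original_agent_absent() {
  ! grep -qx 'peon\.service'
}
```

Assuming the unit name is in the first column of the listing, it could be fed like this: `systemctl list-units --type=service --plain --no-legend 'peon*' | awk '{print $1}' | original_agent_absent || echo "peon.service is still active"`.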