We understand that transitioning your existing machine learning workflows to a new platform may seem daunting. However, with Valohai the process is designed to be seamless and low-effort.
Let’s walk through the steps needed to take your existing jobs and start running them on Valohai with minimal hassle.
Step 1: Define Your Dependencies
The first step in migrating your existing jobs to Valohai is to understand and define your dependencies. Identify the Python packages your code relies on, including their versions. This way you can ensure that your jobs run exactly the same way in the new environment.
You can install these dependencies at runtime (e.g. pip install -r requirements.txt or conda install pandas=0.13.1) or include them in a base Docker image.
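For example, a pinned requirements.txt keeps runs reproducible (the package versions below are illustrative; only the TensorFlow version is taken from the image used later in this guide):

```text
# requirements.txt -- pin exact versions so jobs run the same everywhere
tensorflow==2.6.0
pandas==1.3.5
scikit-learn==1.0.2
```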
Step 2: Write Your valohai.yaml (30 minutes)
Valohai uses a YAML configuration file to define and manage your machine learning experiments. To get started, create a valohai.yaml file in your project repository. The YAML file serves as a blueprint for your experiments, allowing you to specify the necessary parameters, inputs, and outputs.
The beauty of this step is that you can start running your existing jobs on Valohai without making any changes to your codebase. Simply define the job types (i.e., steps) in the valohai.yaml file, and Valohai takes care of the rest.
- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command:
      - pip install -r requirements.txt
      - python train_model.py
Step 3: Add Parameters and Metrics (1-2 hours)
Enhance the flexibility of your experiments by adding parameters and metrics to your valohai.yaml file. If your code already parses configuration values and parameters from the command line (e.g., using argparse), this step becomes even quicker.
- Define the parameters for each step in valohai.yaml
- Print performance metrics as JSON (e.g. print(json.dumps({"precision": 0.8125, "recall": 0.8667, "f1_score": 0.8387})))
- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command:
      - python train_model.py {parameters}
    parameters:
      - name: iterations
        type: integer
        default: 10
      - name: learningrate
        type: float
        default: 0.01
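On the code side, a sketch of what train_model.py might look like: the argument names mirror the parameters defined above, and {parameters} is expanded by Valohai into command-line flags at runtime. The function names here are illustrative, not part of any Valohai API.

```python
import argparse
import json


def parse_args(argv=None):
    # Argument names mirror the parameters in valohai.yaml; at runtime
    # Valohai expands {parameters} into e.g. --iterations=10 --learningrate=0.01.
    parser = argparse.ArgumentParser()
    parser.add_argument("--iterations", type=int, default=10)
    parser.add_argument("--learningrate", type=float, default=0.01)
    return parser.parse_args(argv)


def report_metrics(metrics):
    # Printing one JSON object per line is how Valohai collects metrics.
    line = json.dumps(metrics)
    print(line)
    return line


if __name__ == "__main__":
    args = parse_args()
    # ... training loop using args.iterations and args.learningrate ...
    report_metrics({"precision": 0.8125, "recall": 0.8667, "f1_score": 0.8387})
```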
Step 4: Save Output Artefacts (30 minutes)
Valohai simplifies the versioning and storage of your output artefacts, such as models, CSV files, and more. Update your code to save these artefacts to a local directory, and Valohai will handle versioning and automatically upload them to your specified storage.
- Save artefacts to the /valohai/outputs/ directory.
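A minimal sketch of the change: write artefacts under /valohai/outputs/ when running on Valohai. The local ./outputs fallback and the helper names are assumptions made here so the same script stays runnable outside the platform.

```python
import json
from pathlib import Path


def outputs_dir():
    # On Valohai, anything written under /valohai/outputs/ is versioned and
    # uploaded automatically after the job. The ./outputs fallback is an
    # assumption for local development runs, not a Valohai convention.
    base = Path("/valohai/outputs")
    if not base.exists():
        base = Path("./outputs")
    base.mkdir(parents=True, exist_ok=True)
    return base


def save_artefact(name, payload):
    # Serialize a payload (e.g. evaluation results) next to your model files.
    path = outputs_dir() / name
    path.write_text(json.dumps(payload))
    return path


if __name__ == "__main__":
    save_artefact("metrics.json", {"f1_score": 0.8387})
```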
Step 5: Update Data Access (1-2 hours)
If your existing jobs involve accessing data files stored in the cloud, updating their access for Valohai is straightforward. Modify your valohai.yaml file to point to the relevant files, and adjust your code to read them from a local directory instead of handling authentication, downloading, and caching yourself.
- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command:
      - python train_model.py
    parameters:
      - name: iterations
        type: integer
        default: 10
      - name: learningrate
        type: float
        default: 0.01
    inputs:
      - name: images
        keep-directories: suffix
        default:
          - s3://mybucket/factories/images/*.png
          - azure://myblobstorage/factories/images/*.png
          - gs://mybucket/factories/images/*.png
For larger datasets, you may customize data caching and disk mounting options. However, the core code for data ingestion should remain the same.
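In code, each input is made available as local files under /valohai/inputs/<input-name>/, so reading the data becomes a plain directory scan. The local fallback directory and helper name below are assumptions for development runs outside Valohai:

```python
from pathlib import Path


def input_files(input_name, pattern="*.png", local_fallback="./data"):
    # On Valohai, the files listed under the step's inputs are downloaded
    # and mounted at /valohai/inputs/<input-name>/ before the job starts.
    # The local_fallback directory is an assumption for local development.
    base = Path("/valohai/inputs") / input_name
    if not base.exists():
        base = Path(local_fallback)
    return sorted(base.rglob(pattern))


if __name__ == "__main__":
    for path in input_files("images"):
        print(path)
```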
Congratulations! You’ve successfully migrated your existing jobs to Valohai with minimal effort. As you explore more features and capabilities of the Valohai platform, you’ll discover how it streamlines your machine learning workflow and accelerates your experimentation process.