Migrating existing Python projects to Valohai

Jupyter Notebooks

When running Valohai executions from Jupyter Notebooks, you don't need to set up a valohai.yaml configuration file; Jupyhai generates it automatically.

See Jupyter Notebooks with Valohai for details on running Notebooks on Valohai.

Bringing your existing projects to Valohai is straightforward.

valohai.yaml configuration

Each Valohai project has one valohai.yaml configuration file, which should be located at the root of your repository.

  • You can create the yaml file manually, or use the valohai-utils Python helper library to generate one.

  • valohai.yaml defines the steps, deployment endpoints and pipelines for the project.

Example of a valohai.yaml file:

- step:
    name: Train model
    image: tensorflow/tensorflow:2.1
    command: python train.py {parameters}
    inputs:
        - name: my-sample-input
          default: s3://mybucket/data/mydata.csv
    parameters:
        - name: learningrate
          type: float
          default: 0.001
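The {parameters} placeholder in the step's command is what links the YAML file to your script's command-line arguments. The sketch below imitates that substitution, assuming Valohai's default of passing each parameter as --name=value; render_command is a hypothetical helper for illustration only, not part of Valohai or valohai-utils.

```python
import shlex


def render_command(template, parameters):
    """Expand a {parameters} placeholder into --name=value flags.

    Hypothetical helper: a sketch of the substitution Valohai performs,
    assuming the default "--{name}={value}" way of passing parameters.
    """
    # Quote each flag so values containing spaces stay a single argument
    flags = " ".join(
        shlex.quote(f"--{name}={value}") for name, value in parameters.items()
    )
    return template.replace("{parameters}", flags)


# Parameter defaults taken from the example valohai.yaml above
defaults = {"learningrate": 0.001}
print(render_command("python train.py {parameters}", defaults))
```

With the example defaults this prints python train.py --learningrate=0.001, which is the kind of command line an argparse-based script can parse directly.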
To migrate an existing project:

  1. Create a new project in Valohai and connect it to your Git repository

  2. Add a valohai.yaml configuration file to the root of your repository

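  3. Inside the configuration file, define a step for your code: its name, the Docker image it runs in, the command to execute, and its inputs and parameters, as in the example above.
  4. Update your code to read input data from VH_INPUTS_DIR and save output files to VH_OUTPUTS_DIR, instead of reading from or writing to a local disk or a cloud storage location directly.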
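    import os
    import pandas as pd
    
    # Directory for all downloaded input datasets
    # (the fallback path is only used when running outside Valohai)
    VH_INPUTS_DIR = os.getenv('VH_INPUTS_DIR', './inputs')
    # Directory where you should save all files that you want to keep
    VH_OUTPUTS_DIR = os.getenv('VH_OUTPUTS_DIR', './outputs')
    
    # Load input file from the inputs directory
    # Note that Valohai creates a new folder for each defined input (e.g. my-sample-input) and saves the individual files in that folder
    # e.g. /valohai/inputs/my-sample-input/mydata.csv
    df = pd.read_csv(os.path.join(VH_INPUTS_DIR, "my-sample-input", "mydata.csv"))
    
    # Save results to the outputs directory so they are kept after the execution
    # ("processed.csv" is a placeholder file name)
    os.makedirs(VH_OUTPUTS_DIR, exist_ok=True)
    df.to_csv(os.path.join(VH_OUTPUTS_DIR, "processed.csv"), index=False)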
    
  5. Read Valohai parameters in your code
    import argparse
    
    def parse_args():
        parser = argparse.ArgumentParser()
        # With {parameters} in the step command, Valohai passes this as --learningrate=<value>
        parser.add_argument('--learningrate', type=float, default=0.001)
        return parser.parse_args()
    
    args = parse_args()
    
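  6. Start collecting important metrics by printing them as Valohai metadata (one JSON object per line)
    import json
    
    # Valohai parses every JSON object printed on its own line as metadata.
    # Call this, for example, at the end of each training epoch.
    def log_metadata(epoch, logs):
        print(json.dumps({
            'step': epoch,
            'accuracy': float(logs['acc']),
        }))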
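
Put together, steps 4 to 6 might look like the minimal training-script skeleton below. It is a sketch under a few assumptions: it uses only the standard library (csv instead of pandas) so it runs anywhere, EPOCHS, the placeholder accuracy and the metrics.csv file name are invented for illustration, and the ./inputs and ./outputs fallbacks exist only so the script also runs outside Valohai.

```python
import argparse
import csv
import json
import os

EPOCHS = 3  # hypothetical placeholder; a real script trains until it is done


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Matches the "learningrate" parameter in the example valohai.yaml
    parser.add_argument("--learningrate", type=float, default=0.001)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)

    # Valohai sets both variables; the fallbacks only matter for local runs
    inputs_dir = os.getenv("VH_INPUTS_DIR", "./inputs")
    outputs_dir = os.getenv("VH_OUTPUTS_DIR", "./outputs")

    # Each input gets its own folder, named after the input in valohai.yaml
    input_path = os.path.join(inputs_dir, "my-sample-input", "mydata.csv")
    with open(input_path, newline="") as f:
        rows = list(csv.DictReader(f))

    # Metadata: one JSON object per printed line
    print(json.dumps({"learningrate": args.learningrate, "rows_loaded": len(rows)}))

    history = []
    for epoch in range(EPOCHS):
        # Placeholder metric; replace with your real training and evaluation
        accuracy = round(1 - 1 / (epoch + 2), 4)
        history.append({"epoch": epoch, "accuracy": accuracy})
        print(json.dumps({"step": epoch, "accuracy": accuracy}))

    # Files written to the outputs directory are kept after the execution
    os.makedirs(outputs_dir, exist_ok=True)
    with open(os.path.join(outputs_dir, "metrics.csv"), "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["epoch", "accuracy"])
        writer.writeheader()
        writer.writerows(history)
```

In train.py you would finish with the usual if __name__ == "__main__": main() guard. Passing argv explicitly, as in main(["--learningrate=0.01"]), is a convenient way to test the script locally with the same flags Valohai would generate.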
    
[Figure: Metadata chart comparison]

See also

You can find an example of a valohai.yaml file in our quickstart tutorial. For a more complex example, see the TensorFlow sample.
