Define your steps in Python

Note

This tutorial is a part of our Valohai fundamentals series.

So far, we have been manually writing the valohai.yaml file to define our steps. Alternatively, you can use the valohai-utils toolkit to create and update your valohai.yaml configuration file.

Update train.py to call valohai.prepare() with your step’s details: its name, Docker image, default inputs, and default parameters.

import numpy as np
import tensorflow as tf
import valohai

# Declare the step's name, Docker image, default inputs, and default parameters
valohai.prepare(
    step='train-model',
    image='tensorflow/tensorflow:2.6.0',
    default_inputs={
        'dataset': 'https://valohaidemo.blob.core.windows.net/mnist/mnist.npz'
    },
    default_parameters={
        'learning_rate': 0.001,
        'epoch': 10,
    },
)

def log_metadata(epoch, logs):
    with valohai.logger() as logger:
        logger.log('epoch', epoch)
        logger.log('accuracy', logs['accuracy'])
        logger.log('loss', logs['loss'])

# Resolve the local path of the file provided for the 'dataset' input
input_path = valohai.inputs('dataset').path()
with np.load(input_path, allow_pickle=True) as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']

x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

optimizer = tf.keras.optimizers.Adam(learning_rate=valohai.parameters('learning_rate').value)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer,
              loss=loss_fn,
              metrics=['accuracy'])

callback = tf.keras.callbacks.LambdaCallback(on_epoch_end=log_metadata)
model.fit(x_train, y_train, epochs=valohai.parameters('epoch').value, callbacks=[callback])

model.evaluate(x_test, y_test, verbose=2)

# Build a path inside the execution's outputs directory for the trained model
output_path = valohai.outputs().path('model.h5')
model.save(output_path)
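
The log_metadata callback above goes through valohai.logger(). Valohai collects execution metadata from JSON objects printed to standard output, so the callback is roughly equivalent to the standard-library sketch below. The helper name format_metadata is hypothetical and used only for illustration; the exact output of valohai-utils may differ in detail.

```python
import json


def format_metadata(epoch, logs):
    """Build one JSON line holding the metrics for a single epoch.

    Hypothetical helper for illustration; not part of valohai-utils.
    """
    record = {
        "epoch": epoch,
        "accuracy": logs["accuracy"],
        "loss": logs["loss"],
    }
    return json.dumps(record)


# One JSON object per line on stdout is what gets picked up as metadata.
print(format_metadata(0, {"accuracy": 0.91, "loss": 0.30}))
```

Grouping the three values into one JSON object per epoch keeps them together as a single metadata row, which is the same reason the callback logs them inside one logger context.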

Update valohai.yaml

You’ll need to update the valohai.yaml file before running your job.

On your own computer, run vh yaml step <filename> to update the file with the step defined in that script.

vh yaml step train.py
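
To get a feel for what this command derives from the valohai.prepare() call, the sketch below maps the same arguments to a step definition as a plain Python dictionary. It is an illustration under assumptions, not the actual valohai-utils implementation: the function names sketch_step and infer_type are hypothetical, and the real generated valohai.yaml may contain additional fields or commands.

```python
def infer_type(value):
    """Map a Python default value to a Valohai parameter type name."""
    # bool must be checked before int because bool is a subclass of int
    if isinstance(value, bool):
        return "flag"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    return "string"


def sketch_step(step, image, default_parameters, default_inputs, script="train.py"):
    """Assemble an approximate step definition (hypothetical helper)."""
    return {
        "name": step,
        "image": image,
        # {parameters} is the valohai.yaml placeholder for parameter arguments
        "command": [f"python ./{script} {{parameters}}"],
        "parameters": [
            {"name": name, "default": value, "type": infer_type(value)}
            for name, value in default_parameters.items()
        ],
        "inputs": [
            {"name": name, "default": url}
            for name, url in default_inputs.items()
        ],
    }


step_definition = sketch_step(
    step="train-model",
    image="tensorflow/tensorflow:2.6.0",
    default_parameters={"learning_rate": 0.001, "epoch": 10},
    default_inputs={
        "dataset": "https://valohaidemo.blob.core.windows.net/mnist/mnist.npz"
    },
)
print(step_definition)
```

The point of the sketch is that parameter types can be inferred from the defaults, which is why the script does not need to declare them separately.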

Now you can run a new execution:

vh exec run train-model --adhoc