Collect and view metrics

Note

This tutorial is part of our Valohai Fundamentals series.

Valohai allows you to easily collect metadata, such as key performance metrics, from your executions, visualize it, and compare it across multiple executions.

In this section you will learn:

  • How to collect metadata

  • How to visualize metadata in the UI

  • How to compare metadata between executions

A short introduction to metadata

  • Valohai metadata is collected as key:value pairs (see the minimal sketch after this list)

  • It is easily visualized as a time series graph or a scatter plot in the web app

  • It can be used to compare the performance of multiple executions

  • It lets you sort and find executions based on metadata metrics
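
Under the hood, Valohai picks up any JSON object your code prints to standard output and stores it as metadata; the valohai-utils logger used below wraps this same mechanism. A minimal sketch, with illustrative metric names:

import json

# Each JSON object printed to stdout becomes one set of key:value metadata entries
print(json.dumps({'epoch': 1, 'accuracy': 0.93, 'loss': 0.21}))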

Update train.py to add metadata logging:

  • Create a new function log_metadata that will log metadata

  • Create a TensorFlow LambdaCallback to trigger the log_metadata function every time an epoch ends

  • Pass the new callback to the model.fit method

import numpy as np
import tensorflow as tf
import valohai


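# Log the metrics from each epoch as Valohai metadata (key:value pairs)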
def log_metadata(epoch, logs):
    with valohai.logger() as logger:
        logger.log('epoch', epoch)
        logger.log('accuracy', logs['accuracy'])
        logger.log('loss', logs['loss'])


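# Define the Valohai step: Docker image, default inputs, and default parameters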
valohai.prepare(
    step='train-model',
    image='tensorflow/tensorflow:2.6.0',
    default_inputs={
        'dataset': 'https://valohaidemo.blob.core.windows.net/mnist/mnist.npz'
    },
    default_parameters={
        'learning_rate': 0.001,
        'epochs': 5,
    },
)

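# Load the MNIST dataset from the Valohai input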
input_path = valohai.inputs('dataset').path()
with np.load(input_path, allow_pickle=True) as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']

x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

optimizer = tf.keras.optimizers.Adam(learning_rate=valohai.parameters('learning_rate').value)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer,
              loss=loss_fn,
              metrics=['accuracy'])

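# Trigger log_metadata at the end of every epoch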
callback = tf.keras.callbacks.LambdaCallback(on_epoch_end=log_metadata)
model.fit(x_train, y_train, epochs=valohai.parameters('epochs').value, callbacks=[callback])

model.evaluate(x_test, y_test, verbose=2)

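# Save the trained model as a Valohai output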
output_path = valohai.outputs().path('model.h5')
model.save(output_path)

Collect test metrics

  • Save the model test accuracy and test loss into variables

  • Log the test metrics with the Valohai logger

import numpy as np
import tensorflow as tf
import valohai


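# Log the metrics from each epoch as Valohai metadata (key:value pairs)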
def log_metadata(epoch, logs):
    with valohai.logger() as logger:
        logger.log('epoch', epoch)
        logger.log('accuracy', logs['accuracy'])
        logger.log('loss', logs['loss'])


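# Define the Valohai step: Docker image, default inputs, and default parameters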
valohai.prepare(
    step='train-model',
    image='tensorflow/tensorflow:2.6.0',
    default_inputs={
        'dataset': 'https://valohaidemo.blob.core.windows.net/mnist/mnist.npz'
    },
    default_parameters={
        'learning_rate': 0.001,
        'epochs': 5,
    },
)

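# Load the MNIST dataset from the Valohai input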
input_path = valohai.inputs('dataset').path()
with np.load(input_path, allow_pickle=True) as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']

x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

optimizer = tf.keras.optimizers.Adam(learning_rate=valohai.parameters('learning_rate').value)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer,
              loss=loss_fn,
              metrics=['accuracy'])

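# Trigger log_metadata at the end of every epoch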
callback = tf.keras.callbacks.LambdaCallback(on_epoch_end=log_metadata)
model.fit(x_train, y_train, epochs=valohai.parameters('epochs').value, callbacks=[callback])

test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=2)
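# Log the final test metrics as Valohai metadata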
with valohai.logger() as logger:
    logger.log('test_accuracy', test_accuracy)
    logger.log('test_loss', test_loss)

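# Save the trained model as a Valohai output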
output_path = valohai.outputs().path('model.h5')
model.save(output_path)

Run in Valohai

Adding or changing metadata doesn’t require any changes to the valohai.yaml config file.

You can immediately launch a new execution and view the collected metadata.

vh exec run train-model --adhoc

View metrics

  • Go to your project’s executions

  • Click on the Show columns button on the right side, above the table

  • Select accuracy and loss to show them in the table

  • Open the latest execution

  • Go to the metadata tab to view metrics from that execution

  • Select epoch for the X-axis, and accuracy and loss for the Y-axis

Latest metadata value

The metadata value displayed in the table is always the latest value printed for that key. In your script, make sure the last value you print for a metric such as accuracy is the one you want to compare, for example the best value for your use case.
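
For example, you can pick the best epoch's accuracy from the Keras training history and log it once more after training, so it becomes the latest value shown in the table. A minimal sketch against the train.py above (best_accuracy is an illustrative key name):

# Capture the per-epoch history returned by model.fit
history = model.fit(
    x_train,
    y_train,
    epochs=valohai.parameters('epochs').value,
    callbacks=[callback],
)

# Re-log the best training accuracy so it is the last value Valohai records
best_accuracy = max(history.history['accuracy'])
with valohai.logger() as logger:
    logger.log('best_accuracy', best_accuracy)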