Collect and view metrics

Note

This tutorial is a part of our Valohai Fundamentals series.

Valohai allows you to easily collect metadata, such as key performance metrics, from your executions, visualize it, and compare it across multiple executions.

In this section you will learn:

  • How to collect metadata

  • How to visualize metadata in the UI

  • How to compare metadata between executions

A short introduction to metadata

  • Valohai metadata is collected as key:value pairs

  • It is easily visualized as a time series graph or a scatter plot in the web app

  • It can be used to compare the performance of multiple executions

  • You can sort and filter executions based on metadata values
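Under the hood, Valohai collects metadata from anything your execution prints to standard output as a JSON object, one object per line; the valohai-utils logger used later in this tutorial is a convenience wrapper around this mechanism. A minimal sketch without any helper library (the function name and values are illustrative):

```python
import json

def log_metadata(metrics):
    """Print a metric dictionary as a single JSON line.

    Valohai picks up any JSON object printed to stdout as metadata,
    so no helper library is strictly required.
    """
    print(json.dumps(metrics))

log_metadata({"epoch": 1, "accuracy": 0.95, "loss": 0.31})
```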

Update train.py to add metadata logging:

  • Create a new function logMetadata that will log metadata

  • Create a TensorFlow LambdaCallback to trigger logMetadata every time an epoch ends

  • Pass the new callback to model.fit

import tensorflow as tf
import numpy
import valohai

def logMetadata(epoch, logs):
    # Log metrics for the epoch that just finished.
    with valohai.metadata.logger() as logger:
        logger.log("epoch", epoch)
        logger.log("accuracy", logs['accuracy'])
        logger.log("loss", logs['loss'])

my_parameters = {
    'epoch': 5
}

my_inputs = {
    'mnist': 's3://onboard-sample/tf-sample/mnist.npz'
}

valohai.prepare(step="train-model", image='tensorflow/tensorflow:2.4.1', default_parameters=my_parameters, default_inputs=my_inputs)

mnist_file_path = valohai.inputs('mnist').path()

with numpy.load(mnist_file_path, allow_pickle=True) as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']

x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

metadataCallback = tf.keras.callbacks.LambdaCallback(on_epoch_end=logMetadata)
model.fit(x_train, y_train, epochs=valohai.parameters('epoch').value, callbacks=[metadataCallback])

save_path = valohai.outputs().path('model.h5')
model.save(save_path)

Run in Valohai

Adding or changing metadata doesn’t require any changes to the valohai.yaml config file.

You can immediately launch a new execution and view the collected metadata.

vh exec run train-model --adhoc

View metrics

  • Go to your project’s executions

  • Click on the Show columns button on the right side, above the table

  • Select accuracy and loss to show them in the table

  • Open the latest execution

  • Go to the metadata tab to view metrics from that execution

  • Select epoch on the X-axis and accuracy and loss on the Y-axis

Latest metadata value

The metadata value displayed in the executions table is always the latest logged value. In your script, ensure that the last value you log for a metric such as accuracy is the best value for your use case.
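One way to guarantee this is to track the best value during training and log it once more after training finishes. A minimal sketch (the tracker class and per-epoch values are illustrative, not part of the tutorial code):

```python
class BestMetricTracker:
    """Track the best (highest) value of a metric across epochs."""

    def __init__(self):
        self.best_accuracy = float("-inf")

    def update(self, accuracy):
        # Keep whichever is higher: the stored best or the new value.
        self.best_accuracy = max(self.best_accuracy, accuracy)

tracker = BestMetricTracker()
for epoch_accuracy in [0.81, 0.93, 0.90]:  # illustrative per-epoch values
    tracker.update(epoch_accuracy)

# After model.fit(...) completes, log tracker.best_accuracy one final
# time (e.g. with valohai.metadata.logger()) so the latest value shown
# in the executions table is the best one.
print(tracker.best_accuracy)
```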