Upload output data

Note

This tutorial is part of our Valohai fundamentals series.

During an execution, outputs are stored in the /valohai/outputs directory. After the execution finishes, they are automatically uploaded to the data store you have configured. This happens regardless of whether the execution completed as intended, was stopped, or crashed.
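For example, here is a minimal sketch of writing an output file, relying only on the VH_OUTPUTS_DIR environment variable used later in this tutorial (the metrics.json filename and its contents are placeholders for illustration):

import json
import os

# /valohai/outputs inside a Valohai execution; falls back to a local
# directory when running outside Valohai.
outputs_dir = os.getenv('VH_OUTPUTS_DIR', '.outputs/')
os.makedirs(outputs_dir, exist_ok=True)

# Placeholder file and value, purely for illustration.
with open(os.path.join(outputs_dir, 'metrics.json'), 'w') as f:
    json.dump({'accuracy': 0.98}, f)

Anything written to that directory is uploaded and versioned once the execution ends.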

In this section you will learn:

  • What a datum identifier is

  • Where to find datums

  • How to set aliases for datums

A short introduction to outputs

  • Any file(s) that you want to save, version, track, and access after the execution should be saved as Valohai outputs.

  • Valohai will upload all files to your private cloud storage and version those files.

  • Each output will be available under the execution’s Outputs tab and in the project’s Data tab. From there you can download the file or copy a link to it.

  • When creating another execution, you can pass in the datum:// address of an output file, or use a cloud-specific address (e.g. s3://, gs://, azure://).

Let’s get the path to the Valohai outputs folder from the environment variable VH_OUTPUTS_DIR and update save_path to save our model in that folder.

Note

If there is no environment variable (= you’re running locally), the path will be .outputs/

import os

import numpy
import tensorflow as tf

# Path to the Valohai outputs directory; falls back to a local
# directory when running outside Valohai.
VH_OUTPUTS_DIR = os.getenv('VH_OUTPUTS_DIR', '.outputs/')

mnist_file_path = 'mnist.npz'

# Load the MNIST training and test sets from the local .npz file.
with numpy.load(mnist_file_path, allow_pickle=True) as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']

# Scale pixel values from [0, 255] to [0, 1].
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

# Inspect the untrained model's logits and the initial loss.
predictions = model(x_train[:1]).numpy()
tf.nn.softmax(predictions).numpy()

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_fn(y_train[:1], predictions).numpy()

model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)

# Save the trained model to the outputs directory so Valohai
# uploads and versions it after the execution finishes.
save_path = os.path.join(VH_OUTPUTS_DIR, 'model.h5')
model.save(save_path)

Datums

Datums are unique identifiers that can be used to point to specific output files. You can use them as inputs in your executions in order to reuse the output data. You can view and copy datums from the web UI.

  • Open your project on app.valohai.com

  • Go to the Data tab under your project

  • Click the three dots at the end of the row for the file you want to use

  • Click Copy datum:// URL

Note

You’ll also have the option to copy your cloud data store’s URL (e.g. s3://, gs://, or azure://). You can use either the datum URL or the cloud provider URL for your Valohai executions.

The advantage of using datum:// is that it allows Valohai to keep track of that exact file and version. This lets you trace files back later and understand where different files are used, or, for example, keep track of which pipeline was run to generate a trained model file.
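When you pass a copied datum URL as an input to another execution, Valohai downloads the file before your code starts. A minimal sketch of reading such an input, assuming an input named model (the input name is hypothetical; VH_INPUTS_DIR and the /valohai/inputs layout are the defaults inside an execution):

import os

import tensorflow as tf

# Valohai places each input under /valohai/inputs/<input-name>/;
# VH_INPUTS_DIR points at that inputs root inside the execution.
# 'model' is a hypothetical input name used here for illustration.
VH_INPUTS_DIR = os.getenv('VH_INPUTS_DIR', '.inputs/')
model_path = os.path.join(VH_INPUTS_DIR, 'model', 'model.h5')

model = tf.keras.models.load_model(model_path)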

Setting datum aliases

In some cases you might want to set an alias that, for example, always points to the latest execution and its datum. Once created, the alias can be used in place of the datum ID wherever a datum URL is accepted (e.g. datum://production-model, where the alias name is just an example).

  • Open your project on app.valohai.com

  • Go to the Project Data view (Data tab under your project)

  • Choose Aliases tab

  • Click Create new datum alias

  • Write a name for the alias and choose a datum from the list.

  • Click Save

  • You can edit saved aliases by choosing Edit from the Actions dropdown menu. The change history of aliases is tracked.

Next: Use parameters