Upload output data¶
This tutorial is a part of our Valohai fundamentals series.
During an execution, outputs are stored in the
/valohai/outputs directory. After the execution finishes, they are automatically uploaded to the user-configured data store. This happens regardless of whether the execution completed as intended, was stopped, or crashed.
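As a minimal sketch of what this means in practice: writing a file anywhere under /valohai/outputs is all it takes for the platform to pick it up. The helper name `save_output` below is ours for illustration, not part of any Valohai library.

```python
import os

def save_output(filename, text, out_dir='/valohai/outputs'):
    """Write a text file into the outputs directory.

    On a Valohai worker, anything written under /valohai/outputs is
    uploaded to the configured data store when the execution ends.
    """
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, filename)
    with open(path, 'w') as f:
        f.write(text)
    return path
```

No upload call is needed in your code; the platform handles the upload after the execution finishes.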
In this section you will learn:
What is a datum identifier
Where do you find datums
How to set aliases for datums
A short introduction to outputs
Any file(s) that you want to save, version, track, and access after the execution should be saved as Valohai outputs.
Valohai will upload all files to your private cloud storage and version those files.
Each output will be available under the execution's Outputs tab and in the project's Data tab. From there you can download the file, or copy the link to that file.
When creating another execution you can pass in the
datum:// address of an output file, or use a cloud-specific address (e.g. azure://).
Let’s update the
output_path to a Valohai output path in our sample script file.
We’ll use valohai-utils to define the output directory, so make sure you’ve imported valohai to your project.
```python
import numpy as np
import tensorflow as tf
import valohai

valohai.prepare(
    step='train-model',
    image='tensorflow/tensorflow:2.6.0',
)

input_path = 'mnist.npz'
with np.load(input_path, allow_pickle=True) as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']

x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    # Output raw logits to match from_logits=True in the loss below.
    tf.keras.layers.Dense(10)
])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)

# Resolve a path under /valohai/outputs and save the model there.
output_path = valohai.outputs().path('model.h5')
model.save(output_path)
```
Now create a new file called
requirements.txt and add valohai-utils to it.
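For this tutorial the file only needs a single line (TensorFlow and NumPy already come with the Docker image we use):

```
valohai-utils
```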
Finally, update your
valohai.yaml to first install the requirements and then run your Python script.
```yaml
- step:
    name: train-model
    command:
      - pip install -r requirements.txt
      - python train.py
    image: tensorflow/tensorflow:2.6.0
```
Datums¶
Datums are unique identifiers that can be used to point to specific output files. You can use them as inputs in your executions in order to reuse the output data. You can view and copy datums from the web UI.
Open your project on app.valohai.com
Go to the Data tab under your project
Click the three dots at the end of the row for the output file you want to use
Click Copy datum:// URL
You’ll also have the option to copy your cloud data store’s URL (e.g.
azure://). You can use either the datum URL or the cloud provider URL in your Valohai executions.
The advantage of using
datum:// is that it allows Valohai to keep track of that exact file and version. This lets you later trace files back and understand where different files are used, or, for example, keep track of which pipeline was run to generate a trained model file.
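As a sketch of how this looks in practice, a step in your valohai.yaml could reference an earlier output as an input. The step name and the datum ID below are made-up placeholders; copy a real datum URL from your project's Data tab:

```yaml
- step:
    name: evaluate-model
    image: tensorflow/tensorflow:2.6.0
    command: python evaluate.py
    inputs:
      - name: model
        default: datum://01234567-89ab-cdef-0123-456789abcdef
```

When the execution starts, Valohai downloads the referenced file for you, and the lineage between the two executions is recorded.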
Setting datum aliases¶
In some cases you might want to set an alias that, for example, always points to the latest execution and its datum.
Open your project on app.valohai.com
Go to the Project Data view (Data tab under your project)
Choose Aliases tab
Click Create new datum alias
Write a name for the alias and choose a datum from the list.
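Once created, an alias can be used in place of the raw datum ID wherever a datum URL is accepted. As a hedged sketch (the alias name below is hypothetical), an input default could then read:

```yaml
inputs:
  - name: model
    default: datum://production-model
```

Re-pointing the alias to a newer datum updates what future executions receive, without editing the YAML.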
You can edit saved aliases by choosing Edit from the Actions dropdown menu. The change history of aliases is tracked.