Save Files from Jobs
Save execution outputs without manual authentication and uploads.
Valohai automatically uploads files produced by an execution to the desired store, assigns them datum IDs, and starts tracking them. This allows you to use these files as inputs to subsequent executions.
To make Valohai upload the files and start tracking them as datums, all you have to do is move them to the /valohai/outputs/ directory. Once the execution is completed, everything found in the directory tree rooted at this path will be uploaded and tracked.
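Once a file is tracked as a datum, a later step can reference it as an input by its datum:// URI. Here is a minimal sketch of such a step in valohai.yaml; the step name, image, script, input name, and datum ID are all placeholders:

- step:
    name: consumer
    image: python:3.10
    command:
      - python train.py
    inputs:
      - name: training-data
        default: datum://018a1b2c-3d4e-5f60-7182-93a4b5c6d7e8  # placeholder datum ID

At runtime, Valohai downloads the input into /valohai/inputs/training-data/ before the command runs.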
Upload file
Create a step that runs a Python script containing only the code below:

with open("/valohai/outputs/data.txt", "w") as f:
    f.write("Hello Valohai")

This script creates a file at /valohai/outputs/data.txt that will, once the execution is completed, be uploaded to the desired store. You can inspect the file by navigating to the Outputs tab of the execution.
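For reference, the step itself could be defined in valohai.yaml roughly like this; the step name, environment, image, and script filename are placeholders:

- step:
    name: hello_writer        # placeholder name
    environment: ec2-instance # placeholder environment
    image: python:3.10
    command:
      - python hello.py       # the script shown above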

Generate output file path
Valohai provides multiple ways to generate the path for your output files, so you don't have to remember where they should be saved.
Using valohai-utils
import valohai, json

data = {"name": "training_stat", "accuracy": 0}
output_path = valohai.outputs().path("stats.json")
with open(output_path, "w") as f:
    json.dump(data, f)

If the value passed to the path() method contains not only the file name (as shown in the example above) but also one or more subdirectories, valohai-utils will create all those directories and preserve the directory structure once the file is uploaded.
import valohai, json

data = {"name": "training_stat", "accuracy": 0}
output_path = valohai.outputs().path("run_1/epoch_3/stats.json")
with open(output_path, "w") as f:
    json.dump(data, f)

The code above will produce an output file at run_1/epoch_3/stats.json.

💡 When using this file as an input to another execution, you can choose whether to preserve this directory structure or download all the files into a single directory.
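On the input side, that choice looks roughly like the sketch below in valohai.yaml, assuming the keep-directories input property; verify the property name and its accepted values against the inputs documentation:

- step:
    name: consumer
    image: python:3.10
    command:
      - python analyze.py
    inputs:
      - name: stats
        default: datum://...      # the datum ID of the uploaded file
        keep-directories: full    # assumed property; preserves run_1/epoch_3/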
Using system environment variables
If you'd rather not import an extra library, you can use the environment variables that Valohai provides in every execution environment:
import os, json

data = {"name": "training_stat", "accuracy": 0}

# In case this code is executed outside of the Valohai environment (e.g. on your
# local machine), VH_OUTPUTS_DIR won't be populated and the outputs will be saved
# in the 'outputs' directory under your current directory, e.g. 'current_dir/outputs/stats.json'
outputs_dir = os.getenv("VH_OUTPUTS_DIR", default="./outputs")
output_path = os.path.join(outputs_dir, "stats.json")
with open(output_path, "w") as f:
    json.dump(data, f)

💡 Check out the other environment variables and configuration files available in each execution.
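If you want to see what is available at runtime, you can print every Valohai-provided variable by filtering the process environment for the VH_ prefix; this is plain Python, nothing Valohai-specific:

import os

for name, value in sorted(os.environ.items()):
    if name.startswith("VH_"):
        print(f"{name}={value}")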
Technology agnostic
Most of the examples on this page are in Python, but you can absolutely use whatever programming language you like. Here is an example in R:
# Get the location of the Valohai outputs directory
vh_outputs_path <- Sys.getenv("VH_OUTPUTS_DIR", unset = "./outputs")

# Example data to write
output <- data.frame(name = "training_stat", accuracy = 0)

# Define a file path in the Valohai outputs directory,
# e.g. /valohai/outputs/<filename.ext>
out_path <- file.path(vh_outputs_path, "mydata.csv")
write.csv(output, file = out_path)

Or just a one-liner in Bash:

echo '{"name": "training_stat", "accuracy": 0}' > $VH_OUTPUTS_DIR/stats.json

Destination store
If not specified, every execution inherits a default output store, defined at either the organization or the project level, depending on the ownership under which the execution was created.
Changing the upload store for an execution can be done either through the UI when copying the execution:

or by specifying the store ID in valohai.yaml when defining the step, using the upload-store property:

- step:
    name: datum_uploader
    environment: ec2-instance
    image: python:3.10
    upload-store: "0199971a-c953-7bbe-407e-87a1d46e4d5e"
    command:
      - python writer.py

💡 To obtain the store ID, you can use this StoreList API endpoint.
Instructions and examples on how to use the API can be found on this page.
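As a rough sketch of calling that endpoint from Python, assuming the endpoint URL https://app.valohai.com/api/v0/stores/, token authentication, and a placeholder VALOHAI_API_TOKEN environment variable (verify all of these against the API documentation linked above):

import os
import requests

# Assumed endpoint and auth scheme; check the Valohai API docs before relying on them.
response = requests.get(
    "https://app.valohai.com/api/v0/stores/",
    headers={"Authorization": f"Token {os.environ['VALOHAI_API_TOKEN']}"},
)
response.raise_for_status()
for store in response.json()["results"]:
    print(store["id"], store.get("name"))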
Live uploads
If a file found under the /valohai/outputs directory is marked as read-only, it will be uploaded automatically, even before the execution is completed.
This lets you inspect the outputs of a long-running execution while it is still in progress. If you are not satisfied with the results, you can stop the execution and start another one with slightly different parameters.
⚠️ After the upload is completed, these files will be deleted from the machine on which the execution runs. Be sure that your code won't need to use these files after they are created!
To mark a file as read-only, you can use the standard Python library:

import os

os.chmod('/valohai/outputs/file_to_upload', 0o444)

or the valohai-utils library:

import valohai

valohai.outputs().live_upload("output_file.json")

or any other method, including a basic Bash call:

# These two are equivalent
chmod a-w /valohai/outputs/file_to_upload
chmod 444 /valohai/outputs/file_to_upload