You can save additional information with each output file. The structure of that information is entirely up to you.
For example, if you’re reading images from a certain factory, you could save information about the factory, the date, and the conditions under which the images were taken.
To attach metadata to an output file, save a *.metadata.json file alongside it, for example model.pkl.metadata.json for model.pkl.
Store the metadata in JSON format. Valohai will associate this metadata with the respective output file.
If you have lots of output files, an alternative is to store the metadata in a single JSON lines file and reference the output files from there.
The metadata file must be saved in the execution outputs directory and be named valohai.metadata.jsonl.
Each line of the file must be a separate JSON object that contains the output file name and the metadata for that file:
{"file": "OUTPUT_FILE", "metadata": { "property": "value", "another": "and so on..." }}
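Because a single malformed line can be hard to spot in a long file, it may help to check the file before the execution ends. The following is a minimal sketch, not part of Valohai or valohai-utils: the function name find_invalid_lines is hypothetical, and skipping blank lines is a choice made for this sketch rather than documented Valohai behavior. It only checks the shape described above: each line is a JSON object with a "file" string and a "metadata" object.

```python
import json


def find_invalid_lines(lines):
    """Return (line_number, reason) pairs for lines that do not match the expected shape.

    Hypothetical helper for illustration only; it is not a Valohai API.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption of this sketch: blank lines are ignored
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append((number, f"not valid JSON: {error.msg}"))
            continue
        if not isinstance(entry, dict):
            problems.append((number, "not a JSON object"))
        elif not isinstance(entry.get("file"), str):
            problems.append((number, 'missing or non-string "file"'))
        elif not isinstance(entry.get("metadata"), dict):
            problems.append((number, 'missing or non-object "metadata"'))
    return problems


# Hand-written sample lines: one valid, one without metadata, one not JSON at all
sample = [
    '{"file": "model.pkl", "metadata": {"factory": "eu-02"}}',
    '{"file": "model_2.pkl"}',
    "not json",
]
problems = find_invalid_lines(sample)
for number, reason in problems:
    print(f"line {number}: {reason}")
```

In an execution you could pass the open valohai.metadata.jsonl file object to the function, since iterating over a text file yields its lines.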
Helper functions
The valohai-utils Python package provides helper functions for handling the properties metadata file for you.
Examples
Handle properties using valohai-utils:
import valohai

with valohai.output_properties() as properties:
    # create dataset version URI
    new_dataset_version = properties.dataset_version_uri("dataset_name", "new_version")

    # for each output file (filename is the name of that file):
    # write data to the file
    ...
    # add metadata properties
    properties.add(file=filename, properties={"my_property": "my_value", "number": 1.23})
    # add the file to the dataset versions
    properties.add_to_dataset(file=filename, dataset_version=new_dataset_version)
Code example
See example project for a complete example.
Sidecar file saved alongside the output file:
import json

metadata = {
    "valohai.tags": ["prod", "lemonade"],  # creates Valohai tags for the file
    "valohai.alias": "model-prod",  # creates or updates a Valohai data alias to point to this output file
    "factory": "eu-02",
    "product": "katti"
}

# `model` stands for your own trained model object; this is just an example
save_path = '/valohai/outputs/model.pkl'
model.save(save_path)

# the sidecar file has the same name as the output file plus ".metadata.json"
metadata_path = '/valohai/outputs/model.pkl.metadata.json'
with open(metadata_path, 'w') as outfile:
    json.dump(metadata, outfile)
One metadata file for multiple output files, one line per output file:
import json

metadata = {
    "model.pkl": {
        "factory": "eu-02",
        "valohai.tags": ["prod", "lemonade"],
        "valohai.alias": "model-prod"
    },
    "model_2.pkl": {"factory": "eu-01"},
}

# save all the models separately; this is just an example
save_path = "/valohai/outputs/model.pkl"
model.save(save_path)

metadata_path = "/valohai/outputs/valohai.metadata.jsonl"
with open(metadata_path, "w") as outfile:
    for file_name, file_metadata in metadata.items():
        json.dump({"file": file_name, "metadata": file_metadata}, outfile)
        outfile.write("\n")
Read data
You can access the metadata that you’ve attached to a file either through the Valohai API or during execution.
Any metadata attached to datums is available at runtime in the /valohai/config/inputs.json file.
import json

with open('/valohai/config/inputs.json') as json_file:
    vh_inputs_config = json.load(json_file)

# Print metadata from each file that is in the input named "myinput"
for data in vh_inputs_config['myinput']['files']:
    print(data["metadata"])
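Once the metadata is loaded, a common next step is to pick out only the input files that carry a certain property. The sketch below is one possible way to do that. It relies only on the structure used above (an input name, a "files" list, and a "metadata" entry per file) and assumes that the metadata is a JSON object like the one you saved. The function name files_with_property and the sample dictionary are hypothetical and stand in for the parsed inputs.json.

```python
def files_with_property(inputs_config, input_name, key, value):
    """Return the file entries of one input whose metadata has key == value.

    Hypothetical helper for illustration only; it is not a Valohai API.
    """
    files = inputs_config.get(input_name, {}).get("files", [])
    matches = []
    for entry in files:
        metadata = entry.get("metadata")
        # Assumption: metadata is a JSON object; anything else is skipped
        if isinstance(metadata, dict) and metadata.get(key) == value:
            matches.append(entry)
    return matches


# Hand-written stand-in for the parsed /valohai/config/inputs.json
sample_config = {
    "myinput": {
        "files": [
            {"metadata": {"factory": "eu-02"}},
            {"metadata": {"factory": "eu-01"}},
            {},  # a file without any metadata attached
        ]
    }
}

matches = files_with_property(sample_config, "myinput", "factory", "eu-02")
print(len(matches))
```

During an execution you would pass the dictionary loaded from inputs.json instead of sample_config. Files without metadata and inputs that do not exist simply produce no matches instead of raising an error.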