Data aliases in Valohai are essentially shortcuts to Valohai datum URLs.
Creating a data alias in your project simplifies execution input management.
Common Use Cases:
- Create an alias like model-prod to direct your production batch inference to the correct model version.
- Set up an alias for train-images to access the latest preprocessed dataset for a specific use case. This streamlines input selection for your team.
Change Tracking
Valohai keeps a detailed history of every alias change, helping you track when the latest updates occurred.
Reproducibility
Data aliases are resolved when an execution or task is created, so each run uses a fixed, known file. If you later copy an execution, the copy uses the same data as the original, even if the alias has been modified in the meantime.
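The resolve-at-creation behavior can be pictured with a small toy model. This is only an illustration, not Valohai's implementation or API: `AliasRegistry`, `create_execution`, and `copy_execution` are hypothetical names, and the datum URLs are made-up placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AliasRegistry:
    """Toy stand-in for a project's aliases; not Valohai's real data model."""
    current: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def set(self, alias, datum_url):
        # Record every change so it is possible to see when an alias moved.
        self.history.append((datetime.now(timezone.utc), alias, datum_url))
        self.current[alias] = datum_url

    def resolve(self, uri):
        # "datum://model-prod" -> the concrete URL the alias points to right now.
        alias = uri.split("://", 1)[1]
        return self.current[alias]


def create_execution(registry, inputs):
    # Aliases are resolved once, here, and the concrete URLs are stored.
    return {name: registry.resolve(uri) for name, uri in inputs.items()}


def copy_execution(execution):
    # A copy reuses the stored URLs and never consults the registry again.
    return dict(execution)


registry = AliasRegistry()
registry.set("model-prod", "datum://example-datum-v1")  # placeholder URL
first = create_execution(registry, {"mymodel": "datum://model-prod"})

registry.set("model-prod", "datum://example-datum-v2")  # the alias is moved
copied = copy_execution(first)                           # still points at v1
fresh = create_execution(registry, {"mymodel": "datum://model-prod"})  # v2
```

In this sketch only the newly created execution sees the updated alias; the copy keeps the file that was resolved when the original was created.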
Create and update an alias
You can create or update an alias in the web UI, programmatically when you save a new output, or through the Valohai APIs.
Web application
- Open your project.
- Navigate to the Data tab.
- Go to the Aliases tab within the Data tab.
- Click on “Create new datum alias.”
- Provide a name for the alias.
- Select the specific file to which the alias should point.
On this page you can also modify existing aliases and view the change history of each datum alias.
Programmatically
You can also create or update an alias automatically when an execution saves a file. Alongside the output, save an additional JSON file named *.metadata.json (for example, model.h5.metadata.json for model.h5). Once an alias has been stored through metadata, it becomes visible on the Data -> Aliases tab.
import json

metadata = {
    "valohai.alias": "model-prod",  # creates or updates a Valohai data alias to point to this output file
}

# `model` is assumed to be an already trained model object (for example a Keras model)
save_path = '/valohai/outputs/model.h5'
model.save(save_path)

# The metadata file sits next to the output and is named <output file>.metadata.json
metadata_path = '/valohai/outputs/model.h5.metadata.json'
with open(metadata_path, 'w') as outfile:
    json.dump(metadata, outfile)
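If several outputs need aliases, the sidecar step can be wrapped in a small helper. The sketch below is one possible way to do it: `write_alias_metadata` is a hypothetical name, not part of any Valohai library, and the demo writes to a temporary directory that stands in for /valohai/outputs so it can run anywhere.

```python
import json
import tempfile
from pathlib import Path


def write_alias_metadata(output_path, alias):
    """Write the <file>.metadata.json sidecar that sets a data alias."""
    output_path = Path(output_path)
    # model.h5 -> model.h5.metadata.json, in the same directory
    sidecar = output_path.with_name(output_path.name + ".metadata.json")
    with open(sidecar, "w") as outfile:
        json.dump({"valohai.alias": alias}, outfile)
    return sidecar


# Demo against a temporary directory; inside an execution the directory
# would be /valohai/outputs instead.
outputs_dir = Path(tempfile.mkdtemp())
model_file = outputs_dir / "model.h5"
model_file.write_bytes(b"placeholder for real model weights")

sidecar_path = write_alias_metadata(model_file, "model-prod")
```

Keeping the sidecar name derived from the output name avoids the two paths drifting apart when the output file is renamed.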
Use as an input
If you’re using the web app, you can select an alias as the input for your execution by searching for the alias name in the inputs data browser (number 1 in the picture below).
In valohai.yaml
You can set the default input of a step to a datum alias. Every time you run that step, Valohai fetches the file the alias currently points to and uses it for the execution.
- step:
    name: train-model
    image: tensorflow/tensorflow:1.13.1
    command: python myfile.py
    inputs:
      - name: mydata
        default: datum://train-images
      - name: mymodel
        default: datum://model-prod
In this setup, the train-model step is defined with two default inputs that are datum aliases: train-images for the data and model-prod for the model weights. When the step is executed, Valohai automatically retrieves the files those aliases point to.
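Inside the step's script, the resolved files can be read like any other input. The sketch below assumes the usual Valohai layout in which each input is downloaded to /valohai/inputs/<input name>/; `input_files` is a hypothetical helper, and the demo uses a temporary directory in place of /valohai/inputs so it can run outside an execution.

```python
import tempfile
from pathlib import Path


def input_files(input_name, inputs_root="/valohai/inputs"):
    """Return the files downloaded for one named input, sorted by name."""
    input_dir = Path(inputs_root) / input_name
    return sorted(p for p in input_dir.iterdir() if p.is_file())


# Demo with a temporary directory standing in for /valohai/inputs.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "mymodel").mkdir()
(demo_root / "mymodel" / "model.h5").write_bytes(b"placeholder weights")

model_files = input_files("mymodel", inputs_root=demo_root)
```

Because the script only refers to the input name (mymodel), it does not need to change when the model-prod alias is pointed at a new file.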