This is just a short recap
We strongly recommend completing the Mastering Valohai learning path on Valohai Academy.
This guide will provide you with a overview and a “cheatsheet” for migrating projects. It won’t explain the concepts and all the options in detail.
Valohai allows you to easily access files from your private object file storage, like AWS S3, Azure Blob Storage, or Google Cloud Bucket, without having to worry about the authentication, downloading, and caching in your own code.
You can configure access to a data store under the Project settings, or your organisation admin can give you access to various data stores through the organisation.
valohai.yaml
Once you’ve configured access to a data store you can add inputs in your valohai.yaml
.
Each input can point to one or multiple files, and you can use wildcards.
- step:
name: train-model
image: tensorflow/tensorflow:2.6.0
command:
- python train_model.py
inputs:
- name: images
default:
- s3://mybucket/factories/images/*.png
- azure://myblobstorage/factories/images/*.png
- gs://mybucket/factories/images/*.png
keep-directories: suffix
- name: model
default: datum://production-latest
filename: model.pkl
The inputs will be downloaded to /valohai/inputs/
.
In the above case, you’ll find your:
images
in the directory/valohai/inputs/images/
with the folder structure from your object data stores intact.model
will be downloaded to the directory/valohai/inputs/model/
and the file will always be renamed tomodel.pkl
Python
In Python you’ll access these files like any other file, as they’ll be available locally on the machine.
import os
from PIL import Image
# Path to downloaded model
model_path = '/valohai/inputs/model/model.pkl'
# Path to the directory where the images are downloaded
images_directory = '/valohai/inputs/images/'
# Loop through images in the directory
for root, dirs, files in os.walk(images_directory):
for filename in files:
image_path = os.path.join(root, filename)
image_name = os.path.basename(image_path)
image = Image.open(image_path)
image.load()
valohai-utils
You can also get the paths to the inputs with valohai-utils
:
import valohai
import os
from PIL import Image
# Path to downloaded model
model_path = valohai.inputs("model").path()
# Path to the directory where the images are downloaded
images_directory = valohai.inputs("images").dir_path()
# Loop through images in the directory
for root, dirs, files in os.walk(images_directory):
for filename in files:
image_path = os.path.join(root, filename)
image_name = os.path.basename(image_path)
image = Image.open(image_path)
image.load()