You can easily download data from cloud storage to your notebook execution by utilizing Valohai inputs as you do in the Executions, Tasks and Pipelines. You can define inputs when creating a new notebook execution, an input can point to a single file from a cloud data store, or to a collection of files.
Warning
We recommend using os.getenv('VH_INPUTS_DIR')
and os.getenv('VH_OUTPUTS_DIR')
for loading and saving data in your Notebook. This helps prevent errors due to incorrect file locations.
Examples of Possible Input Formats
Public Cloud Storage URL
https://valohaidemo.blob.core.windows.net/mnist/mnist.npz
: Downloads a single file from a public location.
Private S3 Data Store
-
s3://mybucket/orders/may_2022.csv
: Downloads a single file that your Valohai project has access to from a private S3 Data Store. -
s3://mybucket/pytorch-sample/*
: Downloads all files from thepytorch-sample
folder in your private S3 Data Store.
Accessing Inputs in Your Notebook
You can access these inputs in your notebook via the Valohai inputs.
Using Python and the valohai-utils Toolkit
import valohai
import csv
# Open a single CSV file from Valohai inputs
with open(valohai.inputs("input").path()) as csv_file:
reader = csv.reader(csv_file, delimiter=',')
Tip
See the Valohai documentation for details on accessing multiple files or an archive of files using valohai-utils.
Using Python to Configure Inputs
First, configure the inputs to your step in valohai.yaml
and update your code to read data from the Valohai inputs instead of directly from your cloud storage.
import os
import pandas as pd
# Get the location of the Valohai inputs directory
VH_INPUTS_DIR = os.getenv('VH_INPUTS_DIR', '.inputs')
# Get the path to your individual inputs file
# e.g., /valohai/inputs/input/may_2022.csv
path_to_file = os.path.join(VH_INPUTS_DIR, 'input/may_2022.csv')
# Read CSV file using Pandas
pd.read_csv(path_to_file)
By properly configuring your inputs in valohai.yaml
and using these methods, you can efficiently manage and access your data within your notebook executions.
For further details, refer to the Valohai documentation on ingesting files.