Versioning and Reproducibility Note
Valohai simplifies reproducing past executions with the same code version by tracking the connected repository and commit used. However, commits from other repositories you access and clone in your code won’t be tracked.
Valohai projects can be connected to either public or private Git repositories, ensuring you can reproduce experiments precisely with the committed code and configuration files. Sometimes, you may need to fetch additional files from a different private repository during an execution run. Here’s how to do it:
Clone a repository
Generate a New SSH Key:
- Generate a new SSH key, for example:
    ssh-keygen -t rsa -b 4096 -C "me@example.com"
- Add the generated .pub file as a deploy key in your source control service (GitHub, GitLab, Bitbucket, etc.).
Add the Private Key to Valohai:
- You’ll need to add the private key to Valohai to access it during execution.
- Store this value as an environment variable under your project settings on Valohai.
- Important: Valohai environment secrets do not preserve the newline characters in your SSH key. Before pasting the key into Valohai, edit it to include a literal \n after the BEGIN line and before the END line. The key should look like this:
    -----BEGIN OPENSSH PRIVATE KEY-----\n<key>\n-----END OPENSSH PRIVATE KEY-----
- Go to Valohai.
- Open your project and navigate to Settings -> Environment Variables to add a new secret.
- Name the variable PRIVATE_KEY and paste in the value of the private key you generated (with the \n).
- Check the “secret” box to hide the value from the UI and save it.
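Editing the key by hand is easy to get wrong, so the conversion can also be scripted. The following is a minimal sketch, not part of Valohai itself: the function name encode_key_for_secret is hypothetical, and it assumes that replacing every line break in the key (not only the two around the key body) is acceptable, since echo -e in the step definition expands each \n back into a newline.

```python
def encode_key_for_secret(key_text):
    """Return the key as a single line, with line breaks written as a
    literal backslash followed by the letter n."""
    # Drop leading/trailing whitespace so the result has no trailing separator
    lines = key_text.strip().splitlines()
    # "\\n" is the two-character sequence backslash + n, not a real newline
    return "\\n".join(lines)
```

Reading the private key file, passing its contents to this function, and printing the result gives a one-line value that can be pasted into the PRIVATE_KEY secret.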
Clone the Repository
- Edit your valohai.yaml configuration file to download the Git repository. Use the following example:
---
- step:
    name: Download repo
    image: python:3.6
    command:
      - apt-get update
      # Install Git
      - apt-get install -y git
      # Store the environment variable in a file
      - echo -e "$PRIVATE_KEY" > ~/key_file
      - chmod 600 ~/key_file
      # Configure git to use the key
      - export GIT_SSH_COMMAND="ssh -vvv -o StrictHostKeyChecking=no -i ~/key_file"
      # Clone the repository
      - git clone git@github.com:account/repository.git /downloaded_repo
      - ls -la /downloaded_repo  # List contents of the download folder (optional)
      # Run your script
      - python main.py
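If you would rather perform the clone from inside your Python script than in the step's shell commands, the same sequence could be sketched roughly as follows. This is an illustration under assumptions, not a Valohai API: write_key_file and clone_with_key are hypothetical helper names, Git is assumed to be installed in the image already, and the PRIVATE_KEY secret is assumed to hold the key with literal \n sequences as described above.

```python
import os
import subprocess


def write_key_file(encoded_key, path):
    """Decode literal backslash-n sequences and write the key so that
    only the file owner can read it."""
    key_text = encoded_key.replace("\\n", "\n")
    if not key_text.endswith("\n"):
        key_text += "\n"
    # Create the file with mode 0600, then chmod in case it already existed
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(key_text)
    os.chmod(path, 0o600)
    return path


def clone_with_key(repo_url, target_dir, key_path, run=subprocess.check_call):
    """Clone repo_url into target_dir, telling Git which SSH key to use."""
    env = dict(os.environ)
    # Same options as the YAML example, minus -vvv (which only adds debug output)
    env["GIT_SSH_COMMAND"] = "ssh -o StrictHostKeyChecking=no -i {}".format(key_path)
    run(["git", "clone", repo_url, target_dir], env=env)
```

A script would call write_key_file(os.environ["PRIVATE_KEY"], os.path.expanduser("~/key_file")) and then pass the returned path to clone_with_key together with the repository URL and target folder.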
Now, you can access the files during runtime from your scripts. For example:
with open('/downloaded_repo/README.md', 'r') as f:
    print(f.read())
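Because Valohai does not track the commit of a repository cloned this way, it may be worth printing that commit to the execution log yourself so the run can be matched to it later. A minimal sketch, assuming Git is available in the image and the repository was cloned to /downloaded_repo; the function name cloned_commit is hypothetical.

```python
import subprocess


def cloned_commit(repo_dir, check_output=subprocess.check_output):
    """Return the hash of the commit currently checked out in repo_dir."""
    # "git -C <dir>" runs the command as if started inside <dir>
    out = check_output(["git", "-C", repo_dir, "rev-parse", "HEAD"])
    return out.decode("utf-8").strip()
```

Calling print(cloned_commit('/downloaded_repo')) near the start of your script records the hash alongside the rest of the execution output.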