Clone Repositories During Execution

Fetch additional private repositories at runtime

Sometimes you need to access code or files from another repository during execution—like shared utilities, model definitions, or configuration files. This guide shows how to securely clone private repositories at runtime.

💡 Reproducibility note: Valohai tracks the commit of your connected repository automatically. Commits from other repositories cloned during execution are not tracked. For full reproducibility, consider using Git submodules instead.

Use Cases

Clone additional repositories when:

  • You need utility functions from a shared library

  • Model definitions are in a separate repo

  • Configuration files are centrally managed

  • You're testing code before making it a submodule

Don't use this for:

  • Datasets (use Valohai data inputs instead)

  • Model checkpoints (use execution outputs)

  • Anything that changes frequently (consider merging repos)

Clone a Private Repository

Step 1: Generate SSH Key

Create an SSH key pair for accessing the external repository:

ssh-keygen -t rsa -b 4096 -f valohai-external-key

This creates:

  • valohai-external-key.pub – Public key (add to Git provider)

  • valohai-external-key – Private key (add to Valohai)

Step 2: Add Public Key to Git Provider

Add the public key as a deploy key on the repository you want to clone:

GitHub:

  1. Go to the repository → Settings → Deploy keys → Add deploy key

  2. Paste the contents of valohai-external-key.pub

  3. Leave "Allow write access" unchecked

GitLab:

  1. Go to the repository → Settings → Repository → Deploy keys

  2. Paste the contents of valohai-external-key.pub

Bitbucket:

  1. Go to the repository → Settings → Access keys → Add key

  2. Paste the contents of valohai-external-key.pub

Step 3: Add Private Key to Valohai

Store the private key as a secret environment variable:

  1. Open your Valohai project

  2. Go to Settings → Environment Variables

  3. Add a new variable:

    • Name: EXTERNAL_REPO_KEY

    • Value: The private key with \n replacing newlines

Important: Valohai doesn't encode newlines automatically. Format your key like this:

-----BEGIN OPENSSH PRIVATE KEY-----\n<key content>\n-----END OPENSSH PRIVATE KEY-----

  4. Check Secret to hide the value from logs

  5. Click Save
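
The conversion to a single line can be scripted instead of done by hand. The following is a minimal sketch, assuming the key file generated in Step 1; `encode_key_for_env` and `encode_key_file` are hypothetical helper names, not part of any Valohai tooling.

```python
from pathlib import Path


def encode_key_for_env(key_text):
    """Join the lines of a private key with literal backslash-n markers."""
    lines = key_text.strip().splitlines()
    return "\\n".join(lines)


def encode_key_file(path):
    """Read a private key file and return its single-line encoded form."""
    return encode_key_for_env(Path(path).read_text())


# Usage (run locally, then paste the printed value into Valohai):
#   print(encode_key_file("valohai-external-key"))
```

Run it on your own machine only, and avoid leaving the printed value in shell history or shared terminals.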

Step 4: Clone in Your Step

Update your valohai.yaml to clone the repository during execution:

- step:
    name: train-with-external-repo
    image: python:3.11
    command:
      # Install Git
      - apt-get update
      - apt-get install -y git
      
      # Write the SSH key to a file
      - echo -e "$EXTERNAL_REPO_KEY" > ~/external_key
      - chmod 600 ~/external_key
      
      # Configure Git to use the key
      - export GIT_SSH_COMMAND="ssh -o StrictHostKeyChecking=no -i ~/external_key"
      
      # Clone the external repository
      - git clone git@github.com:username/external-repo.git /external-repo
      
      # Verify the clone
      - ls -la /external-repo
      
      # Run your training script
      - python train.py

The cloned repository will be available at /external-repo during execution.
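
The key handling in the step above can also be done from Python, which avoids depending on how a particular shell's echo treats escape sequences. This is a sketch under assumptions, not Valohai's prescribed method: `write_private_key` and `git_ssh_env` are hypothetical helpers, and the key path is arbitrary.

```python
import os
import stat
from pathlib import Path


def write_private_key(encoded_key, key_path):
    """Decode literal backslash-n markers and write the key readable by owner only."""
    key_text = encoded_key.replace("\\n", "\n")
    if not key_text.endswith("\n"):
        key_text += "\n"  # some OpenSSH versions reject keys without a final newline
    path = Path(key_path).expanduser()
    # Create the file with mode 0600 so it is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(key_text)
    # Enforce the mode even if the file already existed with looser permissions.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path


def git_ssh_env(key_path, base_env=None):
    """Return a copy of the environment with GIT_SSH_COMMAND pointing at the key."""
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_SSH_COMMAND"] = f"ssh -o StrictHostKeyChecking=no -i {key_path}"
    return env


# Usage inside the execution, before cloning (REPO_URL is a placeholder):
#   key = write_private_key(os.environ["EXTERNAL_REPO_KEY"], "~/external_key")
#   subprocess.run(["git", "clone", REPO_URL, "/external-repo"],
#                  env=git_ssh_env(key), check=True)
```

Passing the environment to `subprocess.run` keeps the SSH setting scoped to the clone call instead of exporting it for every later command.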

Access Files from the Cloned Repo

Use the cloned repository in your Python code:

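import sys

import yaml  # PyYAML, needed for the config example below

# Make the cloned repository importable
sys.path.append('/external-repo')

# Import modules from the external repo
# (assumes it contains a package named external_repo)
from external_repo.utils import preprocess_data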

# Or read configuration files
with open('/external-repo/config.yaml', 'r') as f:
    config = yaml.safe_load(f)
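
Because commits of repositories cloned during execution are not tracked, it can help to log the checked-out commit alongside your results. The sketch below is illustrative rather than a built-in feature: `head_commit` and `metadata_line` are hypothetical helpers, and it assumes that a single-line JSON object printed to stdout is collected as execution metadata.

```python
import json
import subprocess


def head_commit(repo_path, run=subprocess.run):
    """Return the commit hash currently checked out in repo_path."""
    result = run(
        ["git", "-C", repo_path, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def metadata_line(repo_path, commit):
    """Build a one-line JSON record describing the cloned repository."""
    return json.dumps({"external_repo": repo_path, "external_repo_commit": commit})


# Usage at the start of train.py:
#   print(metadata_line("/external-repo", head_commit("/external-repo")))
```

The `run` parameter exists only so the function can be exercised without a real repository; in normal use the default is enough.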

Clone a Public Repository

Public repositories are simpler—no SSH key needed:

- step:
    name: train-with-public-repo
    image: python:3.11
    command:
      - apt-get update && apt-get install -y git
      - git clone https://github.com/username/public-repo.git /public-repo
      - python train.py

Clone Multiple Repositories

You can clone multiple repos in the same execution:

- step:
    name: train-with-multiple-repos
    image: python:3.11
    command:
      - apt-get update && apt-get install -y git
      
      # Set up SSH key
      - echo -e "$EXTERNAL_REPO_KEY" > ~/key
      - chmod 600 ~/key
      - export GIT_SSH_COMMAND="ssh -o StrictHostKeyChecking=no -i ~/key"
      
      # Clone first repo
      - git clone git@github.com:username/repo1.git /repo1
      
      # Clone second repo
      - git clone git@github.com:username/repo2.git /repo2
      
      # Use them in your script
      - python train.py

Clone Specific Branch or Commit

Clone a specific branch, or check out a specific commit for better reproducibility:

# Clone a specific branch
git clone -b feature-branch git@github.com:username/repo.git /repo

# Clone a specific commit
git clone git@github.com:username/repo.git /repo
cd /repo
git checkout abc123def

Or use shallow clones to save time:

# Clone only the latest commit (faster)
git clone --depth 1 git@github.com:username/repo.git /repo
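
These variants interact: a depth-1 clone fetches only a branch tip, so checking out an arbitrary older commit generally needs the full history. The following sketch makes that choice explicit by building the git argument lists. `clone_commands` is a hypothetical helper, and the URL in the usage comment is a placeholder.

```python
def clone_commands(url, dest, branch=None, commit=None, shallow=False):
    """Return the git commands (as argument lists) for the requested clone."""
    if commit and shallow:
        # A depth-1 clone holds only the tip commit, so the checkout could fail.
        raise ValueError("a depth-1 clone may not contain the requested commit")
    clone = ["git", "clone"]
    if branch:
        clone += ["-b", branch]
    if shallow:
        clone += ["--depth", "1"]
    clone += [url, dest]
    commands = [clone]
    if commit:
        # Pin the working tree to the exact commit after a full clone.
        commands.append(["git", "-C", dest, "checkout", commit])
    return commands


# Usage:
#   for cmd in clone_commands("git@example.com:username/repo.git", "/repo",
#                             commit="abc123def"):
#       subprocess.run(cmd, check=True)
```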

Security Best Practices

Use read-only deploy keys

  • Don't grant write access

  • Create separate keys for each external repo

  • Rotate keys periodically

Mark environment variables as secrets

  • Always check the Secret box in Valohai

  • Secrets are hidden from logs and UI

  • They're still accessible in your code

Use SSH, not HTTPS with tokens

  • SSH keys are more secure than embedded tokens

  • They're easier to rotate

  • They don't expire

Don't log private keys

  • Never echo $EXTERNAL_REPO_KEY without redirecting to a file

  • Be careful with set -x or similar debugging flags

Troubleshooting

"Permission denied (publickey)"

  • The SSH key wasn't added correctly

  • Check that the public key is in the Git provider's deploy keys

  • Verify the private key format (must include \n for newlines)

"Host key verification failed"

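  • Use StrictHostKeyChecking=no in GIT_SSH_COMMAND

  • This skips host key verification, which is convenient in short-lived execution containers but removes protection against man-in-the-middle attacks; for stricter setups, add the provider's published host keys to ~/.ssh/known_hosts instead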

"Repository not found"

  • Check the repository URL (must be SSH format: git@...)

  • Ensure the deploy key has access to the repo

  • Verify the repository exists and is accessible

"Command not found: git"

  • Install Git in your Docker image or step command

  • Use apt-get install -y git for Debian/Ubuntu images

"Newline characters in key cause errors"

  • Valohai doesn't automatically encode newlines

  • Replace actual newlines with \n in the environment variable

  • The key should be one long line with \n markers
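
A quick local check before saving the variable can catch most of these formatting mistakes. This is a rough sketch: `key_format_problems` is a hypothetical helper, and its checks are heuristics rather than a full validation of the key.

```python
def key_format_problems(value):
    """Return a list of human-readable problems with an encoded private key."""
    problems = []
    if "\n" in value.strip():
        problems.append("contains real newlines; replace them with literal \\n")
    if "\\n" not in value:
        problems.append("no literal \\n markers found")
    # Decode the markers to inspect the first and last lines of the key.
    decoded = value.replace("\\n", "\n").strip().splitlines()
    if not decoded or not decoded[0].startswith("-----BEGIN "):
        problems.append("missing BEGIN line")
    if not decoded or not decoded[-1].startswith("-----END "):
        problems.append("missing END line")
    return problems
```

An empty list means the value has the expected one-line shape; it does not prove that the key itself is valid or authorized.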

Alternative: Use Submodules

If you frequently clone the same repositories, consider using Git submodules instead.

Submodules provide:

  • Automatic version tracking

  • Simpler setup (no environment variables)

  • Better reproducibility
