Build Custom Images

Build custom Docker images when you need speed, reproducibility, or system-level dependencies.

When to build custom images

Build a custom image when:

  • You use the same dependencies across many executions

  • You're building production pipelines

  • You need system libraries (CUDA, OpenCV, FFmpeg)

  • Startup time matters (avoid reinstalling packages)

  • You want reproducible environments with pinned versions

Don't build yet if:

  • You're still experimenting and dependencies change frequently

  • Installing packages in your script is fast enough

  • You only need pure Python packages

Start simple, optimize later.

Two ways to build

We recommend building Docker images with your existing workflow, locally or through CI/CD.

If you need to build images directly in Valohai, add the valohai-toolkit library to your organization. It provides a pre-configured step definition for building and pushing images to your registry.

Option 1: Build locally (traditional)

Build images on your machine and push to a registry.

Pros:

  • Full control over the build process

  • Fast iteration during development

  • Works with any Docker tooling

Cons:

  • Requires Docker installed locally

  • Requires you to manage registry authentication

  • Manual process (build, tag, push)

Option 2: Build on Valohai

Use the Docker Image Builder library step.

Pros:

  • No Docker installation needed

  • Handles authentication to your private registry using organization credentials

  • Reproducible builds tracked in Valohai

  • Perfect for teams without Docker build experience

Cons:

  • Requires an environment that runs containers in --privileged mode (contact Valohai support to enable it)

  • Slightly slower than local builds for rapid iteration


Building locally

1. Write your Dockerfile

Start with a base image and add your dependencies:

FROM python:3.11-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    libgl1 \
    && rm -rf /var/lib/apt/lists/*

# Copy and install Python packages
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Set working directory
WORKDIR /workspace
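
Because the image installs whatever requirements.txt lists, unpinned entries can resolve to different versions from one build to the next. The following is a minimal sketch of a pre-build check, assuming a simple requirements file with one package per line. The function name unpinned_requirements is hypothetical, and the check does not understand URL or editable installs.

```shell
# Print requirement lines that lack an exact "==" pin.
# Returns non-zero if at least one unpinned line is found.
unpinned_requirements() {
    # Skip blank lines, comments, and pip options such as "-r other.txt".
    if grep -Ev '^[[:space:]]*($|#|-)' "$1" | grep -v '=='; then
        return 1
    fi
    return 0
}

# Example usage (run next to your Dockerfile):
# unpinned_requirements requirements.txt || echo "pin the packages listed above"
```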

Don't include:

  • Your code (comes from Git)

  • Your data (comes from data stores)

  • Secrets or tokens
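
One way to keep code, data, and secrets out of the image by default is a .dockerignore file that excludes everything from the build context except the files the Dockerfile actually copies. This sketch assumes the Dockerfile above, which only needs requirements.txt; adjust the allow-list if yours copies more.

```shell
# Write a .dockerignore that excludes everything from the build context
# except requirements.txt (the only file the Dockerfile above copies).
cat > .dockerignore <<'EOF'
# Ignore everything by default: code, data, .env files, credentials
*
# Re-include only what the Dockerfile needs
!requirements.txt
EOF
```

Docker still reads the Dockerfile itself even when it matches an ignore pattern, so the build keeps working.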

2. Build the image

docker build -t myusername/ml-image:v1.0 .

Use a descriptive tag that includes a version number. Avoid the latest tag in production: it is mutable, so the same reference can resolve to different images over time.
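
If you script your builds, a small guard can enforce the tagging advice. This is only a sketch: image_ref is a hypothetical helper, not part of Docker or Valohai, and the repository name is the placeholder used throughout this page.

```shell
# Compose "repository:tag" and refuse an empty tag or the mutable "latest" tag.
image_ref() {
    repo="$1"
    tag="$2"
    if [ -z "$tag" ] || [ "$tag" = "latest" ]; then
        echo "error: use an explicit version tag, not '$tag'" >&2
        return 1
    fi
    printf '%s:%s\n' "$repo" "$tag"
}

# Example usage (uncomment to build):
# docker build -t "$(image_ref myusername/ml-image v1.0)" .
```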

3. Test locally

Run a container to verify everything works:

docker run --rm -it myusername/ml-image:v1.0 bash

Inside the container:

python --version
pip list
# Test an import (replace tensorflow with a package from your requirements.txt)
python -c "import tensorflow; print(tensorflow.__version__)"
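
The same checks can run non-interactively, which is handy in CI. The sketch below assumes the image has python on its PATH. The smoke_test function and the DOCKER override (set DOCKER=echo for a dry run) are conventions of this example only, and the module names are placeholders.

```shell
# Try to import each named Python module inside a throwaway container.
# Stops at the first module that fails to import.
smoke_test() {
    image="$1"
    shift
    for module in "$@"; do
        "${DOCKER:-docker}" run --rm "$image" python -c "import $module" || {
            echo "import failed: $module" >&2
            return 1
        }
    done
}

# Example usage (uncomment to run against a real image):
# smoke_test myusername/ml-image:v1.0 numpy pandas
```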

4. Push to your registry

Docker Hub:

docker login
docker push myusername/ml-image:v1.0

AWS ECR:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com
docker tag myusername/ml-image:v1.0 <account-id>.dkr.ecr.us-east-1.amazonaws.com/ml-image:v1.0
docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/ml-image:v1.0

GCP Artifact Registry:

gcloud auth configure-docker us-central1-docker.pkg.dev
docker tag myusername/ml-image:v1.0 us-central1-docker.pkg.dev/<project-id>/ml-images/ml-image:v1.0
docker push us-central1-docker.pkg.dev/<project-id>/ml-images/ml-image:v1.0

Azure Container Registry:

az acr login --name myregistry
docker tag myusername/ml-image:v1.0 myregistry.azurecr.io/ml-image:v1.0
docker push myregistry.azurecr.io/ml-image:v1.0
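
The tag-and-push pairs above follow one pattern: drop the Docker Hub namespace, prefix the registry path, then push. A hedged sketch of that pattern follows. Both function names are hypothetical, the DOCKER override (DOCKER=echo for a dry run) is a convention of this example, and you still need to log in to the registry first as shown above.

```shell
# Compose the ECR registry host from an AWS account ID and region.
ecr_registry() {
    printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Re-tag a local image for a target registry path and push it.
retag_and_push() {
    local_ref="$1"
    registry="$2"
    # Drop the Docker Hub namespace ("myusername/") and keep "name:tag".
    remote_ref="$registry/${local_ref##*/}"
    "${DOCKER:-docker}" tag "$local_ref" "$remote_ref" &&
        "${DOCKER:-docker}" push "$remote_ref"
}

# Example usage (uncomment and fill in your own account ID):
# retag_and_push myusername/ml-image:v1.0 "$(ecr_registry <account-id> us-east-1)"
```

For GCP Artifact Registry, pass the full repository path as the second argument, for example us-central1-docker.pkg.dev/<project-id>/ml-images.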

5. Use in Valohai

If you use a private registry, configure authentication first.

Then reference your image in valohai.yaml:

- step:
    name: train
    image: myusername/ml-image:v1.0
    command:
      - python train.py

Building on Valohai

1. Write your Dockerfile

Same as local builds. Create a Dockerfile with your dependencies:

FROM python:3.11-slim
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
WORKDIR /workspace

2. Add the Docker Image Builder library

See Docker Image Builder for setup instructions.

3. Create a build execution

  1. Open your project

  2. Click Create Execution

  3. Select the appropriate builder step:

    • docker-image-aws for AWS ECR

    • docker-image-gcp for GCP Artifact Registry

    • docker-image-dockerhub for Docker Hub

  4. Provide your Dockerfile (as input or parameter)

  5. Set parameters:

    • repository: Your image name

    • docker-tag: Version tag

  6. Set registry environment variables

  7. Click Create Execution

4. Use the built image

After the build completes, reference it in your project:

- step:
    name: train
    image: <account-id>.dkr.ecr.us-east-1.amazonaws.com/ml-image:v1.0
    command:
      - python train.py

Common patterns

GPU workloads

Start with NVIDIA's official CUDA images:

FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04

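# Install Python and clean the apt cache in the same layer
RUN apt-get update && apt-get install -y python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
# Install PyTorch wheels built for CUDA 12.1
RUN pip3 install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121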

Enable GPU access in your images

Whichever base image you use, add these environment variables to your Dockerfile so Valohai executions can use GPUs:

ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility

Without these variables, the NVIDIA container runtime may not expose any GPUs to the container, so your execution won't detect them even on GPU-enabled machines.
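
To confirm the variables made it into the image, you can run a quick check inside the container, for example as the first command of a step. This is a minimal sketch: gpu_env_ok is a hypothetical helper, and it only verifies that the variables are set, not that a GPU is actually usable.

```shell
# Succeed only if both NVIDIA variables are set to a non-empty value.
gpu_env_ok() {
    [ -n "${NVIDIA_VISIBLE_DEVICES:-}" ] && [ -n "${NVIDIA_DRIVER_CAPABILITIES:-}" ]
}

if gpu_env_ok; then
    echo "GPU variables set: devices=$NVIDIA_VISIBLE_DEVICES capabilities=$NVIDIA_DRIVER_CAPABILITIES"
else
    echo "warning: NVIDIA_VISIBLE_DEVICES or NVIDIA_DRIVER_CAPABILITIES is not set" >&2
fi
```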

R workloads

FROM r-base:4.3.0

RUN R -e "install.packages(c('tidyverse', 'caret', 'randomForest'), repos='https://cran.rstudio.com/')"

WORKDIR /workspace

Julia workloads

FROM julia:1.9

RUN julia -e 'using Pkg; Pkg.add(["DataFrames", "CSV", "Plots"])'

WORKDIR /workspace

Troubleshooting

Build fails with "no space left on device"

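Your Docker daemon is out of disk space. Clean up unused data. The -a flag also removes every image that no container is using, so expect to re-pull or rebuild those images afterwards: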

docker system prune -a
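
To catch the problem before a build fails, you can check free space up front. The sketch below assumes Docker keeps its data under /var/lib/docker, the usual Linux default, and falls back to the root filesystem otherwise. The free_kb helper and the 10 GB threshold are illustrative choices.

```shell
# Print the free space, in kilobytes, on the filesystem that holds a path.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Assumption: Docker stores its data under /var/lib/docker.
docker_dir=/var/lib/docker
# Fall back to / if that directory does not exist on this machine.
[ -d "$docker_dir" ] || docker_dir=/

min_kb=$((10 * 1024 * 1024))   # 10 GB threshold, an arbitrary example value
if [ "$(free_kb "$docker_dir")" -lt "$min_kb" ]; then
    echo "low disk space on $docker_dir; run 'docker system df' to see what Docker is using" >&2
fi
```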

Image is too large

Check layer sizes:

docker history myusername/ml-image:v1.0

Reduce size:

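  • Use -slim base images (Alpine is smaller still, but some Python packages lack prebuilt wheels for it and must compile from source)

  • Clean up package caches in the same RUN command that creates them, so they never end up in a layer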

  • Use multi-stage builds

  • Remove unnecessary files (docs, tests)

Push fails with authentication error

Docker Hub:

docker login

Other registries: Check your registry's authentication documentation. For AWS ECR, GCP, and Azure, see Private Docker Registries.


Next steps

Optimize your images: Follow Best Practices for faster builds and smaller images.

Use private registries: Set up authentication for your organization.
