Build Custom Images

Build custom Docker images when you need speed, reproducibility, or system-level dependencies.

When to build custom images

Build a custom image when:

You use the same dependencies across many executions
You're building production pipelines
You need system libraries (CUDA, OpenCV, FFmpeg)
Startup time matters (avoid reinstalling packages)
You want reproducible environments with pinned versions

Don't build yet if:

You're still experimenting and dependencies change frequently
Installing packages in your script is fast enough
You only need pure Python packages

Start simple, optimize later.

Two ways to build

We recommend building Docker images with your existing workflow, locally or through CI/CD.

If you need to build images directly in Valohai, add the valohai-toolkit library to your organization. It provides a pre-configured step definition for building and pushing images to your registry.

Option 1: Build locally (traditional)

Build images on your machine and push to a registry.

Pros:

Full control over the build process
Fast iteration during development
Works with any Docker tooling

Cons:

Requires Docker installed locally
Requires managing registry authentication yourself
Manual process (build, tag, push)

Option 2: Build on Valohai

Use the Docker Image Builder library step.

Pros:

No Docker installation needed
Handles authentication to your private registry using organization credentials
Reproducible builds tracked in Valohai
Well suited to teams without Docker build experience

Cons:

Requires an environment that allows privileged (--privileged) containers; contact Valohai support to enable one
Slightly slower than local builds for rapid iteration

Building locally

1. Write your Dockerfile

Start with a base image and add your dependencies:

FROM python:3.11-slim

# Install system dependencies (libgl1 provides libGL.so.1, which OpenCV needs)
RUN apt-get update && apt-get install -y \
    git \
    libgl1 \
    && rm -rf /var/lib/apt/lists/*

# Copy and install Python packages
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Set working directory
WORKDIR /workspace

Don't include:

Your code (comes from Git)
Your data (comes from data stores)
Secrets or tokens
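
The Dockerfile above installs whatever requirements.txt resolves to at build time, so unpinned entries work against reproducibility. Here is a minimal sketch of a pre-build check. The name unpinned_requirements is a hypothetical helper, not part of any tool. It only recognizes exact == pins, so entries such as direct URLs would be flagged as well.

```shell
# Hypothetical helper: print requirement lines that lack an exact "==" pin.
# Blank lines, comments, and pip options (lines starting with "-") are skipped.
unpinned_requirements() {
  grep -Ev '^[[:space:]]*($|#|-)' "$1" | grep -v '==' || true
}
```

Running unpinned_requirements requirements.txt before docker build prints nothing when every entry is pinned.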

2. Build the image

docker build -t myusername/ml-image:v1.0 .

Use a descriptive tag that includes a version number. Avoid latest for production: it is a mutable tag, so the same reference can point to different images over time.
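
One way to keep tags unique and traceable is to combine a version number with the Git commit the image was built from. The sketch below assumes you build from inside a Git checkout. The function name image_tag and the v<version>-<sha> scheme are illustrative choices, not a Valohai convention.

```shell
# Hypothetical helper: build a tag such as v1.0-3f2a9c1 from a version and a commit SHA.
# If no SHA is passed, the short SHA of the current Git checkout is used.
image_tag() {
  local version="$1"
  local sha="${2:-$(git rev-parse --short HEAD)}"
  printf 'v%s-%s\n' "$version" "$sha"
}
```

It could then be used as docker build -t "myusername/ml-image:$(image_tag 1.0)" .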

3. Test locally

Run a container to verify everything works:

docker run --rm -it myusername/ml-image:v1.0 bash

Inside the container:

python --version
pip list
# Test an import from your requirements.txt (TensorFlow is only an example)
python -c "import tensorflow; print(tensorflow.__version__)"
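
The same import check can run without an interactive shell, which is handy in CI. This is a minimal sketch: smoke_test is a hypothetical helper, and the DOCKER variable is an assumption added here so the commands can be previewed without a Docker daemon.

```shell
# Hypothetical helper: try importing each listed module inside a fresh container.
# Set DOCKER=echo to print the commands instead of running them (dry run).
smoke_test() {
  local image="$1"
  shift
  local module
  for module in "$@"; do
    "${DOCKER:-docker}" run --rm "$image" python -c "import $module" || return 1
  done
}
```

For example, smoke_test myusername/ml-image:v1.0 tensorflow returns a non-zero status at the first module that fails to import.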

4. Push to your registry

Docker Hub:

docker login
docker push myusername/ml-image:v1.0

AWS ECR:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com
# The ml-image repository must already exist in ECR; pushing does not create it
docker tag myusername/ml-image:v1.0 <account-id>.dkr.ecr.us-east-1.amazonaws.com/ml-image:v1.0
docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/ml-image:v1.0

GCP Artifact Registry:

gcloud auth configure-docker us-central1-docker.pkg.dev
docker tag myusername/ml-image:v1.0 us-central1-docker.pkg.dev/<project-id>/ml-images/ml-image:v1.0
docker push us-central1-docker.pkg.dev/<project-id>/ml-images/ml-image:v1.0

Azure Container Registry:

az acr login --name myregistry
docker tag myusername/ml-image:v1.0 myregistry.azurecr.io/ml-image:v1.0
docker push myregistry.azurecr.io/ml-image:v1.0
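
The ECR, Artifact Registry, and ACR examples share the same tag-then-push pattern; only the registry prefix changes. The sketch below factors that out. Both function names are hypothetical, and the helper assumes you have already logged in to the target registry as shown above.

```shell
# Hypothetical helper: drop the Docker Hub namespace from a local reference
# and prefix the remaining name:tag with a registry path.
target_ref() {
  # $1: local reference, e.g. myusername/ml-image:v1.0
  # $2: registry host plus optional repository path
  printf '%s/%s\n' "${2%/}" "${1##*/}"
}

# Tag the local image for the target registry and push it.
# Set DOCKER=echo to print the commands instead of running them.
retag_and_push() {
  local target
  target="$(target_ref "$1" "$2")"
  "${DOCKER:-docker}" tag "$1" "$target" && "${DOCKER:-docker}" push "$target"
}
```

For example, retag_and_push myusername/ml-image:v1.0 myregistry.azurecr.io tags and pushes myregistry.azurecr.io/ml-image:v1.0.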

5. Use in Valohai

If the image lives in a private registry, configure registry authentication for your organization first (see Private Docker Registries).

Then reference your image in valohai.yaml:

- step:
    name: train
    image: myusername/ml-image:v1.0
    command:
      - python train.py

Building on Valohai

1. Write your Dockerfile

Same as local builds. Create a Dockerfile with your dependencies:

FROM python:3.11-slim
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
WORKDIR /workspace

2. Add the Docker Image Builder library

See Docker Image Builder for setup instructions.

3. Create a build execution

Open your project
Click Create Execution to open the execution form
Select the builder step that matches your registry:
- docker-image-aws for AWS ECR
- docker-image-gcp for GCP Artifact Registry
- docker-image-dockerhub for Docker Hub
Provide your Dockerfile (as an input or a parameter)
Set the parameters:
- repository: your image name
- docker-tag: the version tag
Set the environment variables your registry requires
Click Create Execution to start the build

4. Use the built image

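After the build completes, reference the image by the full registry path it was pushed to (an AWS ECR path is shown here as an example):

- step:
    name: train
    image: <account-id>.dkr.ecr.us-east-1.amazonaws.com/ml-image:v1.0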
    command:
      - python train.py

Common patterns

GPU workloads

Start with NVIDIA's official CUDA images:

FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04

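# Install Python and clean the apt cache in the same layer
RUN apt-get update && apt-get install -y python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
# Install PyTorch wheels built for CUDA 12.1
RUN pip3 install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121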

Enable GPU access in your images

Regardless of your base image, it's good practice to set these environment variables in your Dockerfile so Valohai executions can use GPUs. NVIDIA's official CUDA images already define them, but other base images usually do not:

ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility

Without these, your execution may not detect available GPUs, even when running on GPU-enabled machines.
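
To catch a missing variable before training starts, an execution can check both values up front. This is a minimal sketch with a hypothetical helper name, check_gpu_env. It strictly checks the recommendation above and does not reproduce what the NVIDIA runtime ultimately does; for instance, the runtime may apply its own defaults when a variable is unset.

```shell
# Hypothetical helper: verify the two variables recommended above are set.
check_gpu_env() {
  case "${NVIDIA_VISIBLE_DEVICES:-}" in
    ""|none|void)
      echo "NVIDIA_VISIBLE_DEVICES does not expose any GPU" >&2
      return 1 ;;
  esac
  case ",${NVIDIA_DRIVER_CAPABILITIES:-}," in
    *,compute,*|*,all,*) ;;
    *)
      echo "NVIDIA_DRIVER_CAPABILITIES is unset or lacks 'compute'" >&2
      return 1 ;;
  esac
}
```

Calling check_gpu_env as the first command of a step makes a misconfigured image fail early. On a GPU machine, nvidia-smi -L can then confirm which devices are visible.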

R workloads

FROM r-base:4.3.0

RUN R -e "install.packages(c('tidyverse', 'caret', 'randomForest'), repos='https://cran.rstudio.com/')"

WORKDIR /workspace

Julia workloads

FROM julia:1.9

RUN julia -e 'using Pkg; Pkg.add(["DataFrames", "CSV", "Plots"])'

WORKDIR /workspace

Troubleshooting

Build fails with "no space left on device"

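Your Docker daemon is out of disk space. Clean up unused data. Note that this command removes all stopped containers, unused networks, build cache, and every image not used by a container: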

docker system prune -a
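
If removing every unused image is too aggressive, a gentler first pass is to inspect usage and drop only build cache and dangling images. The sketch below uses a hypothetical helper name, reclaim_space. The DOCKER variable is an assumption added here so the commands can be previewed.

```shell
# Hypothetical helper: show disk usage, then remove only unused build cache
# and dangling images. Set DOCKER=echo to preview the commands.
reclaim_space() {
  "${DOCKER:-docker}" system df &&
  "${DOCKER:-docker}" builder prune -f &&
  "${DOCKER:-docker}" image prune -f
}
```

If that does not free enough space, fall back to docker system prune -a.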

Image is too large

Check layer sizes:

docker history myusername/ml-image:v1.0
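
docker history lists layers in build order, so the largest ones are not always obvious. As a rough sketch, the layers can be sorted by size. The helper name largest_layers is hypothetical, and the DOCKER variable is an assumption added so the command can be swapped out.

```shell
# Hypothetical helper: list the N largest layers of an image (default 5).
# --human=false makes docker print sizes in bytes so they sort numerically.
largest_layers() {
  local image="$1"
  local count="${2:-5}"
  "${DOCKER:-docker}" history --human=false \
    --format '{{.Size}} {{.CreatedBy}}' "$image" | sort -rn | head -n "$count"
}
```

For example, largest_layers myusername/ml-image:v1.0 3 prints the three largest layers together with the instruction that created each one.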

Reduce size:

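Use -slim base images (Alpine is smaller still, but many prebuilt Python wheels target glibc, so installs there can be slower or fail)
Clean up package caches in the same RUN command that created them, so they never end up in a layer
Use multi-stage builds to leave build tools out of the final image
Remove unnecessary files (docs, tests)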

Push fails with authentication error

Docker Hub:

docker login

Other registries: Check your registry's authentication documentation. For AWS ECR, GCP, and Azure, see Private Docker Registries.

Next steps

Optimize your images: Follow Best Practices for faster builds and smaller images.

Use private registries: Set up authentication for your organization.
