# Build Custom Images
Build custom Docker images when you need speed, reproducibility, or system-level dependencies.
## When to build custom images
Build a custom image when:

- You use the same dependencies across many executions
- You're building production pipelines
- You need system libraries (CUDA, OpenCV, FFmpeg)
- Startup time matters (avoid reinstalling packages)
- You want reproducible environments with pinned versions
Don't build yet if:

- You're still experimenting and dependencies change frequently
- Installing packages in your script is fast enough
- You only need pure Python packages
Start simple, optimize later.
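For the experimentation phase, installing at runtime is often enough: put the install command before your script in the step definition. A minimal valohai.yaml sketch (the step name and package are illustrative):

```yaml
- step:
    name: train
    image: python:3.11-slim
    command:
      - pip install scikit-learn
      - python train.py
```

Once the dependency list stabilizes, move it into a custom image to avoid paying the install time on every execution.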
## Two ways to build
We recommend building Docker images with your existing workflow, locally or through CI/CD.
If you need to build images directly in Valohai, add the valohai-toolkit library to your organization. It provides a pre-configured step definition for building and pushing images to your registry.
### Option 1: Build locally (traditional)
Build images on your machine and push to a registry.
**Pros:**

- Full control over the build process
- Fast iteration during development
- Works with any Docker tooling

**Cons:**

- Requires Docker installed locally
- You need to manage registry authentication
- Manual process (build, tag, push)
### Option 2: Build on Valohai
Use the Docker Image Builder library step.
**Pros:**

- No Docker installation needed
- Handles authentication to private registries using organization credentials
- Reproducible builds tracked in Valohai
- Ideal for teams without Docker build experience

**Cons:**

- Requires a `--privileged` environment (contact Valohai support)
- Slightly slower than local builds for rapid iteration
## Building locally
### 1. Write your Dockerfile
Start with a base image and add your dependencies:
```dockerfile
FROM python:3.11-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    libgl1-mesa-glx \
    && rm -rf /var/lib/apt/lists/*

# Copy and install Python packages
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Set working directory
WORKDIR /workspace
```

Don't include:

- Your code (it comes from Git)
- Your data (it comes from your data stores)
- Secrets or tokens
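For reproducible environments, pin exact versions in the `requirements.txt` you copy into the image. A sketch (packages and versions are illustrative):

```
numpy==1.26.4
pandas==2.1.4
scikit-learn==1.4.0
```

Unpinned requirements resolve to whatever is newest at build time, so two builds of the same Dockerfile can produce different environments.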
### 2. Build the image
```bash
docker build -t myusername/ml-image:v1.0 .
```

Use a descriptive tag that includes a version number. Avoid `latest` for production.
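One common convention is to combine a version with a short Git commit hash so every image traces back to the code that built it. A small helper as a sketch (the naming scheme is just one option, not a Valohai requirement):

```python
def image_tag(version, git_sha=None):
    """Compose a descriptive image tag, e.g. 'v1.0' or 'v1.0-abc1234'."""
    tag = version
    if git_sha:
        # Append the short (7-character) commit hash for traceability
        tag += f"-{git_sha[:7]}"
    return tag

print(image_tag("v1.0"))                # → v1.0
print(image_tag("v1.0", "abc1234def"))  # → v1.0-abc1234
```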
### 3. Test locally
Run a container to verify everything works:
```bash
docker run --rm -it myusername/ml-image:v1.0 bash
```

Inside the container:

```bash
python --version
pip list

# Test imports
python -c "import tensorflow; print(tensorflow.__version__)"
```

### 4. Push to your registry
**Docker Hub:**

```bash
docker login
docker push myusername/ml-image:v1.0
```

**AWS ECR:**

```bash
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com
docker tag myusername/ml-image:v1.0 <account-id>.dkr.ecr.us-east-1.amazonaws.com/ml-image:v1.0
docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/ml-image:v1.0
```

**GCP Artifact Registry:**

```bash
gcloud auth configure-docker us-central1-docker.pkg.dev
docker tag myusername/ml-image:v1.0 us-central1-docker.pkg.dev/<project-id>/ml-images/ml-image:v1.0
docker push us-central1-docker.pkg.dev/<project-id>/ml-images/ml-image:v1.0
```

**Azure Container Registry:**

```bash
az acr login --name myregistry
docker tag myusername/ml-image:v1.0 myregistry.azurecr.io/ml-image:v1.0
docker push myregistry.azurecr.io/ml-image:v1.0
```

### 5. Use in Valohai
If using a private registry, configure authentication first.
Then reference your image in valohai.yaml:
```yaml
- step:
    name: train
    image: myusername/ml-image:v1.0
    command:
      - python train.py
```

## Building on Valohai
### 1. Write your Dockerfile
Same as local builds. Create a Dockerfile with your dependencies:
```dockerfile
FROM python:3.11-slim
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
WORKDIR /workspace
```

### 2. Add the Docker Image Builder library
See Docker Image Builder for setup instructions.
### 3. Create a build execution
1. Open your project.
2. Click **Create Execution**.
3. Select the appropriate builder step:
   - `docker-image-aws` for AWS ECR
   - `docker-image-gcp` for GCP Artifact Registry
   - `docker-image-dockerhub` for Docker Hub
4. Provide your Dockerfile (as an input or a parameter).
5. Set the parameters:
   - `repository`: your image name
   - `docker-tag`: version tag
6. Set the registry environment variables.
7. Click **Create Execution**.
### 4. Use the built image
After the build completes, reference it in your project:
```yaml
- step:
    name: train
    image: myregistry.azurecr.io/ml-image:v1.0
    command:
      - python train.py
```

## Common patterns
### GPU workloads
Start with NVIDIA's official CUDA images:
```dockerfile
FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

### Enable GPU access in your images
Regardless of your base image, it's good practice to add these environment variables to your Dockerfile so Valohai executions can use GPUs:
```dockerfile
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
```

Without these, your execution won't detect available GPUs even when running on GPU-enabled machines.
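To confirm GPU visibility at runtime, you can run a quick check inside an execution. This sketch assumes the image ships PyTorch and degrades gracefully when it doesn't:

```python
def gpu_report():
    """Report how many GPUs the framework can see, if any."""
    try:
        import torch  # assumes PyTorch is installed in the image
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        return f"{torch.cuda.device_count()} GPU(s) visible"
    return "no GPU visible"

print(gpu_report())
```

If this reports no GPUs on a GPU-enabled machine, check the `NVIDIA_*` environment variables above and that you installed a CUDA-enabled build of your framework.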
### R workloads
```dockerfile
FROM r-base:4.3.0
RUN R -e "install.packages(c('tidyverse', 'caret', 'randomForest'), repos='https://cran.rstudio.com/')"
WORKDIR /workspace
```

### Julia workloads
```dockerfile
FROM julia:1.9
RUN julia -e 'using Pkg; Pkg.add(["DataFrames", "CSV", "Plots"])'
WORKDIR /workspace
```

## Troubleshooting
### Build fails with "no space left on device"
Your Docker daemon is out of disk space. Clean up:
```bash
docker system prune -a
```

### Image is too large
Check layer sizes:

```bash
docker history myimage:v1.0
```

Reduce size:

- Use `-slim` or Alpine base images
- Clean up in the same `RUN` command
- Use multi-stage builds
- Remove unnecessary files (docs, tests)
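Multi-stage builds keep compilers and pip caches out of the final image: one stage builds wheels with the full toolchain, and only the installed results are copied into a slim runtime stage. A sketch (base images and paths are illustrative):

```dockerfile
# Stage 1: build wheels with the full toolchain
FROM python:3.11 AS builder
COPY requirements.txt .
RUN pip wheel --no-cache-dir -r requirements.txt -w /wheels

# Stage 2: install from prebuilt wheels into a slim runtime image
FROM python:3.11-slim
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
WORKDIR /workspace
```

Only the final stage ships, so build-time dependencies never inflate the image you push.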
### Push fails with authentication error
**Docker Hub:**

```bash
docker login
```

**Other registries:** check your registry's authentication documentation. For AWS ECR, GCP, and Azure, see Private Docker Registries.
## Next steps
- **Optimize your images:** follow Best Practices for faster builds and smaller images.
- **Use private registries:** set up authentication for your organization.