Image Best Practices
Follow these guidelines to build efficient, reproducible Docker images for Valohai.
Use specific version tags
Always pin versions for reproducibility.
Good:
FROM python:3.11.4
RUN pip install tensorflow==2.13.0
Avoid:
FROM python:latest
RUN pip install tensorflow
Why? latest tags change over time. Six months from now, latest might be Python 3.13 with breaking changes. Pinned versions ensure your executions stay reproducible.
Start with minimal base images
Smaller images download faster and use less disk space.
Good choices:
python:3.11-slim (smaller than python:3.11)
nvidia/cuda:12.1.0-base-ubuntu22.04 (only CUDA runtime, not full SDK)
alpine variants when compatible
Compare sizes:
python:3.11 → 1.0 GB
python:3.11-slim → 130 MB
For GPU workloads, use NVIDIA's official base images to ensure CUDA compatibility.
Leverage Docker layer caching
Docker builds images in layers. Each instruction in your Dockerfile creates a layer that can be cached.
Order matters:
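FROM python:3.11-slim
# 1. Install system dependencies (changes rarely)
# libgl1 replaces the transitional libgl1-mesa-glx package, which newer Debian releases no longer ship
RUN apt-get update && apt-get install -y \
    git \
    libgl1 \
    && rm -rf /var/lib/apt/lists/*
# 2. Copy requirements first (changes occasionally)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# 3. Copy code last (changes frequently)
COPY . /workspace
WORKDIR /workspace
When you change your code, only the layers from the final COPY onward rebuild. Requirements and system packages stay cached. The final COPY step is shown here to illustrate ordering; for Valohai executions, code normally comes from Git rather than the image (see the next section).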
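To see why ordering matters, here is a toy Python model of layer caching. It is only a sketch under simplifying assumptions: real Docker cache keys are more involved, and the helper names (layer_keys, rebuilt_layers) and the step contents are hypothetical. Each layer's key is derived from its parent's key, the instruction text, and a stand-in for any copied content, so a change invalidates that layer and every layer after it.

```python
import hashlib


def layer_keys(steps):
    """Return one cache key per step.

    Each step is (instruction, content), where content stands in for the
    files an instruction such as COPY brings in. A key depends on the parent
    key, so changing one step changes its key and every key after it.
    """
    keys = []
    parent = ""
    for instruction, content in steps:
        digest = hashlib.sha256(
            f"{parent}\n{instruction}\n{content}".encode("utf-8")
        ).hexdigest()
        keys.append(digest)
        parent = digest
    return keys


def rebuilt_layers(old_steps, new_steps):
    """List the instructions whose cached layer can no longer be reused."""
    old_keys = layer_keys(old_steps)
    new_keys = layer_keys(new_steps)
    return [
        step[0]
        for i, (step, key) in enumerate(zip(new_steps, new_keys))
        if i >= len(old_keys) or key != old_keys[i]
    ]


before = [
    ("RUN apt-get install -y git", ""),
    ("COPY requirements.txt .", "tensorflow==2.13.0"),
    ("RUN pip install -r requirements.txt", ""),
    ("COPY . /workspace", "train.py v1"),
]
# Only the source code changed between the two builds.
after = before[:3] + [("COPY . /workspace", "train.py v2")]

print(rebuilt_layers(before, after))
```

In this model, editing the requirements content instead would invalidate the requirements COPY and everything after it, which is why rarely changing steps belong at the top of the Dockerfile.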
Don't include code or data in the image
Your Docker image should only contain the environment, not your code or data.
Your image:
Python runtime
System libraries
Python packages
Not your image:
Training scripts (come from Git)
Datasets (come from data stores)
Model files (generated during execution)
Why? Separating code from environment makes images reusable and keeps them small.
Pin Python package versions
Use a requirements.txt with exact versions:
tensorflow==2.13.0
transformers==4.30.2
numpy==1.24.3
Avoid version ranges like tensorflow>=2.0 in production images. Ranges are fine for experimentation, but pinned versions ensure reproducibility.
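One way to catch unpinned entries before building an image is a small check like the one below. This is a minimal sketch, not part of Valohai or pip: the function name unpinned_requirements is hypothetical, and it only handles simple name==version lines (entries with environment markers or URLs would be reported as unpinned).

```python
import re

# A line counts as pinned only if it has the form "name==version",
# optionally with extras such as "package[extra]==1.0". Wildcards like
# "1.*" are rejected because they still allow the version to drift.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?\s*==\s*[^\s*]+$")


def unpinned_requirements(text):
    """Return requirement lines that are not pinned to an exact version."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


example = """\
tensorflow==2.13.0
transformers==4.30.2
numpy>=1.24
requests
"""
print(unpinned_requirements(example))
```

Running a check like this in CI before the image build keeps version ranges from slipping into production images.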
Cache control in Valohai
Valohai caches Docker images on worker machines by default. This means the first execution downloads the image, and subsequent executions reuse the cached version.
Force a fresh image pull
If you've updated an image in your registry (using the same tag), force Valohai to pull the latest version:
Set the environment variable VH_NO_IMAGE_CACHE=1 on your execution.
This ignores the cached image and pulls fresh from the registry.
Clear all caches
To clear both image and data caches from a worker machine:
Set VH_CLEAN=1 on your execution.
This forcibly removes all Docker images and cached data before and after execution. Use it sparingly, since it adds significant time to the execution.
When to use cache controls
VH_NO_IMAGE_CACHE=1 → You pushed a new version with the same tag (not recommended, but sometimes necessary)
VH_CLEAN=1 → Debugging disk space issues or testing fresh environments
For normal workflows, let Valohai's default caching work. It's fast and efficient.
Speed up image downloads with a pull-through cache
If you frequently build or pull large Docker images, a pull-through cache can significantly reduce download times.
When to use this
Consider a pull-through cache if:
You build or update Docker images frequently
Download speeds are slow or you hit timeouts
You want to reduce bandwidth costs
How it works
Valohai sets up a caching server in your VPC. When workers pull images, they first check the cache. If the image exists, it's served locally (fast). If not, it's fetched once and cached for future use.
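The check-then-fetch behavior described above can be illustrated with a toy model. This is only a sketch of the general pull-through pattern, not Valohai's implementation: the class PullThroughCache and the fake_registry function are hypothetical stand-ins.

```python
class PullThroughCache:
    """Toy model of a pull-through cache sitting in front of a registry."""

    def __init__(self, fetch_from_registry):
        self._fetch = fetch_from_registry  # callable: image name -> image data
        self._store = {}
        self.upstream_fetches = 0

    def pull(self, image):
        if image not in self._store:
            # Cache miss: fetch once from the upstream registry and keep it.
            self._store[image] = self._fetch(image)
            self.upstream_fetches += 1
        # Cache hit (or freshly cached): serve from local storage.
        return self._store[image]


def fake_registry(image):
    # Hypothetical stand-in for a slow download from a remote registry.
    return f"layers-of-{image}"


cache = PullThroughCache(fake_registry)
for _ in range(3):  # three workers pulling the same image
    cache.pull("python:3.11-slim")

print(cache.upstream_fetches)
```

However many workers request the same image, the upstream registry is contacted only once, which is where the bandwidth and time savings come from.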
Setup
This requires a dedicated machine in your VPC and network configuration to route traffic through the cache.
Contact Valohai support to set up a pull-through cache for your organization.
Custom container runtime options
Valohai controls the docker run command and its arguments. This ensures executions work consistently across environments.
You cannot:
Pass custom docker run flags
Override the entrypoint Valohai sets
Modify container networking or volume mounts
You can:
Use any Docker image
Pass parameters to your code
Set environment variables
Mount data from your data stores
This design keeps infrastructure management outside your containers, so you focus on code, not configuration.
If you have a use case requiring custom docker run arguments, contact our support team to discuss alternatives.
Building images without Docker installed
You don't need Docker installed locally to build images. Use the Docker Image Builder from Valohai's Reusable Step Libraries.
This library step:
Takes your Dockerfile as input
Builds the image on Valohai infrastructure
Pushes to your registry
Perfect for teams without Docker experience or for CI/CD pipelines.
Summary
Fastest path:
Start with python:3.11-slim or similar
Pin all versions
Install packages in your code while iterating
Build a custom image once dependencies stabilize
Production-ready:
Use multi-stage builds
Clean up in the same layer
Leverage layer caching
Never include code or data in the image
Troubleshooting:
When launching an execution, use VH_NO_IMAGE_CACHE=1 to pull fresh images
Use VH_CLEAN=1 to clear all caches (rarely needed)
Contact support for pull-through cache setup
