# Build Custom Images

Build custom Docker images when you need speed, reproducibility, or system-level dependencies.

## When to build custom images

**Build a custom image when:**

* You use the same dependencies across many executions
* You're building production pipelines
* You need system libraries (CUDA, OpenCV, FFmpeg)
* Startup time matters (avoid reinstalling packages)
* You want reproducible environments with pinned versions

**Don't build yet if:**

* You're still experimenting and dependencies change frequently
* Installing packages in your script is fast enough
* You only need pure Python packages

Start simple, optimize later.

## Two ways to build

We recommend building Docker images with your existing workflow, locally or through CI/CD.

If you need to build images directly in Valohai, add the [**valohai-toolkit**](/reusable-step-libraries/build-your-own-library/docker-image-builder.md) library to your organization. It provides a pre-configured step definition for building and pushing images to your registry.

### Option 1: Build locally (traditional)

Build images on your machine and push to a registry.

**Pros:**

* Full control over the build process
* Fast iteration during development
* Works with any Docker tooling

**Cons:**

* Requires Docker installed locally
* Need to manage registry authentication
* Manual process (build, tag, push)

### Option 2: Build on Valohai

Use the [Docker Image Builder](/reusable-step-libraries/build-your-own-library/docker-image-builder.md) library step.

**Pros:**

* No Docker installation needed
* Handles authentication to your private registry using organization credentials
* Reproducible builds tracked in Valohai
* Well suited to teams without Docker build experience

**Cons:**

* Requires an environment that allows `--privileged` containers (contact Valohai support to set one up)
* Slightly slower than local builds for rapid iteration

***

## Building locally

### 1. Write your Dockerfile

Start with a base image and add your dependencies:

```dockerfile
FROM python:3.11-slim

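# Install system dependencies (libgl1 provides the OpenGL library that OpenCV needs;
# the older libgl1-mesa-glx package is no longer available in current Debian releases)
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    libgl1 \
    && rm -rf /var/lib/apt/lists/*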

# Copy and install Python packages
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Set working directory
WORKDIR /workspace
```

**Don't include:**

* Your code (comes from Git)
* Your data (comes from data stores)
* Secrets or tokens
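
A `.dockerignore` file is one way to keep these items out of the build context, so a broad `COPY` cannot pick them up by accident. The sketch below writes an example file; the listed paths (`data/`, `.env`, `*.pem`) are assumptions about a typical project layout, so adjust them to your repository.

```shell
#!/bin/sh
# Write an example .dockerignore (this overwrites an existing file).
# The entries are hypothetical; list whatever holds your data and secrets.
cat > .dockerignore <<'EOF'
.git
data/
.env
*.pem
__pycache__/
EOF

echo "Wrote .dockerignore:"
cat .dockerignore
```

Files matched by these patterns are not sent to the Docker daemon during `docker build`, so they cannot end up in an image layer.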

### 2. Build the image

```shell
docker build -t myusername/ml-image:v1.0 .
```

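Use a descriptive tag that includes a version number. Avoid `latest` for production: it is a mutable tag that can point to a different image after every push, which breaks reproducibility.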
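
One possible convention is to combine a readable version with the short Git commit hash, so every image can be traced back to the commit it was built from. This is a sketch of that convention, not a Valohai requirement; the `make_tag` helper and the `v1.0` version are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: join a version and a commit hash into one tag.
make_tag() {
    version="$1"
    commit="$2"
    printf '%s-%s\n' "$version" "$commit"
}

# Use the current commit when inside a Git repository; fall back to "nogit".
commit="$(git rev-parse --short HEAD 2>/dev/null || echo nogit)"
tag="$(make_tag v1.0 "$commit")"

# Print the build command for review; remove "echo" to actually run it.
echo docker build -t "myusername/ml-image:${tag}" .
```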

### 3. Test locally

Run a container to verify everything works:

```shell
docker run --rm -it myusername/ml-image:v1.0 bash
```

Inside the container:

```shell
python --version
pip list
# Test imports (replace tensorflow with a package from your requirements.txt)
python -c "import tensorflow; print(tensorflow.__version__)"
```
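
The same check can run non-interactively, which is handy in CI. The sketch below assumes `python3` is on the `PATH` inside the image; the `check_imports` helper is hypothetical, and `json` and `csv` are placeholder modules to replace with your own packages.

```shell
#!/bin/sh
# Hypothetical helper: try to import each module and report the result.
# Returns non-zero if any import fails.
check_imports() {
    failed=0
    for module in "$@"; do
        if python3 -c "import ${module}" 2>/dev/null; then
            echo "ok: ${module}"
        else
            echo "MISSING: ${module}"
            failed=1
        fi
    done
    return "$failed"
}

# Placeholder standard-library modules; list your own packages here.
check_imports json csv
```

Saved as, say, `check.sh` (a hypothetical file name), it could be run from the host with `docker run --rm -v "$PWD/check.sh:/check.sh:ro" myusername/ml-image:v1.0 sh /check.sh`.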

### 4. Push to your registry

**Docker Hub:**

```shell
docker login
docker push myusername/ml-image:v1.0
```

**AWS ECR:**

```shell
# Authenticate Docker with ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com
# The "ml-image" repository must already exist (aws ecr create-repository --repository-name ml-image)
docker tag myusername/ml-image:v1.0 <account-id>.dkr.ecr.us-east-1.amazonaws.com/ml-image:v1.0
docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/ml-image:v1.0
```

**GCP Artifact Registry:**

```shell
gcloud auth configure-docker us-central1-docker.pkg.dev
docker tag myusername/ml-image:v1.0 us-central1-docker.pkg.dev/<project-id>/ml-images/ml-image:v1.0
docker push us-central1-docker.pkg.dev/<project-id>/ml-images/ml-image:v1.0
```

**Azure Container Registry:**

```shell
az acr login --name myregistry
docker tag myusername/ml-image:v1.0 myregistry.azurecr.io/ml-image:v1.0
docker push myregistry.azurecr.io/ml-image:v1.0
```

### 5. Use in Valohai

If using a private registry, configure [authentication](/docker-in-valohai/private-docker-registries.md) first.

Then reference your image in `valohai.yaml`:

```yaml
- step:
    name: train
    image: myusername/ml-image:v1.0
    command:
      - python train.py
```

***

## Building on Valohai

### 1. Write your Dockerfile

Same as local builds. Create a Dockerfile with your dependencies:

```dockerfile
FROM python:3.11-slim
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
WORKDIR /workspace
```

### 2. Add the Docker Image Builder library

See [Docker Image Builder](/reusable-step-libraries/build-your-own-library/docker-image-builder.md) for setup instructions.

### 3. Create a build execution

1. Open your project
2. Click **Create Execution**
3. Select the appropriate builder step:
   * `docker-image-aws` for AWS ECR
   * `docker-image-gcp` for GCP Artifact Registry
   * `docker-image-dockerhub` for Docker Hub
4. Provide your Dockerfile (as input or parameter)
5. Set parameters:
   * **repository**: Your image name
   * **docker-tag**: Version tag
6. Set registry environment variables
7. Click **Create Execution**

### 4. Use the built image

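After the build completes, reference the image by its full registry path in `valohai.yaml`. This example assumes the image was pushed to AWS ECR with the `docker-image-aws` step:

```yaml
- step:
    name: train
    image: <account-id>.dkr.ecr.us-east-1.amazonaws.com/ml-image:v1.0
    command:
      - python train.py
```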

***

## Common patterns

### GPU workloads

Start with NVIDIA's official CUDA images:

```dockerfile
FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04

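# Install Python and clean the apt cache in the same layer to keep the image small
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
RUN pip3 install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121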
```

### Enable GPU access in your images <a href="#enable-gpu-access-in-your-images" id="enable-gpu-access-in-your-images"></a>

Regardless of your base image, it's good practice to add these environment variables to your Dockerfile so Valohai executions can use GPUs:

```dockerfile
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
```

Without these, your execution may not detect available GPUs even when running on GPU-enabled machines. NVIDIA's official CUDA images already set both variables; most other base images do not.
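
To confirm the variables made it into the image, you could run a small check inside the container. This is a minimal sketch; the `gpu_env_ok` helper is hypothetical, and the `nvidia-smi` step only runs when that tool is available in the container.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if both variables are set and non-empty.
gpu_env_ok() {
    [ -n "${NVIDIA_VISIBLE_DEVICES:-}" ] && [ -n "${NVIDIA_DRIVER_CAPABILITIES:-}" ]
}

if gpu_env_ok; then
    echo "GPU environment variables are set"
else
    echo "NVIDIA_VISIBLE_DEVICES or NVIDIA_DRIVER_CAPABILITIES is missing"
fi

# nvidia-smi is typically available only when a GPU is exposed to the container.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L   # list the GPUs the container can see
fi
```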

### R workloads

```dockerfile
FROM r-base:4.3.0

RUN R -e "install.packages(c('tidyverse', 'caret', 'randomForest'), repos='https://cran.rstudio.com/')"

WORKDIR /workspace
```

### Julia workloads

```dockerfile
FROM julia:1.9

RUN julia -e 'using Pkg; Pkg.add(["DataFrames", "CSV", "Plots"])'

WORKDIR /workspace
```

***

## Troubleshooting

### Build fails with "no space left on device"

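Your Docker daemon is out of disk space. Clean up unused data. Note that the `-a` flag removes all images not used by a container, not only dangling ones, along with stopped containers, unused networks, and build cache: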

```shell
docker system prune -a
```

### Image is too large

**Check layer sizes:**

```shell
docker history myusername/ml-image:v1.0
```

**Reduce size:**

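* Use `-slim` base images. Alpine images are smaller still, but many Python wheels are not built for Alpine's musl libc, so packages may have to compile from source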
* Clean up in the same `RUN` command
* Use multi-stage builds
* Remove unnecessary files (docs, tests)
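
As an illustration of the multi-stage approach, the sketch below writes an example Dockerfile that installs packages into a virtual environment in a builder stage and copies only that environment into the final image. The file name `Dockerfile.multistage` and the `/opt/venv` path are assumptions chosen for this example.

```shell
#!/bin/sh
# Write an example multi-stage Dockerfile (file name is hypothetical).
cat > Dockerfile.multistage <<'EOF'
# Stage 1: install Python packages into an isolated virtual environment
FROM python:3.11-slim AS builder
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: copy only the finished environment into a clean base image
FROM python:3.11-slim
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /workspace
EOF

echo "Wrote Dockerfile.multistage"
```

Build it with `docker build -f Dockerfile.multistage -t myusername/ml-image:v1.0 .`. Anything installed only in the builder stage, such as compilers added to build a package, stays out of the final image.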

### Push fails with authentication error

**Docker Hub:**

```shell
docker login
```

**Other registries:** Check your registry's authentication documentation. For AWS ECR, GCP, and Azure, see [Private Docker Registries](/docker-in-valohai/private-docker-registries.md).

***

## Next steps

**Optimize your images:** Follow [Best Practices](/docker-in-valohai/image-best-practices.md) for faster builds and smaller images.

**Use private registries:** Set up [authentication](/docker-in-valohai/private-docker-registries.md) for your organization.

