Create and Manage Datasets
The Problem with Individual Files
# Managing individual files becomes tedious
inputs:
  - name: train-images
    default:
      - datum://abc123...
      - datum://def456...
      - datum://ghi789...
      # ... 47 more files
Datasets Solve This
Datasets vs Datums
Feature
Datum
Dataset
When to Use Datasets
Training/Validation/Test Splits
Image Classification
Multi-File Model Artifacts
Create a Dataset
Create via Code (Recommended)
Basic Dataset Creation
Create Training/Validation Split
Legacy Approach (Sidecar Files)
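A minimal sketch of the sidecar approach, assuming the convention that each output file is accompanied by a `<filename>.metadata.json` file whose `valohai.dataset-versions` key lists the dataset versions the file should be added to. The helper name `write_sidecar` is hypothetical, as are the dataset name and version used in the docstring; check the metadata key against the current Valohai documentation before relying on it.

```python
import json
from pathlib import Path
from typing import Iterable


def write_sidecar(output_file: Path, dataset_version_uris: Iterable[str]) -> Path:
    """Write `<output_file>.metadata.json` next to an output file.

    Hypothetical usage inside an execution, with made-up names:
        write_sidecar(Path("/valohai/outputs/model.pkl"),
                      ["dataset://my-images/v1"])
    """
    # Append the suffix to the full file name, so model.pkl
    # becomes model.pkl.metadata.json (not model.metadata.json).
    sidecar = output_file.with_name(output_file.name + ".metadata.json")

    # Assumed metadata key: a list of dataset version URIs.
    metadata = {"valohai.dataset-versions": list(dataset_version_uris)}

    sidecar.write_text(json.dumps(metadata, indent=2))
    return sidecar
```

The function only writes a small JSON file beside the output, so it can be called once per output file after the file itself has been saved.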
Create via Web UI
Step 1: Create the Dataset Container
Step 2: Create a Dataset Version
Use Datasets as Inputs
In valohai.yaml
URI Formats
In Code
Dataset Versioning
Version Naming
Version History
Update Existing Versions
Dataset Aliases
The latest Alias
Custom Aliases
Create Alias via Web UI
Create Alias via Code
Use Aliases in Pipelines
Alias Best Practices
Directory Structure in Datasets
Flat Structure
Nested Structure
Performance: Package Files Together
The Problem
The Solution
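One way to package files together is to bundle a directory of small files into a single archive before saving it as an output. The sketch below uses only Python's standard `tarfile` module; the function name `package_directory` is hypothetical, and an uncompressed tar archive is an assumption rather than a format the platform requires.

```python
import tarfile
from pathlib import Path


def package_directory(source_dir: Path, archive_path: Path) -> int:
    """Pack every file under source_dir into one tar archive.

    Returns the number of files that were added.
    """
    # Collect the file list before the archive is created, so a new archive
    # written inside source_dir is not picked up by the scan.
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())

    with tarfile.open(archive_path, "w") as archive:
        for path in files:
            # Store paths relative to source_dir so the nested directory
            # structure is preserved when the archive is extracted.
            archive.add(path, arcname=path.relative_to(source_dir).as_posix())

    return len(files)
```

The archive then travels as a single file instead of many small ones, and the consuming step can unpack it with `tarfile.open(archive_path).extractall(target_dir)`.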
Common Issues & Fixes
Dataset Version Not Created

Wrong Files in Dataset
Can't Use Dataset in Execution
