Update Dataset Versions
Create new dataset versions by building on existing ones—add new files, exclude specific files, or start fresh while maintaining version lineage.
When to Update Dataset Versions
Use incremental versioning when you need to:
Add new data — Append newly processed files to existing dataset
Remove bad data — Exclude specific files while keeping the rest
Swap files — Replace specific files without recreating entire dataset
A/B test data — Create variant datasets by excluding different subsets
Basic Update Pattern
The standard approach creates a new version based on an existing one:
import json

metadata = {
    "new_file.csv": {
        "valohai.dataset-versions": [
            {
                "uri": "dataset://my-dataset/v3",  # New version to create
                "from": "dataset://my-dataset/v2",  # Base version
                "start_fresh": False,  # Include files from v2
                "exclude": ["bad_file.csv", "old_file.csv"],  # Remove these specific files
            },
        ],
    },
}

# Save your new output file
# (new_data is assumed to be a pandas DataFrame built earlier in the script)
new_data.to_csv("/valohai/outputs/new_file.csv")

# Save metadata: one JSON object per line, one line per output file
metadata_path = "/valohai/outputs/valohai.metadata.jsonl"
with open(metadata_path, "w") as outfile:
    for file_name, file_metadata in metadata.items():
        json.dump({"file": file_name, "metadata": file_metadata}, outfile)
        outfile.write("\n")

What happens:
Creates v3 based on v2
Includes all files from v2 except bad_file.csv and old_file.csv
Adds new_file.csv to the new version
Update Parameters
uri (required)
The new dataset version to create.
from (optional)
Base the new version on an existing version.
If omitted: New version starts empty (only includes files in current metadata).
start_fresh (optional, default: False)
Controls whether to include files from the base version. With False, the new version inherits the base version's files. With True, it contains only the files in the current metadata, but the version history still records the base version.
exclude (optional)
List of filenames to exclude from the base version. Has no effect when start_fresh is True.
Important: Filenames are exact matches (case-sensitive).
targeting_aliases (optional)
Update dataset aliases to point to the new version.
See Dataset Aliases for details.
Common Update Patterns
Add New Files to Existing Dataset
Keep all existing files, add new ones:
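A minimal sketch of this pattern, assuming the dataset is called my-dataset as in the basic example and that the execution has already written data_010.csv through data_014.csv to /valohai/outputs:

```python
# File names taken from the Result line of this section
new_files = [f"data_{i:03d}.csv" for i in range(10, 15)]

version_entry = {
    "uri": "dataset://my-dataset/v2",   # New version to create
    "from": "dataset://my-dataset/v1",  # Base version
    "start_fresh": False,               # Keep every file from v1
}

# Every new output file carries the same dataset-version entry
metadata = {
    name: {"valohai.dataset-versions": [version_entry]} for name in new_files
}
```

Write metadata to valohai.metadata.jsonl with the same loop used in the basic update pattern.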
Result:
v2 = All files from v1 + data_010.csv through data_014.csv
Remove Specific Files
Exclude files that failed validation:
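As a sketch, the metadata below attaches the new version to a validation report written by the execution and lists the failed files in exclude. The three failed file names are hypothetical placeholders, and the dataset name my-dataset is carried over from the basic example:

```python
# Hypothetical names of the files that failed validation
failed_files = ["data_003.csv", "data_007.csv", "data_012.csv"]

metadata = {
    "validation_report.txt": {
        "valohai.dataset-versions": [
            {
                "uri": "dataset://my-dataset/v2",   # New version to create
                "from": "dataset://my-dataset/v1",  # Base version
                "start_fresh": False,               # Inherit files from v1
                "exclude": failed_files,            # Exact, case-sensitive names
            },
        ],
    },
}
```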
Result:
v2 = All files from v1 except the three failed files + validation report
⚠️ The validation_report.txt file can't be excluded from the dataset version, even if it is added to the list of excluded files!
Replace Specific Files
Exclude old files and add updated versions:
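One way to express this, sketched under the assumption that the corrected files are written under new names. The "_fixed" names below are hypothetical; the excluded names come from the Result line of this section:

```python
# Old file name -> hypothetical name of its corrected replacement
replacements = {
    "data_001.csv": "data_001_fixed.csv",
    "data_005.csv": "data_005_fixed.csv",
}

version_entry = {
    "uri": "dataset://my-dataset/v3",   # New version to create
    "from": "dataset://my-dataset/v2",  # Base version
    "start_fresh": False,               # Inherit files from v2
    "exclude": sorted(replacements),    # Drop the old files from v2
}

# Each replacement file is an output of this execution and joins v3
metadata = {
    new_name: {"valohai.dataset-versions": [version_entry]}
    for new_name in replacements.values()
}
```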
Result:
v3 = All files from v2 except old data_001.csv and data_005.csv + new versions
Start Fresh (Link Versions Without Files)
Create a new version lineage without inheriting files:
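A short sketch of the metadata for this case, using new_approach.csv from the Result line and assuming the dataset name my-dataset from the basic example:

```python
metadata = {
    "new_approach.csv": {
        "valohai.dataset-versions": [
            {
                "uri": "dataset://my-dataset/v2",   # New version to create
                "from": "dataset://my-dataset/v1",  # Recorded as the base version
                "start_fresh": True,                # Do not inherit any files from v1
            },
        ],
    },
}
```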
Result:
v2 = Only new_approach.csv (no files from v1)
Version history shows v2 was based on v1 (for tracking)
Use case: You want to track version lineage (this experiment followed that one) but don't want to inherit data.
Create A/B Test Variants
Split dataset into two variants for comparison:
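The sketch below builds the metadata for Variant A. The group file lists, the baseline and variant version names, and the marker file name are all hypothetical; only the keys uri, from, start_fresh and exclude come from this page:

```python
# Hypothetical split of the baseline files into two groups
GROUP_A = ["data_001.csv", "data_002.csv"]
GROUP_B = ["data_003.csv", "data_004.csv"]


def variant_metadata(variant, excluded_files):
    """Build metadata for one variant, attached to a small marker output file."""
    return {
        f"variant_{variant}_info.txt": {
            "valohai.dataset-versions": [
                {
                    "uri": f"dataset://my-dataset/variant-{variant}",
                    "from": "dataset://my-dataset/baseline",
                    "start_fresh": False,       # Inherit the baseline files
                    "exclude": excluded_files,  # Drop the other group's files
                },
            ],
        },
    }


# Variant A keeps group A by excluding group B
metadata = variant_metadata("a", GROUP_B)
```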
Run a second execution for Variant B that excludes group A files instead.
Update via Web UI
The UI provides a simpler way to create new versions based on existing ones.
Open your dataset in Data → Datasets
Find the version you want to base on
Click the ... menu at the end of the version row
Select "Create new version from this version"
The new version starts with all files from the base version
Add new files or remove unwanted files
Name the new version
Save
Limitations: The UI doesn't support the exclude parameter—you must manually remove files one by one.
Combining Multiple Updates
You can reference multiple base versions or create complex update logic:
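This page does not show the exact metadata shape for multiple base versions, so treat the following as an unconfirmed sketch: it assumes that two entries with the same uri and different from values are merged into one new version. The output file name combined_manifest.txt is hypothetical.

```python
# Assumption: one entry per base version, all pointing at the same new uri
base_versions = ["dataset://source-a/latest", "dataset://source-b/latest"]

metadata = {
    "combined_manifest.txt": {
        "valohai.dataset-versions": [
            {
                "uri": "dataset://combined/v1",  # New version to create
                "from": base,                    # One of the base versions
                "start_fresh": False,            # Inherit that base's files
            }
            for base in base_versions
        ],
    },
}
```

Check the resulting version's file list after the first run to confirm that both bases were picked up.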
Result: combined/v1 includes files from both source-a/latest and source-b/latest.
Best Practices
Use Descriptive Version Names
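One possible naming convention, sketched as a small helper: combine a date with a short description of the change. The helper name and the date-plus-purpose format are illustrative choices, and the sketch assumes version names may contain digits and hyphens.

```python
from datetime import date


def version_uri(dataset, purpose, day):
    """Build a version URI such as dataset://my-dataset/2024-01-15-removed-outliers."""
    return f"dataset://{dataset}/{day.isoformat()}-{purpose}"


uri = version_uri("my-dataset", "removed-outliers", date(2024, 1, 15))
```

A name like this says more at a glance than v7 when browsing the version list.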
Document Changes
Add a marker file explaining what changed:
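A minimal sketch of such a marker file. The file name CHANGES.txt and the note layout are hypothetical; the version and file names reuse the basic update example.

```python
import os


def change_note(base, new, added, removed):
    """Return a plain-text summary of what changed between two versions."""
    lines = [f"Changes from {base} to {new}", "Added:"]
    lines += [f"  {name}" for name in added] or ["  (none)"]
    lines.append("Removed:")
    lines += [f"  {name}" for name in removed] or ["  (none)"]
    return "\n".join(lines) + "\n"


def save_change_note(text, output_dir="/valohai/outputs"):
    """Write the note next to the other output files."""
    with open(os.path.join(output_dir, "CHANGES.txt"), "w") as handle:
        handle.write(text)


removed = ["bad_file.csv", "old_file.csv"]
note = change_note("v2", "v3", added=["new_file.csv"], removed=removed)

version_entry = {
    "uri": "dataset://my-dataset/v3",
    "from": "dataset://my-dataset/v2",
    "start_fresh": False,
    "exclude": removed,
}

# The marker file joins the new version alongside the data file
metadata = {
    name: {"valohai.dataset-versions": [version_entry]}
    for name in ["new_file.csv", "CHANGES.txt"]
}
```

Call save_change_note(note) in the execution before writing valohai.metadata.jsonl.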
Validate Before Creating New Version
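A sketch of one way to gate version creation on a validation step: the metadata is only produced when the data passes, so a failed check requests no new dataset version. The validation rule (no empty data, no missing values), the row format, and the file name clean_data.csv are hypothetical.

```python
def validate(rows):
    """Return a list of problems; an empty list means the data passed."""
    problems = []
    if not rows:
        problems.append("no rows")
    for index, row in enumerate(rows):
        if any(value is None for value in row.values()):
            problems.append(f"row {index} has a missing value")
    return problems


def metadata_if_valid(rows, file_name="clean_data.csv"):
    """Build dataset-version metadata only when validation passes."""
    if validate(rows):
        return {}  # Nothing to write, so no new version is requested
    return {
        file_name: {
            "valohai.dataset-versions": [
                {
                    "uri": "dataset://my-dataset/v3",
                    "from": "dataset://my-dataset/v2",
                    "start_fresh": False,
                },
            ],
        },
    }
```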
Use Aliases for Promotion Workflow
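As a sketch, a new version can be created and an alias pointed at it in the same metadata entry through targeting_aliases. The alias name staging is hypothetical, and the value is assumed here to be a list of alias names; see Dataset Aliases for the authoritative format.

```python
def version_entry(uri, base, alias):
    """Build an entry that creates a version and moves one alias to it."""
    return {
        "uri": uri,
        "from": base,
        "start_fresh": False,
        "targeting_aliases": [alias],  # Assumed shape: list of alias names
    }


# New versions land on a staging alias first
metadata = {
    "new_file.csv": {
        "valohai.dataset-versions": [
            version_entry(
                "dataset://my-dataset/v3", "dataset://my-dataset/v2", "staging"
            ),
        ],
    },
}
```

Jobs that read the alias rather than a fixed version pick up the promoted version without changes to their input definitions.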
Common Issues & Fixes
Files Not Excluded
Symptom: Specified files in exclude list still appear in new version
Causes & Fixes:
Typo in filename → Filenames must match exactly (case-sensitive)
File path included → Use just filename, not full path
start_fresh: True → exclude doesn't apply with start_fresh: True
Debug:
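A small helper like the one below can check an entry against the three causes listed above before the metadata is written. The function name is hypothetical, and base_files is a set of file names you supply yourself, for example copied from the base version's file list in the UI.

```python
import os


def check_exclude(entry, base_files):
    """Return human-readable problems found in an entry's exclude list."""
    problems = []
    if entry.get("start_fresh"):
        problems.append("start_fresh is True, so exclude has no effect")
    for name in entry.get("exclude", []):
        base_name = os.path.basename(name)
        if base_name != name:
            problems.append(f"{name!r} looks like a path; use {base_name!r}")
        elif name not in base_files:
            # Comparison is exact and case-sensitive
            problems.append(f"{name!r} does not match any base file exactly")
    return problems


entry = {"exclude": ["data/bad_file.csv", "Old_File.csv", "ok.csv"]}
for problem in check_exclude(entry, {"bad_file.csv", "old_file.csv", "ok.csv"}):
    print(problem)
```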
Base Version Files Not Included
Symptom: New version is empty or missing files from base version
Causes & Fixes:
start_fresh: True → Set to False to include base files
Missing from parameter → Specify which version to base on
Wrong base version URI → Verify version exists and spelling is correct
New Version Not Created
Symptom: Execution completes but dataset version doesn't appear
How to diagnose:
Open execution in Valohai UI
Click Alerts tab
Look for dataset creation errors
Common causes:
Invalid JSON in metadata → Validate JSON syntax
Missing output files → Ensure all files in metadata were actually saved
Base version doesn't exist → Verify the from version exists
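The first two causes can be checked locally before the execution ends. The sketch below is a hypothetical helper that takes the lines of valohai.metadata.jsonl and the set of file names actually saved to the outputs directory, and reports lines that are not valid JSON or that reference a file that was never saved.

```python
import json


def find_metadata_problems(lines, saved_files):
    """Report invalid JSON lines and references to files that were not saved."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append(f"line {number}: invalid JSON ({error.msg})")
            continue
        if not isinstance(record, dict) or "file" not in record:
            problems.append(f"line {number}: missing 'file' key")
        elif record["file"] not in saved_files:
            problems.append(f"line {number}: {record['file']!r} was not saved")
    return problems


sample_lines = [
    '{"file": "a.csv", "metadata": {}}',
    '{"file": "b.csv"',                   # Truncated line, invalid JSON
    '{"file": "c.csv", "metadata": {}}',  # Valid JSON, but c.csv is missing
]
problems = find_metadata_problems(sample_lines, {"a.csv"})
```

In an execution, the lines would come from reading /valohai/outputs/valohai.metadata.jsonl and saved_files from listing /valohai/outputs.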
Related Pages
Create and Manage Datasets — Core dataset concepts and creation
Add Context to Your Data Files — Metadata system overview
Load Data in Jobs — Use dataset versions as inputs
Next Steps
Practice updating a dataset by excluding specific files
Set up a validation → production promotion workflow using aliases
Create A/B test variants from a baseline dataset
Learn about dataset packaging for large collections
