Update Dataset Versions
When to Update Dataset Versions
Basic Update Pattern
```python
import json

import pandas as pd

# Placeholder data: replace with the DataFrame your step actually produces.
new_data = pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})

metadata = {
    "new_file.csv": {
        "valohai.dataset-versions": [
            {
                "uri": "dataset://my-dataset/v3",  # New version to create
                "from": "dataset://my-dataset/v2",  # Base version
                "start_fresh": False,  # Include files from v2
                "exclude": ["bad_file.csv", "old_file.csv"],  # Remove these specific files
            },
        ],
    },
}

# Save your new output file
new_data.to_csv("/valohai/outputs/new_file.csv")

# Save metadata: one JSON object per line, keyed by output file name
metadata_path = "/valohai/outputs/valohai.metadata.jsonl"
with open(metadata_path, "w") as outfile:
    for file_name, file_metadata in metadata.items():
        json.dump({"file": file_name, "metadata": file_metadata}, outfile)
        outfile.write("\n")
```

Update Parameters
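uri (required): the dataset version to create, for example dataset://my-dataset/v3.
from (optional): the base version to build on, for example dataset://my-dataset/v2.
start_fresh (optional, default: False): when False, the new version includes the files from the from version; when True, it inherits none of them.
exclude (optional): a list of file names from the base version to leave out of the new version.
targeting_aliases (optional): aliases to point at the new version.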
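The parameters above can be assembled with a small helper that only emits the optional keys you actually set. This is a minimal sketch. `make_dataset_version_entry` is a hypothetical helper, not part of any Valohai library. It also assumes that `targeting_aliases` takes a list of alias names.

```python
from typing import Optional


def make_dataset_version_entry(
    uri: str,
    base: Optional[list[str]] = None if False else None,
    start_fresh: bool = False,
    exclude: Optional[list[str]] = None,
    targeting_aliases: Optional[list[str]] = None,
) -> dict:
    """Build one entry for the "valohai.dataset-versions" list (hypothetical helper)."""
    entry: dict = {"uri": uri}  # uri is the only required key
    if base is not None:
        entry["from"] = base  # "from" is a Python keyword, so the argument is named base
    if start_fresh:
        entry["start_fresh"] = True  # omitted when False, which is the default
    if exclude:
        entry["exclude"] = list(exclude)
    if targeting_aliases:
        # Assumption: aliases are given as a list of plain alias names.
        entry["targeting_aliases"] = list(targeting_aliases)
    return entry


entry = make_dataset_version_entry(
    "dataset://my-dataset/v3",
    base="dataset://my-dataset/v2",
    exclude=["bad_file.csv"],
)
print(entry)
```

Leaving out keys you did not set keeps the metadata file short. It also makes the defaults explicit in one place.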
Common Update Patterns
Add New Files to Existing Dataset
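A minimal sketch of the add-only case follows. It reuses the keys from the basic pattern but drops `exclude`, so the new version is simply v2 plus the new outputs. The file names are hypothetical, and the files themselves must still be saved to /valohai/outputs.

```python
import json

# Hypothetical names of the new files this step saves to /valohai/outputs.
new_files = ["extra_2024_01.csv", "extra_2024_02.csv"]

# Each new file carries the same entry, so all of them end up in v3.
# start_fresh=False keeps every file that is already in v2.
version_entry = {
    "uri": "dataset://my-dataset/v3",
    "from": "dataset://my-dataset/v2",
    "start_fresh": False,
}
metadata = {name: {"valohai.dataset-versions": [version_entry]} for name in new_files}

# One JSON object per line, the same shape the basic pattern writes.
jsonl = "".join(
    json.dumps({"file": name, "metadata": meta}) + "\n" for name, meta in metadata.items()
)
print(jsonl)
```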
Remove Specific Files
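Removing files comes down to the `exclude` list. This sketch assumes the step still saves at least one output to attach the metadata to. Here that output is a hypothetical `cleaning_report.txt`, which will itself be part of v3.

```python
import json

# Files in v2 that should not carry over into v3, named as they appear in v2.
files_to_remove = ["bad_file.csv", "old_file.csv"]

metadata = {
    # Assumption: an output file is needed to carry the entry;
    # "cleaning_report.txt" is a hypothetical name.
    "cleaning_report.txt": {
        "valohai.dataset-versions": [
            {
                "uri": "dataset://my-dataset/v3",
                "from": "dataset://my-dataset/v2",
                "start_fresh": False,  # keep the rest of v2
                "exclude": files_to_remove,
            },
        ],
    },
}
print(json.dumps({"file": "cleaning_report.txt", "metadata": metadata["cleaning_report.txt"]}))
```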
Replace Specific Files
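One way to replace a file is to exclude the inherited copy and save a corrected file under the same name. This is a sketch under an assumption that should be checked against your own runs: `exclude` only filters files inherited from the `from` version, not the new output of the same name.

```python
import json

# Hypothetical scenario: v2 has a corrupted train.csv, and this step saves a
# corrected copy under the same name to /valohai/outputs.
metadata = {
    "train.csv": {
        "valohai.dataset-versions": [
            {
                "uri": "dataset://my-dataset/v3",
                "from": "dataset://my-dataset/v2",
                "start_fresh": False,  # keep the other files from v2
                # Assumption: this drops only the copy inherited from v2.
                "exclude": ["train.csv"],
            },
        ],
    },
}
print(json.dumps({"file": "train.csv", "metadata": metadata["train.csv"]}))
```

Excluding the old copy explicitly avoids relying on whichever file would win a name clash between the base version and the new output.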
Start Fresh (Link Versions Without Files)
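Going by the heading, setting `start_fresh` to True keeps `from` as a link to the previous version but copies none of its files. In this minimal sketch, only the outputs listed in the metadata end up in v3. The output names are hypothetical.

```python
import json

# start_fresh=True: v3 records v2 as its base but inherits no files from it.
fresh_entry = {
    "uri": "dataset://my-dataset/v3",
    "from": "dataset://my-dataset/v2",
    "start_fresh": True,
}

# Hypothetical output file names; these are the only files in v3.
metadata = {name: {"valohai.dataset-versions": [fresh_entry]} for name in ["train.csv", "test.csv"]}

for file_name, file_metadata in metadata.items():
    print(json.dumps({"file": file_name, "metadata": file_metadata}))
```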
Create A/B Test Variants
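A/B variants can be expressed as two versions built from the same base, each adding its own file. This is a minimal sketch. The variant URIs and file names are hypothetical.

```python
import json

BASE = "dataset://my-dataset/v2"


def variant_metadata(uri: str) -> dict:
    """Metadata for one output file that goes into a single variant version."""
    return {"valohai.dataset-versions": [{"uri": uri, "from": BASE, "start_fresh": False}]}


# Each variant keeps all of v2 and adds only its own augmented file.
metadata = {
    "augmented_a.csv": variant_metadata("dataset://my-dataset/v3-variant-a"),
    "augmented_b.csv": variant_metadata("dataset://my-dataset/v3-variant-b"),
}

for file_name, file_metadata in metadata.items():
    print(json.dumps({"file": file_name, "metadata": file_metadata}))
```

Because both variants share one base, any difference in downstream results can be traced to the single file each variant adds.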
Update via Web UI
Combining Multiple Updates
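Since `valohai.dataset-versions` is a list, one reading is that a single output can go into several versions in the same execution. This sketch combines an add-and-exclude update of my-dataset with an update of a second dataset. It assumes that reading holds. The `features` dataset and all file names are hypothetical.

```python
import json

# Incremental update of my-dataset: add new outputs, drop one old file.
my_dataset_v3 = {
    "uri": "dataset://my-dataset/v3",
    "from": "dataset://my-dataset/v2",
    "start_fresh": False,
    "exclude": ["old_file.csv"],
}
# Hypothetical second dataset updated by the same execution.
features_v5 = {
    "uri": "dataset://features/v5",
    "from": "dataset://features/v4",
    "start_fresh": False,
}

metadata = {
    "train.csv": {"valohai.dataset-versions": [my_dataset_v3]},
    # Listed in both versions at once.
    "features.csv": {"valohai.dataset-versions": [my_dataset_v3, features_v5]},
}

lines = [json.dumps({"file": name, "metadata": meta}) for name, meta in metadata.items()]
print("\n".join(lines))
```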
Best Practices
Use Descriptive Version Names
Document Changes
Validate Before Creating New Version
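A cheap safeguard is to check each entry before writing the metadata file. `validate_entry` is a hypothetical helper that covers only the metadata shape; checks on the data itself depend on your dataset. The rule that `exclude` needs a base version to act on is an assumption drawn from the parameter descriptions.

```python
def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems with one dataset-version entry (empty if none found)."""
    problems = []
    uri = entry.get("uri")
    if not isinstance(uri, str) or not uri.startswith("dataset://"):
        problems.append("uri is required and must start with dataset://")
    base = entry.get("from")
    if base is not None:
        if not isinstance(base, str) or not base.startswith("dataset://"):
            problems.append("from must start with dataset://")
        elif base == uri:
            problems.append("from and uri point to the same version")
    if not isinstance(entry.get("start_fresh", False), bool):
        problems.append("start_fresh must be True or False")
    exclude = entry.get("exclude", [])
    if not isinstance(exclude, list) or not all(isinstance(n, str) for n in exclude):
        problems.append("exclude must be a list of file names")
    elif exclude and (entry.get("start_fresh") or base is None):
        # Assumption: exclude only filters files inherited from the base version.
        problems.append("exclude has no effect without a base version to inherit from")
    return problems


entry = {
    "uri": "dataset://my-dataset/v3",
    "from": "dataset://my-dataset/v2",
    "start_fresh": False,
    "exclude": ["bad_file.csv"],
}
problems = validate_entry(entry)
if problems:
    # Fail the step before any metadata is written.
    raise ValueError("; ".join(problems))
```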
Use Aliases for Promotion Workflow
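A promotion workflow can tie `targeting_aliases` to the outcome of your own checks. This minimal sketch assumes that `targeting_aliases` takes a list of alias names. It also assumes that anything referring to an alias picks up the version the alias points at. The alias names `staging` and `production` and the `version_entry` helper are hypothetical.

```python
def version_entry(uri: str, base: str, checks_passed: bool) -> dict:
    """Build an entry whose aliases depend on whether validation passed."""
    aliases = ["staging"]  # hypothetical alias: every new version lands here first
    if checks_passed:
        aliases.append("production")  # hypothetical alias: promote only validated versions
    return {"uri": uri, "from": base, "start_fresh": False, "targeting_aliases": aliases}


metadata = {
    "new_file.csv": {
        "valohai.dataset-versions": [
            version_entry("dataset://my-dataset/v3", "dataset://my-dataset/v2", checks_passed=True),
        ],
    },
}
```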
Common Issues & Fixes
Files Not Excluded
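A quick check is to compare the `exclude` list against the file names in the base version, however you obtain that listing. The sketch assumes that `exclude` matches names exactly as they appear in the base version, including case and any directory prefix. The file lists are hypothetical.

```python
def unmatched_excludes(base_files: list[str], exclude: list[str]) -> list[str]:
    """Return exclude entries that match no file name in the base version."""
    base = set(base_files)
    return [name for name in exclude if name not in base]


# Hypothetical listing of v2 and an exclude list with two typical slips:
# wrong capitalization and an extra path prefix.
base_files = ["train.csv", "bad_file.csv", "old_file.csv"]
exclude = ["bad_file.csv", "Old_file.csv", "data/old_file.csv"]
print(unmatched_excludes(base_files, exclude))
```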
Base Version Files Not Included
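From the parameter descriptions, base files are only carried over when `from` is set and `start_fresh` is False. The sketch below encodes that as a rough mental model for predicting a new version's contents. It is not Valohai's implementation, and the file names are hypothetical.

```python
def expected_files(base_files: list[str], new_files: list[str], entry: dict) -> set[str]:
    """Predict the file names in the new version (a mental model, not Valohai's code)."""
    inherited: set[str] = set()
    # Base files are inherited only with a "from" version and start_fresh=False.
    if entry.get("from") and not entry.get("start_fresh", False):
        excluded = set(entry.get("exclude", []))
        inherited = {name for name in base_files if name not in excluded}
    return inherited | set(new_files)


base_files = ["a.csv", "b.csv", "bad.csv"]
entry = {"uri": "dataset://my-dataset/v3", "from": "dataset://my-dataset/v2", "exclude": ["bad.csv"]}
print(sorted(expected_files(base_files, ["new.csv"], entry)))
```

If the prediction includes base files that the real version lacks, check first for a missing `from` key and for `start_fresh` set to True.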
New Version Not Created
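Before the step exits, it can help to read the metadata file back and check its shape. `check_metadata_lines` is a hypothetical checker. It looks for three plausible causes: invalid JSON, a `file` value that does not match a saved output, and a missing `valohai.dataset-versions` list.

```python
import json


def check_metadata_lines(lines: list[str], output_names: set[str]) -> list[str]:
    """Return problems found in valohai.metadata.jsonl content (hypothetical checker)."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {number}: not valid JSON ({err.msg})")
            continue
        if not isinstance(record, dict) or "file" not in record or "metadata" not in record:
            problems.append(f"line {number}: expected 'file' and 'metadata' keys")
            continue
        if record["file"] not in output_names:
            problems.append(f"line {number}: {record['file']!r} is not among the saved outputs")
        meta = record["metadata"]
        if not isinstance(meta, dict) or not meta.get("valohai.dataset-versions"):
            problems.append(f"line {number}: no 'valohai.dataset-versions' entries")
    return problems
```

In a real step you might pass it the lines of /valohai/outputs/valohai.metadata.jsonl and the names returned by `os.listdir("/valohai/outputs")`.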
Related Pages
Next Steps
