Dataset Version Trigger
Launch a pipeline automatically when a new dataset version is created. This guide shows you how to parse notification payloads, handle dataset inputs, and configure pipeline and trigger settings for event-driven data processing.
Use Case
Process newly uploaded datasets automatically without manual intervention. When a user uploads data to Valohai and creates a new dataset version out of it, your preprocessing pipeline launches immediately and processes all files in the dataset version.
Perfect for:
Auto-validating uploaded data
Preprocessing raw data for training
Generating data quality reports
Extracting features from new samples
How It Works
Upload dataset: User creates a new dataset version
Notification fires: Valohai generates "dataset version created" event
Trigger launches: Pipeline starts automatically
Parse payload: First node extracts dataset URI from notification
Process data: Second node receives URI and processes files
Step 1: Create the Scripts
Parse Notification Payload
This script extracts the dataset version URI from the notification and outputs it as metadata for the next pipeline node.
Create parse-notification.py:
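A minimal sketch of what this script could look like. It assumes the payload arrives as a JSON file in the payload input and that the dataset version URI sits under data.version.uri; the exact payload structure may differ, so inspect a real payload from your trigger logs and adjust the lookup accordingly.

```python
# parse-notification.py
import glob
import json

# Valohai downloads execution inputs under /valohai/inputs/<input-name>/
payload_path = glob.glob("/valohai/inputs/payload/*")[0]

with open(payload_path) as f:
    payload = json.load(f)

# Assumption: the dataset version URI lives at data.version.uri in the payload.
# Adjust this lookup to match the structure you see in your trigger logs.
dataset_uri = payload.get("data", {}).get("version", {}).get("uri")
if not dataset_uri:
    raise ValueError("Could not find a dataset version URI in the notification payload")

print(f"Dataset version URI: {dataset_uri}")

# Printing a JSON object to stdout records it as Valohai execution metadata,
# so the pipeline edge can pass "dataset" on to the next node's parameter.
print(json.dumps({"dataset": dataset_uri}))
```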
Why metadata? Outputting as JSON metadata (not a file) lets you connect it directly to parameters in the next pipeline node using edges.
Data Processing Script
This example lists all files in the dataset. Replace this with your actual processing logic.
Create list-inputs.py:
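A minimal sketch, assuming the dataset version URI ends up in the dataset input (via the {parameter:dataset_url} interpolation shown in the YAML below), so Valohai downloads every file in the dataset version before the script runs:

```python
# list-inputs.py
import os

# Valohai downloads all files of the "dataset" input into this directory.
INPUT_DIR = "/valohai/inputs/dataset"

file_count = 0
for root, _dirs, files in os.walk(INPUT_DIR):
    for name in files:
        path = os.path.join(root, name)
        print(f"{path} ({os.path.getsize(path)} bytes)")
        file_count += 1

print(f"Found {file_count} files in the dataset version")
```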
Replace this script with your actual data processing code. The key is that it receives a dataset URI via the dataset input.
Dependencies
Add to requirements.txt:
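The sketches above use only the Python standard library, so nothing extra is strictly required. If you read inputs and parameters with the valohai-utils helper library instead, add:

```
valohai-utils
```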
Step 2: Configure Pipeline in YAML
Create your valohai.yaml with two steps and a pipeline that connects them:
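A sketch of the configuration, using the step, parameter, and input names referenced throughout this guide (parse-notification, list-inputs, payload, dataset, dataset_url) and a placeholder Docker image; adapt the image and commands to your environment.

```yaml
- step:
    name: parse-notification
    image: python:3.11
    command:
      - pip install -r requirements.txt
      - python parse-notification.py
    inputs:
      - name: payload
        optional: true

- step:
    name: list-inputs
    image: python:3.11
    command:
      - pip install -r requirements.txt
      - python list-inputs.py
    parameters:
      - name: dataset_url
        type: string
        default: ""
    inputs:
      - name: dataset
        default: "{parameter:dataset_url}"
        optional: true

- pipeline:
    name: Dataset handling automation
    nodes:
      - name: parse-notification
        type: execution
        step: parse-notification
      - name: list-inputs
        type: execution
        step: list-inputs
    edges:
      - [parse-notification.metadata.dataset, list-inputs.parameter.dataset_url]
```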
How the Pipeline Works
Node 1: parse-notification
Receives webhook payload as input
Extracts dataset version URI
Outputs URI as metadata (dataset)
Edge: Connect metadata to parameter
Takes parse-notification.metadata.dataset
Passes it to list-inputs.parameter.dataset_url
Node 2: list-inputs
Receives dataset URI via parameter
Parameter becomes input via {parameter:dataset_url}
Processes all files in the dataset
Naming matters: Step names, parameter names, and input names must match exactly between your YAML and trigger configuration.
Step 3: Create the Trigger
Go to Project Settings → Triggers
Click Create Trigger
Configure the trigger:
Basic Settings:
Title: Dataset version -> new data handler pipeline
Trigger Type: Notification
Conditions (optional):
Add a Payload Filter if you want to filter by a specific dataset:
Lookup Path: data.dataset.name
Operation: Equals
Invariant: your-dataset-name
Actions:
Action Type: Run Pipeline
Source Commit Reference: main (or your branch)
Pipeline Name: Dataset handling automation
Pipeline Title: (optional title for pipeline runs)
Payload Input Name: parse-notification.payload
Payload Input Name format: <node-name>.<input-name> specifies which pipeline node receives the notification payload.
Click Save Trigger
A Managed Trigger Channel is automatically created for this trigger.
Step 4: Set Up Notification Routing
Connect the "dataset version created" event to your trigger:
Go to Project Settings → Notifications → Project Notifications
Click Create new notification routing
Configure routing:
Event: dataset version is created
Filter events by users: All users (or select specific users)
Channel: Select Launches trigger: Dataset version -> new data handler pipeline
Click Save
Now your automation is live!
Step 5: Test the Workflow
Test Parsing Node Manually
Before testing the full trigger, verify the parsing step works:
Create a test payload file test-payload.json:
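A hypothetical minimal payload for manual testing; the real notification contains more fields and its exact structure may differ, so capture one from the trigger logs when possible. What matters is that it includes the keys your parsing script reads (here data.version.uri) and, if you use a payload filter, data.dataset.name:

```json
{
  "data": {
    "dataset": {
      "name": "your-dataset-name"
    },
    "version": {
      "uri": "dataset://your-dataset-name/initial-version"
    }
  }
}
```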
Create a new execution:
Select step: parse-notification
Add input: Upload your test payload file to the payload input
Run execution
Check logs: The dataset version URI should be printed
Check metadata: Should show the extracted URI
Test Processing Node Manually
Verify the data processing step works:
Create a dataset and add files to it
Copy the dataset version datum URL (from dataset page)
Create execution:
Select step: list-inputs
Paste the dataset URI into the Data → Inputs → dataset → URL field
Set the dataset_url parameter to an empty string ("")
Run execution
Check logs: Should list all files in dataset
Test Full Trigger
Once manual tests pass, trigger the real automation:
Go to your dataset
Create a new version (upload files or create empty version)
Check Executions page: Pipeline should launch automatically
Verify both nodes execute correctly
Check trigger logs if anything fails: Project Settings → Triggers → ... menu → View Logs
Troubleshooting
Trigger Not Launching
Check notification routing:
Verify event type is "dataset version is created"
Confirm channel is selected correctly
Check if user filter is too restrictive
Check trigger status:
Ensure trigger is enabled
Look at trigger logs for errors
Verify payload filters aren't blocking events
Pipeline Fails at Parsing Node
Common issues:
Payload input not configured correctly
Input name mismatch between trigger config and YAML
JSON parsing error in script
Debug steps:
Check execution logs for Python errors
Verify payload file exists and is valid JSON
Print payload content to see structure
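For that last step, a quick sketch you can drop at the top of parse-notification.py to dump the payload structure into the execution logs (assumes the payload input name used in this guide):

```python
import glob, json

payload_path = glob.glob("/valohai/inputs/payload/*")[0]
with open(payload_path) as f:
    # Pretty-print the whole payload so you can see which keys to parse.
    print(json.dumps(json.load(f), indent=2))
```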
Pipeline Fails at Processing Node
Common issues:
Dataset URI not passed correctly through edge
Parameter name mismatch
Input configuration error
Debug steps:
Check if parameter received correct value
Verify edge configuration in pipeline YAML
Test processing node manually with known-good dataset URI
Trigger Fires Too Often
Solutions:
Add payload filter to only trigger for specific datasets
Add user filter to only trigger for certain team members
Check if multiple notification routings are configured
Next Steps
Auto-deploy models: Model Version Trigger Guide
Schedule training: Scheduled Triggers
Notify team: Outgoing Webhooks