Time Limits
Time limits help you control costs and prevent runaway executions. Valohai provides two timeout mechanisms:
Time Limit — Maximum total execution duration
No Output Timeout — Stop if execution produces no logs for a period
Both can be configured in the web UI or in your valohai.yaml.
Time Limit
Set a maximum duration for your execution. When the time limit is reached, Valohai terminates the execution.
Use cases:
Prevent forgotten executions from running indefinitely
Enforce budget constraints on expensive GPU instances
Ensure batch jobs complete within a scheduling window
Set in Web UI
Create a new execution
Scroll to the Runtime section
Check Set a Time Limit
Enter the maximum duration in hours and minutes
Not setting a time limit allows the execution to run indefinitely. This is the default behavior.
Set in valohai.yaml
Define a default time limit for a step:
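A minimal sketch of such a step, assuming a hypothetical step name, Docker image, and training script; only the time-limit key comes from this page:

```yaml
- step:
    # Hypothetical step name, image, and command, used for illustration only.
    name: train-model
    image: python:3.11
    command:
      - python train.py
    # Terminate the execution once it has run for 1 hour 30 minutes in total.
    time-limit: 1h 30m
```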
The time-limit value supports human-readable formats like 1h 30m 5s, or you can specify seconds as an integer (e.g., 3600).
No Output Timeout
Stop executions that become unresponsive. If your execution produces no logs or output for the specified duration, Valohai terminates it.
Use cases:
Detect and stop hung processes
Catch infinite loops that produce no output
Identify network or I/O blocking issues
Set in Web UI
Create a new execution
Scroll to the Runtime section
Check Set a No Output Timeout
Enter the timeout duration in hours and minutes
If you do not set a no-output timeout, it defaults to about 8 hours.
Set in valohai.yaml
Define a default no-output timeout for a step:
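A minimal sketch, again assuming a hypothetical step name, image, and command; only the no-output-timeout key comes from this page:

```yaml
- step:
    # Hypothetical step name, image, and command, used for illustration only.
    name: train-model
    image: python:3.11
    command:
      - python train.py
    # Terminate the execution if it writes no logs for 30 minutes.
    no-output-timeout: 30m
```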
The no-output-timeout value supports human-readable formats like 1h 30m 5s, or you can specify seconds as an integer (e.g., 1800).
Example: Complete Step Configuration
Combine time limits with other step settings:
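One way this might look, as a sketch rather than a definitive configuration. The step name, image, script, and the epochs parameter are hypothetical; the two timeout keys are the ones described above:

```yaml
- step:
    # Hypothetical step name, image, and command, used for illustration only.
    name: train-model
    image: python:3.11
    command:
      - pip install -r requirements.txt
      - python train.py {parameters}
    parameters:
      # Hypothetical parameter passed to the training script.
      - name: epochs
        type: integer
        default: 10
    # Hard cap on total runtime, including setup and data download.
    time-limit: 6h
    # Stop early if the job goes silent for 45 minutes.
    no-output-timeout: 45m
```

Here the no-output timeout is kept well below the time limit so that a hung job is stopped long before the overall cap is reached.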
Best Practices
Set reasonable defaults in YAML. Define time limits in your valohai.yaml so all team members use consistent settings. Override in the UI when needed.
Use no-output timeout to catch hangs. Long-running jobs should periodically log progress. If your training loop runs for hours without output, it may be stuck.
Account for setup time. Time limits include dependency installation, data download, and model initialization. Give enough buffer for these steps.
Combine with early stopping. For training jobs, consider using early stopping based on metrics in addition to time limits.
Troubleshooting
Execution stopped unexpectedly
Check the execution logs for timeout messages. Common causes:
Time limit reached — Increase the limit or optimize your code
No output timeout — Add periodic logging to your training loop
Related
Early Stopping — Stop based on metadata conditions
Spot Instances — Handle interruptions for cost savings
Run Basic Execution — Creating and running executions
