Track Underutilization

Identify and address underutilized resources to reduce costs and improve training efficiency.

Why this matters

If your executions consistently have low peak utilization, you're paying for hardware you're not using.

A model training on an 8-GPU instance that uses only 40% of the available GPU capacity may fit on a 4-GPU instance at roughly half the cost. Underutilization tracking helps you spot these opportunities automatically.
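The arithmetic behind this example can be sketched in a few lines. This is a rough estimate only: it assumes the workload scales about linearly across GPUs, and both the function name `smallest_fitting_gpu_count` and the list of instance sizes are hypothetical, not part of Valohai.

```python
def smallest_fitting_gpu_count(current_gpus, peak_utilization, available_sizes=(1, 2, 4, 8)):
    """Return the smallest instance size (in GPUs) that covers the observed peak load.

    Assumption: load scales roughly linearly with GPU count, which real
    training jobs only approximate.
    """
    # 8 GPUs at 40% peak utilization is about 3.2 GPUs' worth of work.
    needed = current_gpus * peak_utilization
    for size in sorted(available_sizes):
        if size >= needed:
            return size
    # Nothing is large enough; fall back to the biggest size on offer.
    return max(available_sizes)


print(smallest_fitting_gpu_count(8, 0.40))  # 4
```

Treat the result as a starting point for an experiment rather than a guarantee; memory limits and communication overhead can change the picture.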

How it works

Valohai automatically highlights executions where the peak utilization of any tracked resource stays below 50%.

This threshold is currently fixed. You can specify which resources you want to track under Project > Notifications settings.

Valohai only reports on executions that took more than one minute to complete, filtering out short setup tasks where utilization isn't meaningful.

💡 If you need a different threshold, contact your Valohai representative to help us gather requirements for upcoming features.
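To make the rule concrete, here is a minimal sketch of the check as described above. It is an illustration only, not Valohai's implementation: the `ExecutionStats` class, the resource names, and the function name are hypothetical.

```python
from dataclasses import dataclass

PEAK_THRESHOLD = 0.50       # fixed threshold described above
MIN_DURATION_SECONDS = 60   # executions of one minute or less are not reported


@dataclass
class ExecutionStats:
    """Hypothetical container for one execution's statistics."""
    duration_seconds: float
    peak_utilization: dict  # resource name -> peak utilization in 0.0..1.0


def underutilized_resources(stats, tracked=("cpu", "memory", "gpu")):
    """Return the tracked resources whose peak utilization is below the threshold."""
    if stats.duration_seconds <= MIN_DURATION_SECONDS:
        return []  # too short for utilization to be meaningful
    return [
        resource
        for resource in tracked
        if resource in stats.peak_utilization
        and stats.peak_utilization[resource] < PEAK_THRESHOLD
    ]


stats = ExecutionStats(
    duration_seconds=600,
    peak_utilization={"cpu": 0.90, "memory": 0.30, "gpu": 0.45},
)
print(underutilized_resources(stats))  # ['memory', 'gpu']
```

The `tracked` argument mirrors the setting that lets you choose which resources to monitor; passing `tracked=("gpu",)` would report only GPU underutilization.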

Underutilization indicators

Alerts

Lightweight messages displayed on the execution page.

Alerts can't be turned off but can be hidden on execution lists.

Runtime alerts on the execution view

Alerts as shown on the execution details page.

Alerts on the execution list

Turn on the Alerts column in execution listings to see warnings at a glance.

  1. Column settings: Change Alerts column visibility here.

  2. Alerts column: Click to navigate to the full list of alerts for an execution.

Notifications

Receive alerts via in-app messages, email, or Slack.

  • Personal notifications: Sent to the user who configured the notification.

  • Project notifications: Sent to the target channel (e.g., Slack or webhook).

Configure both types in the project settings under the Notifications tab.

Personal notifications in the project settings

  1. Project settings page: Navigate here from your project.

  2. Notifications tab: Access personal and project notifications.

  3. Personal notifications view: Configure your own notification preferences.

  4. Toggle in-app messages: Turn on/off in-app notifications for underutilization.

  5. Toggle email notifications: Turn on/off email notifications for underutilization.

  6. Change scope: Choose whether to receive notifications for your executions or all executions in the project.

Project notifications in the project settings

Project notifications are shared with all users, unlike personal notifications.

💡 Setting up project notifications requires configuring a channel as the receiver. This can be done under the Channels tab in the project notifications settings.

What to do when you get an alert

For GPU underutilization

  • Check batch size: Increase the batch size to make better use of GPU memory and compute.

  • Switch to a smaller instance: If utilization is consistently low, use a cheaper GPU type.

  • Enable mixed precision: Use automatic mixed precision training to increase throughput.

For CPU underutilization

  • Reduce CPU allocation: Switch to an instance with fewer cores.

  • Increase data loader workers: If the GPU is waiting on the CPU for data loading, add more workers.
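Before adding workers, it can help to confirm that data loading really is the bottleneck. The sketch below measures what share of the training loop is spent waiting for the next batch. It uses only the standard library; `data_wait_fraction`, `batches`, and `train_step` are hypothetical names standing in for your own loader and training function.

```python
import time


def data_wait_fraction(batches, train_step):
    """Return the fraction of total loop time spent waiting for the next batch.

    batches:    any iterable that yields batches (for example a data loader)
    train_step: a callable that processes one batch
    """
    wait = 0.0
    loop_start = time.perf_counter()
    iterator = iter(batches)
    while True:
        fetch_start = time.perf_counter()
        try:
            batch = next(iterator)
        except StopIteration:
            break
        wait += time.perf_counter() - fetch_start
        # Note: GPU work is often asynchronous. For accurate timing,
        # train_step should block until the step has finished
        # (in PyTorch, for example, by calling torch.cuda.synchronize()).
        train_step(batch)
    total = time.perf_counter() - loop_start
    return wait / total if total > 0 else 0.0


def slow_batches(count=5, delay=0.01):
    """Simulated loader that takes `delay` seconds to produce each batch."""
    for index in range(count):
        time.sleep(delay)
        yield index


fraction = data_wait_fraction(slow_batches(), train_step=lambda batch: None)
print(f"Share of loop time spent waiting for data: {fraction:.0%}")
```

A high fraction suggests that the loop is starved for data and more loader workers may help. A low fraction suggests the idle CPU cores are simply not needed, in which case a smaller instance is the better fix.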

For memory underutilization

  • Switch to an instance with less RAM: Save costs by choosing an instance sized to the memory your executions actually use.

  • Increase batch size: If GPU memory allows, increase batch size to better utilize system memory.

What's next

Visualize utilization to understand detailed resource usage charts and identify specific bottlenecks.

Hardware statistics overview to learn what metrics Valohai tracks and how data is collected.
