Observability & Analytics

Understand how Valohai provides visibility into your ML operations through dashboards, audit logs, and resource monitoring.

Valohai's observability tools give you complete visibility into your ML operations, from organization-wide performance metrics to individual job resource utilization.

Unlike fragmented toolchains that scatter metrics across experiment trackers, infrastructure monitors, and cloud cost dashboards, Valohai gives you a unified view of your ML operations.

Why observability matters

Modern ML operations face three critical challenges:

  • Demonstrating platform value. Stakeholders need quantified ROI, not feature lists. How much money and time are you saving? How fast are teams shipping models?

  • Maintaining governance and compliance. With regulations like the EU AI Act, organizations must prove who did what, when, and why across their AI development pipelines.

  • Optimizing resource efficiency. GPU costs add up fast. Are your teams actually using the compute they're requesting? Where are the bottlenecks?

Valohai's observability tools address all three.

Three layers of visibility

Productivity Dashboard

Track organization-wide ML performance with metrics that matter for business decisions.

The Productivity Dashboard quantifies platform ROI by showing concrete savings from job reuse, time-to-value improvements, and pipeline reliability. It transforms operational data into evidence that demonstrates why unified MLOps delivers better outcomes than DIY toolchains.

Best for: VPs, CTOs, ML Managers, Technical Decision Makers

Learn more: Productivity Dashboard

Audit Log

Maintain an immutable record of all actions across your ML projects for compliance, debugging, and accountability.

Audit logs provide comprehensive traceability for AI governance requirements. Every execution, data access, model approval, and pipeline run is logged with full context about who, what, when, and where. These logs cannot be modified or deleted, ensuring you always have a verifiable history for compliance audits.

Best for: Organization Administrators, Compliance Teams, Platform Engineers

Learn more: Audit Log

Resource Monitoring

Track hardware utilization in real time to optimize costs and identify underutilized resources.

Valohai automatically monitors CPU, memory, GPU processor, and GPU memory usage for every execution. Real-time visualizations help data scientists right-size their compute requests, while automated alerts flag underutilized resources that could be running on cheaper instances.

Best for: Data Scientists, ML Engineers, Platform Engineers

Learn more: Resource Monitoring

How these tools work together

These three observability layers complement each other:

  • For data scientists running experiments: Resource monitoring shows if you're using the GPU capacity you requested. If utilization is consistently low, switch to a cheaper instance type.

  • For team leads managing projects: The Productivity Dashboard reveals which projects consume the most compute budget and whether your team is benefiting from job reuse. Audit logs help debug production issues by showing exactly what changed.

  • For executives justifying platform investment: The Productivity Dashboard quantifies cost savings, time-to-value improvements, and operational reliability. Audit logs demonstrate governance maturity for compliance discussions.
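
The right-sizing check described in the first bullet above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not Valohai's API: the sample values, the 40% threshold, and the helper name `is_consistently_low` are all assumptions made for this example.

```python
from statistics import mean

# Hypothetical GPU utilization samples (percent), e.g. read off a monitoring
# graph by hand. These numbers are made up for illustration.
gpu_utilization = [22.0, 18.5, 30.1, 25.4, 19.8, 27.3]

# Assumed cut-off below which a sample counts as "low". Pick a value that
# matches your own workloads.
LOW_UTILIZATION_THRESHOLD = 40.0


def is_consistently_low(samples, threshold, min_fraction=0.9):
    """Return True if at least min_fraction of the samples are below threshold."""
    if not samples:
        return False
    below = sum(1 for s in samples if s < threshold)
    return below / len(samples) >= min_fraction


if is_consistently_low(gpu_utilization, LOW_UTILIZATION_THRESHOLD):
    print(f"Average GPU utilization is {mean(gpu_utilization):.1f}%: "
          "consider a cheaper instance type.")
else:
    print("GPU utilization looks reasonable for the requested instance.")
```

Requiring most samples to fall below the threshold, rather than comparing a single average, keeps a short idle period from triggering a downsizing decision.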

Built-in, not bolted-on

All observability features work out of the box with zero configuration. Valohai automatically captures metrics, logs events, and tracks resource usage as part of normal platform operation.

You don't need to instrument your code, configure exporters, or build custom dashboards. The visibility is already there.
