# Unifying Your ML Infra

Most ML teams build Frankenstein stacks: Airflow for orchestration, MLflow for tracking, S3 for storage, Kubernetes for compute. Each tool solves one problem well, until you need them to work together.

Valohai replaces fragmented MLOps tooling with a unified platform that handles orchestration, tracking, storage, and compute without glue code.

## The Cost of Fragmentation

When you stitch together multiple tools, you inherit their collective problems:

**Pipeline failures cascade mysteriously**

* Airflow DAGs fail without propagating context to downstream tools
* Error messages reference internal task IDs instead of ML concepts
* Debugging requires SSH access across multiple systems

**Data lineage evaporates between tools**

* Training outputs land in S3 with no metadata
* Model artifacts lose connection to their training runs
* Reproducing results means archaeology through logs

**Infrastructure becomes everyone's problem**

* Data scientists debug Kubernetes networking
* ML engineers maintain Airflow workers
* Platform teams juggle incompatible tool versions

## The Unified Alternative

Valohai connects every piece of the ML workflow through a single abstraction layer:

**Executions replace scattered jobs**

* Each run tracks inputs, outputs, logs, and metadata automatically
* Failed steps show exactly which data and parameters were used
* Re-running experiments preserves complete lineage
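
To make the idea of automatic tracking concrete, here is a minimal Python sketch of what a per-run lineage record could contain. It is an illustration only: `ExecutionRecord`, `fingerprint`, and every field name are hypothetical and do not represent Valohai's actual data model or API.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


def fingerprint(data: bytes) -> str:
    """Content hash that identifies an input or output unambiguously."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ExecutionRecord:
    """Hypothetical record of one run (not Valohai's real schema)."""
    step: str
    parameters: dict
    inputs: dict = field(default_factory=dict)    # input name -> content hash
    outputs: dict = field(default_factory=dict)   # output name -> content hash
    parent: Optional[str] = None                  # run_id of the run this one re-ran

    @property
    def run_id(self) -> str:
        # Same step + parameters + inputs always produce the same id.
        payload = json.dumps(
            {"step": self.step, "parameters": self.parameters, "inputs": self.inputs},
            sort_keys=True,
        )
        return fingerprint(payload.encode())[:12]


original = ExecutionRecord(
    step="train",
    parameters={"learning_rate": 0.01, "epochs": 5},
    inputs={"dataset": fingerprint(b"rows of training data")},
)
original.outputs["model"] = fingerprint(b"trained weights")

# Re-running with one changed parameter keeps a pointer back to the original run.
rerun = ExecutionRecord(
    step=original.step,
    parameters={**original.parameters, "epochs": 10},
    inputs=dict(original.inputs),
    parent=original.run_id,
)
```

Because the record stores content hashes rather than file paths, a failed or repeated run can be traced to the exact data and parameters it used, which is the property the bullets above describe.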

**Pipelines orchestrate without overhead**

* Define DAGs in YAML that version with your code
* Pass outputs between steps without manual wiring
* Monitor progress through one interface, not five dashboards
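
The core mechanism behind these bullets, a dependency graph in which each step receives its upstream outputs automatically, can be sketched in a few lines of Python. This is not Valohai's YAML syntax or runtime; the step functions, the `PIPELINE` mapping, and `run_pipeline` are assumptions made up for illustration, built on the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter


# Hypothetical steps: each one receives upstream outputs as keyword arguments.
def preprocess():
    return [1, 2, 3, 4, 5]


def train(preprocess):
    return {"mean": sum(preprocess) / len(preprocess)}


def evaluate(preprocess, train):
    return max(abs(x - train["mean"]) for x in preprocess)


# Each node maps to the set of nodes it depends on.
PIPELINE = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"preprocess", "train"},
}
STEPS = {"preprocess": preprocess, "train": train, "evaluate": evaluate}


def run_pipeline(pipeline, steps):
    """Run steps in dependency order, wiring outputs to downstream inputs."""
    results = {}
    order = list(TopologicalSorter(pipeline).static_order())
    for name in order:
        upstream = {dep: results[dep] for dep in pipeline[name]}
        results[name] = steps[name](**upstream)
    return order, results


order, results = run_pipeline(PIPELINE, STEPS)
```

The graph declaration is the only wiring: adding a step means adding one entry to the mapping, and the runner works out both the execution order and which outputs to pass along.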

**Infrastructure adapts to workloads**

* Specify compute requirements per step (GPU type, memory, region)
* Scale from laptops to cloud clusters with the same code
* Pay only for what you use, with no idle Kubernetes nodes

## When Unification Matters Most

This approach pays dividends when:

* Your team spends more time on infrastructure than ML
* Reproducing old results requires tribal knowledge
* Onboarding new team members takes weeks of tool training
* Compliance audits demand end-to-end traceability

