# Troubleshoot Endpoints

Diagnose and fix issues with deployment endpoints using logs, cluster status, and local testing.

### Test locally first

Before deploying to Valohai, run your endpoint locally to catch issues early.

**Benefits:**

* Immediate feedback on code errors
* Easier debugging with local tools
* Faster iteration without waiting for builds

**Test your FastAPI endpoint:**

```shell
pip install -r requirements-deployment.txt
uvicorn predict:app --reload
```

Visit `http://localhost:8000/docs` to test your endpoints interactively.
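For a repeatable check alongside the interactive docs page, you can script a request against the local server. The following is a minimal sketch using only the Python standard library. It assumes your app exposes a `POST /predict` route that accepts a JSON object; the helper names and payload keys are hypothetical placeholders.

```python
import json
import urllib.request


def build_predict_request(base_url, payload):
    """Build a JSON POST request for the (assumed) /predict route."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With `uvicorn` running, calling `send(build_predict_request("http://localhost:8000", {"feature_a": 1.0}))` should return whatever your route responds with. Replace the payload with the fields your model expects.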

### Check endpoint logs

Endpoint logs are the most direct way to debug runtime issues:

1. Open your deployment
2. Select the failing version
3. Click the **Log** tab

**What to look for:**

* Python stack traces
* Import errors
* Model loading failures
* Request/response errors
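If a log is long, a quick scan for the signatures listed above can narrow the search. This is a rough sketch that assumes you have copied the log text into a string or file; the patterns are illustrative guesses at common Python failure lines, not an official log format.

```python
import re

# Assumed signatures for the failure types listed above; extend as needed.
FAILURE_PATTERNS = {
    "stack trace": re.compile(r"Traceback \(most recent call last\)"),
    "import error": re.compile(r"\b(ImportError|ModuleNotFoundError)\b"),
    "missing file": re.compile(r"\bFileNotFoundError\b"),
}


def scan_log(text):
    """Return (line_number, label, line) for every line matching a known pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line.strip()))
    return hits
```

Each hit carries the line number, so you can jump to the surrounding context in the full log.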

### Add custom logging

Enhance debugging by logging key events in your code:

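```python
import logging

from fastapi import FastAPI

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI()
# `model` is assumed to be loaded elsewhere at startup (for example, from your model file).


@app.post("/predict")
def predict(data: dict):
    logger.info("Received prediction request with %d features", len(data))

    try:
        result = model.predict(data)
        logger.info("Prediction successful: %s", result)
        return result
    except Exception:
        # logger.exception logs at ERROR level and includes the full stack trace
        logger.exception("Prediction failed")
        raise
```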

These messages appear in the endpoint logs, making it easier to trace execution flow.

### Check cluster status

For infrastructure-level issues, check the **Cluster Status** tab:

1. Navigate to your deployment version
2. Click **Cluster Status**
3. Review pod status and events

**Common issues:**

**OOMKilled (Out of Memory):** Your endpoint consumed more memory than allocated, and Kubernetes terminated it.

**Solution:** Increase memory allocation in `valohai.yaml`:

```yaml
- endpoint:
    name: predict
    memory_limit: 2048  # Increase from default
```
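To pick a new limit with some evidence, you can estimate locally how much memory model loading needs. The sketch below uses the standard library's `tracemalloc`. It only counts allocations made through Python's memory allocators, so memory held by native libraries may be under-counted; treat the number as a lower bound. The `peak_memory_mib` helper and the `load` callable are hypothetical names.

```python
import tracemalloc


def peak_memory_mib(load):
    """Call load() and return (result, peak MiB of Python-tracked allocations)."""
    tracemalloc.start()
    try:
        result = load()
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak_bytes / (1024 * 1024)
```

Pass in a zero-argument function that loads your model the same way the endpoint does, then leave headroom above the reported peak for request handling.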

**ImagePullBackOff:** Kubernetes can't pull your Docker image.

**Solution:** Verify the base image exists and is accessible.

**CrashLoopBackOff:** Your endpoint starts but immediately crashes.

**Solution:** Check logs for startup errors (missing files, import failures).

### Common deployment issues

**Syntax errors in Python code:**

* Test locally before deploying
* Check build logs for syntax errors during image creation

**Missing dependencies:**

* Verify all packages are in `requirements-deployment.txt`
* Pin versions to avoid surprises: `tensorflow==2.5.1`
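
A small script can flag unpinned entries before you deploy. This is a simplified sketch: it treats any line without `==` as unpinned and does not handle URL requirements or environment markers. The function name is hypothetical.

```python
def unpinned_requirements(text):
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned
```

Run it on the contents of `requirements-deployment.txt`; an empty list means every package has an exact version.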

**Model file not found:**

* Confirm the file path in `valohai.yaml` matches your code
* Check that you selected model files when creating the version

**Uvicorn not found:**

* Install it in `requirements-deployment.txt`
* Update `server-command` to use the installed path: `~/.local/bin/uvicorn`

### Deployment stuck in "Pending"

A deployment stays "Pending" until it is successfully deployed and ready to accept requests.

**What "Pending" means:** Valohai is building your Docker image, deploying to Kubernetes, or waiting for health checks to pass. This usually takes 2-5 minutes.
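
If you want to wait on readiness from a script instead of refreshing the page, a generic polling loop is enough. The sketch below is not a Valohai API: `probe` is a hypothetical callable you supply (for example, one that requests your endpoint URL and returns `True` on success), and the 10-minute default mirrors the threshold used in this section.

```python
import time


def wait_until_ready(probe, timeout_s=600.0, interval_s=15.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll probe() until it returns True or timeout_s elapses; return True if ready."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A `False` result means the deadline passed without a successful probe, which is the point to start working through the checks in this section.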

**If it's stuck for more than 10 minutes, check:**

1. **Endpoint logs** for runtime errors:
   * Python syntax errors
   * Missing dependencies
   * Model loading failures
2. **Cluster Status** for infrastructure issues:
   * OOMKilled (out of memory)
   * ImagePullBackOff (can't pull Docker image)
   * CrashLoopBackOff (endpoint crashes on startup)
3. **Build logs** (if available):
   * Dependency installation failures
   * Base image not found

**Common fixes:**

* Increase memory allocation if you see OOMKilled
* Verify all packages are in `requirements-deployment.txt`
* Confirm model files aren't too large for allocated resources
* Test your endpoint locally before deploying


