# Adding a README to Valohai Datasets

This guide explains how to programmatically add a README.md file to a dataset or a dataset version and view it in the Valohai UI. A README is useful for documenting dataset contents, metadata, and usage instructions.

### Overview

You can add README files at two levels:

1. **Dataset Level**: A README for the entire dataset that applies to all versions
2. **Dataset Version Level**: A README specific to a particular version

Both follow the same process:

1. Create a dataset or dataset version
2. Generate the README content (typically via an execution)
3. Save the README.md as an output file
4. Attach the README via API

### Adding a README to a Dataset

#### Step 1: Note Your Dataset ID

Identify the dataset you want to document and record its **dataset ID**. You can find this in:

* The Valohai web interface (in the dataset URL)
* The Valohai API (for example, the `DatasetList` endpoint: `/api/v0/datasets/`)
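
If you prefer to look the ID up in code, the API route could be sketched roughly as follows. This is a minimal sketch using only the Python standard library. It assumes the endpoint returns a paginated JSON object with a `results` list whose entries carry `id` and `name` fields; verify that shape against the API reference for your installation. The helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://app.valohai.com/api/v0"


def build_dataset_list_request(token, base=API_BASE):
    """Build a GET request for the dataset list endpoint."""
    return urllib.request.Request(
        f"{base}/datasets/",
        headers={"Authorization": f"Token {token}"},
        method="GET",
    )


def find_dataset_ids(payload, name):
    """Return the IDs of datasets called `name`.

    Assumption: `payload` has a `results` list of objects with `id` and `name`.
    """
    return [d["id"] for d in payload.get("results", []) if d.get("name") == name]


def fetch_dataset_ids(token, name):
    """Query the API and return matching dataset IDs (performs a network call)."""
    with urllib.request.urlopen(build_dataset_list_request(token)) as response:
        return find_dataset_ids(json.load(response), name)
```

Calling `fetch_dataset_ids("<TOKEN>", "my-dataset")` would return a list, since nothing in this sketch assumes dataset names are unique.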

#### Step 2: Generate the README Content

Create a script (e.g., `readme.py`) that generates a README.md file for your dataset. This README typically includes:

* High-level dataset description
* Overall purpose and use cases
* Data collection methodology
* License and attribution information
* General usage guidelines

#### Step 3: Upload the README File

Run your execution to generate the README.md file. After the execution completes, note the **datum ID** of the readme.md output file.

> :bulb: You can also create the README on your local machine, then [upload it to your data store](https://docs.valohai.com/data/data-versioning/upload-files-via-web-ui) or use [datum adoption](https://docs.valohai.com/data/data-versioning/add-existing-files) to give it a datum ID.

#### Step 4: Attach README to Dataset

Use the Valohai API to link the README file to your dataset:

```bash
curl 'https://app.valohai.com/api/v0/datasets/{dataset-id}/readme/' \
  --request POST \
  -H 'Authorization: Token <TOKEN>' \
  -H 'Content-Type: application/json;charset=UTF-8' \
  --data-binary '{"datum":"{datum-id}"}'
```

Replace the following placeholders:

* `{dataset-id}`: The dataset ID from Step 1
* `<TOKEN>`: Your Valohai API token
* `{datum-id}`: The datum ID of the readme.md file from Step 3
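
The same call can be made from Python. The sketch below mirrors the `curl` command using only the standard library; the function names are hypothetical, and the URL, headers, and body are taken directly from the command above.

```python
import json
import urllib.request


def build_dataset_readme_request(dataset_id, datum_id, token):
    """Build the POST request that links a README datum to a dataset."""
    url = f"https://app.valohai.com/api/v0/datasets/{dataset_id}/readme/"
    body = json.dumps({"datum": datum_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json;charset=UTF-8",
        },
    )


def attach_dataset_readme(dataset_id, datum_id, token):
    """Send the request; urlopen raises HTTPError on a 4xx or 5xx response."""
    request = build_dataset_readme_request(dataset_id, datum_id, token)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Keeping request construction separate from sending makes it possible to inspect the URL and payload before anything goes over the network.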

<figure><img src="https://4109720758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ff3mjTRQNkASbnMbJqzJ2%2Fuploads%2Fgit-blob-800f1505387aa8ca0f45fb13339f67a101f29301%2Fimage.png?alt=media" alt=""><figcaption></figcaption></figure>

### Adding a README to a Dataset Version

#### Step 1: Create a Dataset Version

First, create a new dataset version in Valohai and note its ID. You can do this through:

* [The Valohai web interface](https://docs.valohai.com/data/creating-datasets#create-via-web-ui)
* [Programmatically within an execution](https://docs.valohai.com/data/creating-datasets#create-via-code-recommended)
* [The Valohai API](https://docs.valohai.com/automation-overview/rest-api)

Make sure to record the **dataset version ID** as you'll need it in Step 4.

#### Step 2: Generate the README Content

Create a script (e.g., `readme.py`) that analyzes your data and generates a README.md file. This script can:

* Summarize dataset statistics
* Document data schema and structure
* Include data quality metrics
* Provide usage examples

#### Example Workflow

The README generation is typically executed as part of a pipeline that:

* Triggers automatically when a new dataset version is created
* Runs with the latest dataset version as input
* Produces a README.md as output

Your `readme.py` might look something like:

```python
import valohai

# Analyze dataset
# ... your analysis code ...

# Generate README content
readme_content = """
# Dataset Documentation

## Overview
Description of the dataset...

## Statistics
- Total records: X
- Features: Y
- Date range: Z

## Usage
Instructions for using this dataset...
"""

# Save as output
output_path = valohai.outputs().path('readme.md')
with open(output_path, 'w') as f:
    f.write(readme_content)
```
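
The statistics in the template above are placeholders. One way to fill them in is a pair of small helpers like the ones below. This is only an illustrative sketch: `summarize` and `render_readme` are hypothetical names, and it assumes the data has already been loaded as a list of row dictionaries.

```python
def summarize(rows):
    """Compute basic statistics from a list of row dictionaries."""
    columns = {key for row in rows for key in row}
    return {"Total records": len(rows), "Features": len(columns)}


def render_readme(title, overview, stats):
    """Render README text; `stats` maps a label to its value."""
    lines = [f"# {title}", "", "## Overview", overview, "", "## Statistics"]
    lines += [f"- {label}: {value}" for label, value in stats.items()]
    return "\n".join(lines) + "\n"
```

The returned string can be written to the output path in the same way as `readme_content` in the script above.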

#### Step 3: Upload the README File

After your execution completes, the README.md file will be saved as an execution output. Note the **datum ID** of this output file. You can find this:

* In the execution's outputs tab in the web interface
* Via the API by querying the execution's outputs
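
For the API route, picking the right datum out of an execution's outputs might look like the sketch below. It assumes you have already fetched and parsed the execution's JSON and that it contains an `outputs` list whose items carry `id` and `name` fields. That response shape is an assumption to check against the API reference, and `find_output_datum_id` is a hypothetical helper.

```python
def find_output_datum_id(execution, filename="readme.md"):
    """Return the datum ID of the output named `filename`.

    Assumption: `execution` is parsed JSON with an `outputs` list whose
    items have `id` and `name` keys.
    """
    wanted = filename.lower()
    matches = [
        output["id"]
        for output in execution.get("outputs", [])
        if output.get("name", "").lower() == wanted
    ]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one {filename!r} output, found {len(matches)}"
        )
    return matches[0]
```

The name comparison is case-insensitive so that `README.md` and `readme.md` both match, and the function raises instead of guessing when zero or several outputs share the name.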

> :bulb: You can also create the README on your local machine, then [upload it to your data store](https://docs.valohai.com/data/data-versioning/upload-files-via-web-ui) or use [datum adoption](https://docs.valohai.com/data/data-versioning/add-existing-files) to give it a datum ID.

#### Step 4: Attach README to Dataset Version

Use the Valohai API to link the README file to your dataset version.

#### API Call

```bash
curl 'https://app.valohai.com/api/v0/dataset-versions/<version-id>/readme/' \
  --request POST \
  -H 'Authorization: Token <TOKEN>' \
  -H 'Content-Type: application/json;charset=UTF-8' \
  --data-binary '{"datum":"<datum-id>"}'
```

Replace the following placeholders:

* `<version-id>`: The dataset version ID from Step 1
* `<TOKEN>`: Your Valohai API token
* `<datum-id>`: The datum ID of the readme.md file from Step 3

<figure><img src="https://4109720758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ff3mjTRQNkASbnMbJqzJ2%2Fuploads%2Fgit-blob-804ffe94c81d621fb027d86d496ecc76b401e0f6%2Fimage.png?alt=media" alt=""><figcaption></figcaption></figure>

### For Self-Hosted Installations

If you're using a self-hosted Valohai instance, adjust the URL accordingly:

```bash
curl 'http://your-valohai-instance.local/api/v0/dataset-versions/<version-id>/readme/' \
  --request POST \
  -H 'Authorization: Token <TOKEN>' \
  -H 'Content-Type: application/json;charset=UTF-8' \
  --data-binary '{"datum":"<datum-id>"}'
```
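
If your scripts need to target either the hosted service or a self-hosted instance, the endpoint URL can be built from a configurable base address. The helper below is a hypothetical sketch; the two resource names are the ones used by the endpoints in this guide.

```python
def readme_url(base_url, resource, object_id):
    """Build the README endpoint URL for a given Valohai host.

    `resource` is "datasets" or "dataset-versions", matching the two
    endpoints used in this guide.
    """
    if resource not in ("datasets", "dataset-versions"):
        raise ValueError(f"unsupported resource: {resource!r}")
    # Strip any trailing slash so the joined URL has no double slash.
    return f"{base_url.rstrip('/')}/api/v0/{resource}/{object_id}/readme/"
```

For example, `readme_url("http://your-valohai-instance.local", "dataset-versions", "<version-id>")` produces the URL used in the command above.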

### Verification

After making the API call:

1. Refresh the dataset or dataset version page in the Valohai web interface
2. The README should now be visible on the details page
3. The content is rendered as Markdown

### Complete Example

Here's a summary of the workflow with example IDs:

**For a dataset:**

```
Dataset ID: a3f7b2c1-4d5e-6f7a-8b9c-0d1e2f3a4b5c
↓
Run readme.py execution
↓
README.md Output Datum ID: 7c8d9e0f-1a2b-3c4d-5e6f-7a8b9c0d1e2f
↓
POST to /api/v0/datasets/a3f7b2c1-4d5e-6f7a-8b9c-0d1e2f3a4b5c/readme/
```

**For a dataset version:**

```
Dataset Version ID: f1e2d3c4-b5a6-9876-5432-10fedcba9876
↓
Run readme.py execution with this dataset version
↓
README.md Output Datum ID: 2b4c6d8e-0f1a-2b3c-4d5e-6f7a8b9c0d1e
↓
POST to /api/v0/dataset-versions/f1e2d3c4-b5a6-9876-5432-10fedcba9876/readme/
```

### When to Use Each Level

**Dataset-level README**: Use this for information that applies to all versions, such as:

* Overall dataset purpose and scope
* Data collection methodology
* License and attribution
* General data schema (if consistent across versions)

**Version-level README**: Use this for version-specific information, such as:

* Version-specific statistics and metrics
* Changes from previous versions
* Known issues in this particular version
* Version-specific data quality notes

### Automation Tips

To fully automate this workflow:

* Use a pipeline that triggers on new dataset versions
* Have your README generation step output both the readme.md file and a metadata file containing the relevant IDs
* Create a final pipeline step that calls the API automatically using the IDs from previous steps
* Store your API token as a Valohai [environment variable](https://docs.valohai.com/user-and-organization-management/getting-started/environment-variables) for secure access
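
A final pipeline step along these lines could be sketched as follows. Everything specific here is an assumption for illustration: the environment variable name `VALOHAI_API_TOKEN`, the metadata file layout with `dataset_version_id` and `datum_id` keys, and the function names. Only the endpoint, headers, and body come from the API call shown earlier in this guide.

```python
import json
import os
import urllib.request


def load_ids(metadata_path):
    """Read the IDs written by the README step (hypothetical key names)."""
    with open(metadata_path, encoding="utf-8") as f:
        meta = json.load(f)
    return meta["dataset_version_id"], meta["datum_id"]


def build_request_from_env(metadata_path, environ=os.environ):
    """Build the attach request from the metadata file and the environment."""
    token = environ.get("VALOHAI_API_TOKEN")  # hypothetical variable name
    if not token:
        raise RuntimeError("VALOHAI_API_TOKEN is not set")
    version_id, datum_id = load_ids(metadata_path)
    return urllib.request.Request(
        f"https://app.valohai.com/api/v0/dataset-versions/{version_id}/readme/",
        data=json.dumps({"datum": datum_id}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json;charset=UTF-8",
        },
    )
```

Passing the result to `urllib.request.urlopen` sends the call. Reading the token from the environment keeps it out of the repository and the step's source code.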

### Troubleshooting

**README not appearing**: Ensure you've refreshed the page and that the API call returned a successful response (HTTP 200).

**Authentication errors**: Verify your API token is valid and has the necessary permissions for the project.

**Wrong datum ID**: Double-check that you're using the datum ID of the readme.md file specifically, not the execution ID or another output file.
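
In an automated script, the tips above can be surfaced from the HTTP status code of the response. The mapping below is a sketch based on general HTTP conventions. Which codes the Valohai API actually returns in each situation is an assumption, so treat the hints as starting points and inspect the response body for details.

```python
def troubleshooting_hint(status):
    """Map an HTTP status code to the closest tip from this section."""
    if 200 <= status < 300:
        return "Success: refresh the page to see the README."
    if status in (401, 403):
        return "Authentication error: check the API token and its permissions."
    if status in (400, 404):
        return "Check the IDs: use the datum ID of readme.md, not the execution ID."
    return f"Unexpected status {status}: inspect the response body."
```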
