Adding README to Valohai Datasets

This guide explains how to programmatically add a README.md file to a dataset or a dataset version and view it in the Valohai UI. A README is useful for documenting dataset contents, metadata, and usage instructions.

Overview

You can add README files at two levels:

  1. Dataset Level: A README for the entire dataset that applies to all versions

  2. Dataset Version Level: A README specific to a particular version

Both follow the same process:

  1. Create a dataset or dataset version

  2. Generate the README content (typically via an execution)

  3. Save the README.md as an output file

  4. Attach the README via API

Adding a README to a Dataset

Step 1: Note Your Dataset ID

Identify the dataset you want to document and record its dataset ID. You can find this in:

  • The Valohai web interface (in the dataset URL)

  • The Valohai API (for example, the DatasetList endpoint: /api/v0/datasets/)
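As a rough sketch of the API route, the snippet below builds a request against the DatasetList endpoint and maps dataset names to IDs. It assumes the hosted instance at https://app.valohai.com, token authentication via an `Authorization: Token ...` header, and a paginated response body with a `results` list whose items carry `name` and `id` fields; the function names are hypothetical, so check the response shape against your own instance.

```python
import json
import urllib.request

# Assumption: hosted Valohai. Self-hosted installations use their own host.
API_BASE = "https://app.valohai.com"


def build_dataset_list_request(token: str, base_url: str = API_BASE) -> urllib.request.Request:
    """Build a GET request for the DatasetList endpoint (/api/v0/datasets/)."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v0/datasets/",
        headers={"Authorization": f"Token {token}"},
        method="GET",
    )


def dataset_ids_by_name(payload: dict) -> dict:
    """Map dataset name -> dataset ID.

    Assumes a paginated body with a 'results' list of objects that have
    'name' and 'id' keys; adjust if your instance responds differently.
    """
    return {item["name"]: item["id"] for item in payload.get("results", [])}


def fetch_dataset_ids(token: str, base_url: str = API_BASE) -> dict:
    """Send the request and parse the reply (needs network access and a valid token)."""
    request = build_dataset_list_request(token, base_url)
    with urllib.request.urlopen(request) as response:
        return dataset_ids_by_name(json.load(response))
```

Calling `fetch_dataset_ids(token)` with a valid token would return a dictionary you can search by dataset name. Only the first page of results is read in this sketch.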

Step 2: Generate the README Content

Create a script (e.g., readme.py) that generates a README.md file for your dataset. This README typically includes:

  • High-level dataset description

  • Overall purpose and use cases

  • Data collection methodology

  • License and attribution information

  • General usage guidelines
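A minimal sketch of such a script is shown below. It renders a set of named sections into Markdown and writes README.md into the execution's outputs directory. The section names mirror the list above; the function names are hypothetical, and the outputs directory is assumed to be given by the `VH_OUTPUTS_DIR` environment variable with `/valohai/outputs` as the fallback.

```python
import os
from pathlib import Path


def render_readme(title: str, sections: dict) -> str:
    """Render a title and an ordered mapping of heading -> body text as Markdown."""
    lines = [f"# {title}", ""]
    for heading, body in sections.items():
        lines += [f"## {heading}", "", body.strip(), ""]
    return "\n".join(lines)


def write_readme(text: str, outputs_dir=None) -> Path:
    """Write README.md into the outputs directory and return its path.

    Assumption: files saved under the outputs directory are uploaded as
    execution outputs when the execution finishes.
    """
    out = Path(outputs_dir or os.environ.get("VH_OUTPUTS_DIR", "/valohai/outputs"))
    out.mkdir(parents=True, exist_ok=True)
    path = out / "README.md"
    path.write_text(text, encoding="utf-8")
    return path
```

In an execution, you might call `write_readme(render_readme("My dataset", {"Description": "...", "License": "..."}))` with your own section text.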

Step 3: Upload the README File

Run your execution to generate the README.md file. After the execution completes, note the datum ID of the README.md output file.

💡 Alternatively, you can create the README on your local machine and then either upload it to your data store or use datum adoption to give it a datum ID.

Step 4: Attach README to Dataset

Use the Valohai API to link the README file to your dataset:

Replace the following placeholders:

  • {dataset-id}: The dataset ID from Step 1

  • <TOKEN>: Your Valohai API token

  • {datum-id}: The datum ID of the README.md file from Step 3
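If you prefer Python over a shell command, the call can be sketched as below. This is only an outline: the endpoint URL, the HTTP method, and the body fields are deliberately left as parameters, because this sketch does not assume them; take them from the Valohai API reference for your instance. The `Authorization: Token ...` header format and the function names are assumptions as well.

```python
import json
import urllib.request


def build_attach_request(endpoint_url: str, token: str, payload: dict,
                         method: str = "POST") -> urllib.request.Request:
    """Build an authenticated JSON request.

    endpoint_url, payload, and method are supplied by the caller: look up
    the README endpoint, its expected body, and its HTTP method in the
    API reference rather than relying on the defaults here.
    """
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method=method,
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Keeping request construction separate from sending makes it easy to print and inspect the URL and body before anything is changed on the server.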

Adding a README to a Dataset Version

Step 1: Create a Dataset Version

First, create a new dataset version in Valohai. Record the dataset version ID, as you'll need it in Step 4.

Step 2: Generate the README Content

Create a script (e.g., readme.py) that analyzes your data and generates a README.md file. This script can:

  • Summarize dataset statistics

  • Document data schema and structure

  • Include data quality metrics

  • Provide usage examples

Example Workflow

The README generation is typically executed as part of a pipeline that:

  • Triggers automatically when a new dataset version is created

  • Runs with the latest dataset version as input

  • Produces a README.md as output

Your readme.py might look something like:
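The following is a minimal sketch, not a definitive implementation. It assumes the dataset version is mounted as an input named `dataset` (a hypothetical name), that inputs and outputs live under the directories given by `VH_INPUTS_DIR` and `VH_OUTPUTS_DIR` (falling back to `/valohai/inputs` and `/valohai/outputs`), and that simple file statistics are enough for the summary.

```python
import os
from pathlib import Path

INPUT_NAME = "dataset"  # hypothetical input name; match it to your step definition


def summarize_files(data_dir) -> dict:
    """Count files, total size in bytes, and files per extension under data_dir."""
    files = sorted(p for p in Path(data_dir).rglob("*") if p.is_file())
    by_extension = {}
    for path in files:
        ext = path.suffix.lower() or "(none)"
        by_extension[ext] = by_extension.get(ext, 0) + 1
    return {
        "file_count": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
        "by_extension": by_extension,
    }


def render_version_readme(summary: dict) -> str:
    """Turn the summary into a small Markdown document."""
    lines = [
        "# Dataset version summary",
        "",
        f"- Files: {summary['file_count']}",
        f"- Total size: {summary['total_bytes']} bytes",
        "",
        "## Files by extension",
        "",
        "| Extension | Count |",
        "| --- | --- |",
    ]
    for ext, count in sorted(summary["by_extension"].items()):
        lines.append(f"| {ext} | {count} |")
    return "\n".join(lines) + "\n"


def main(inputs_dir=None, outputs_dir=None) -> Path:
    """Summarize the input files and write README.md to the outputs directory."""
    inputs = Path(inputs_dir or os.environ.get("VH_INPUTS_DIR", "/valohai/inputs"))
    outputs = Path(outputs_dir or os.environ.get("VH_OUTPUTS_DIR", "/valohai/outputs"))
    outputs.mkdir(parents=True, exist_ok=True)
    readme = outputs / "README.md"
    readme.write_text(render_version_readme(summarize_files(inputs / INPUT_NAME)),
                      encoding="utf-8")
    return readme
```

Calling `main()` at the end of the script runs it inside an execution. A real script would likely add schema details and data quality metrics specific to your data format.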

Step 3: Upload the README File

After your execution completes, the README.md file is saved as an execution output. Note the datum ID of this output file. You can find it:

  • In the execution's outputs tab in the web interface

  • Via the API by querying the execution's outputs
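If you fetch the execution's outputs through the API, picking out the right datum can be sketched as follows. The helper is hypothetical and assumes each output is a JSON object with `name` and `id` fields; verify the actual field names in the API response before relying on it.

```python
def find_readme_datum_id(outputs: list, filename: str = "README.md") -> str:
    """Return the datum ID of the output whose file name matches `filename`.

    Assumes each output is a dict with 'name' and 'id' keys. The match is
    case-insensitive and ignores any directory prefix in the output name.
    """
    wanted = filename.lower()
    matches = [
        output["id"]
        for output in outputs
        if output.get("name", "").rsplit("/", 1)[-1].lower() == wanted
    ]
    if not matches:
        raise LookupError(f"no output named {filename!r} found")
    if len(matches) > 1:
        raise LookupError(f"several outputs named {filename!r}; pick one manually")
    return matches[0]
```

Failing loudly when there are zero or several matches helps avoid attaching the wrong file.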

💡 Alternatively, you can create the README on your local machine and then either upload it to your data store or use datum adoption to give it a datum ID.

Step 4: Attach README to Dataset Version

Use the Valohai API to link the README file to your dataset version.

API Call

Replace the following placeholders:

  • <version-id>: The dataset version ID from Step 1

  • <TOKEN>: Your Valohai API token

  • <datum-id>: The datum ID of the README.md file from Step 3

For Self-Hosted Installations

If you're using a self-hosted Valohai instance, adjust the URL accordingly:
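One way to keep scripts portable between the hosted service and a self-hosted instance is to build every API URL from a configurable host, as in the sketch below. The `VALOHAI_HOST` environment variable name and the `api_url` helper are assumptions made for this example, and `valohai.example.com` is a placeholder host.

```python
import os
from urllib.parse import urljoin

DEFAULT_HOST = "https://app.valohai.com"  # hosted Valohai


def api_url(path: str, host=None) -> str:
    """Join an API path onto the configured host.

    Resolution order: explicit `host` argument, then the VALOHAI_HOST
    environment variable (assumed name), then the hosted default.
    """
    base = (host or os.environ.get("VALOHAI_HOST") or DEFAULT_HOST).rstrip("/") + "/"
    return urljoin(base, path.lstrip("/"))
```

For example, `api_url("/api/v0/datasets/", host="https://valohai.example.com")` yields `https://valohai.example.com/api/v0/datasets/`, so only the host changes between environments.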

Verification

After making the API call:

  1. Refresh the dataset or dataset version page in the Valohai web interface

  2. The README should now be visible on the details page

  3. The content will be rendered in Markdown format

Complete Example

Here's a summary of the workflow with example IDs:

For a dataset:

For a dataset version:

When to Use Each Level

Dataset-level README: Use this for information that applies to all versions, such as:

  • Overall dataset purpose and scope

  • Data collection methodology

  • License and attribution

  • General data schema (if consistent across versions)

Version-level README: Use this for version-specific information, such as:

  • Version-specific statistics and metrics

  • Changes from previous versions

  • Known issues in this particular version

  • Version-specific data quality notes

Automation Tips

To fully automate this workflow:

  • Use a pipeline that triggers on new dataset versions

  • Have your README generation step output both the README.md file and a metadata file containing the relevant IDs

  • Create a final pipeline step that calls the API automatically using the IDs from previous steps

  • Store your API token as a Valohai environment variable for secure access
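The last two tips can be sketched as a pair of small helpers: one step writes the IDs to a JSON file next to the README, and the final step reads that file and takes the token from the environment. The file name `readme_ids.json`, the variable name `VALOHAI_API_TOKEN`, and the function names are hypothetical choices for this example.

```python
import json
import os
from pathlib import Path

TOKEN_ENV_VAR = "VALOHAI_API_TOKEN"  # hypothetical environment variable name
IDS_FILENAME = "readme_ids.json"     # hypothetical metadata file name


def save_ids(outputs_dir, ids: dict) -> Path:
    """Write the relevant IDs as JSON so a later pipeline step can read them."""
    path = Path(outputs_dir) / IDS_FILENAME
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(ids, indent=2), encoding="utf-8")
    return path


def load_ids(path) -> dict:
    """Read the IDs written by save_ids."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def get_token(env_var: str = TOKEN_ENV_VAR) -> str:
    """Read the API token from the environment instead of hard-coding it."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"environment variable {env_var} is not set")
    return token
```

Reading the token at call time keeps it out of the repository and out of the execution's saved outputs.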

Troubleshooting

README not appearing: Ensure you've refreshed the page and that the API call returned a successful response (HTTP 200).

Authentication errors: Verify your API token is valid and has the necessary permissions for the project.

Wrong datum ID: Double-check that you're using the datum ID of the README.md file specifically, not the execution ID or another output file.
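When the call is scripted, the checks above can be turned into a small status-code helper. This sketch relies only on standard HTTP semantics; the function name is hypothetical, and reading a 404 as a wrong or mistyped ID is an assumption rather than documented Valohai behavior.

```python
def explain_status(status: int) -> str:
    """Translate an HTTP status code into a troubleshooting hint."""
    if 200 <= status < 300:
        return "Success: refresh the dataset page to see the README."
    if status in (401, 403):
        return "Authentication error: check the API token and its project permissions."
    if status == 404:
        # Assumption: a missing resource usually means a wrong or mistyped ID.
        return "Not found: check the dataset, version, and datum IDs in the request."
    return f"Unexpected status {status}: inspect the response body for details."
```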
