# Dataset properties

Datasets and dataset versions support structured key-value properties. A similar metadata system is already available for individual [datums](/data/data-versioning/metadata-overview/custom-properties.md) and is now extended to both datasets and dataset versions. Use properties to describe your datasets with rich, structured metadata instead of working around the limitation by overloading tags or naming conventions.

***

### Datums vs. Datasets vs. Dataset Versions

Properties exist at three levels, each describing a different scope:

* **Datum properties** describe a single file (a specific image, model artifact, CSV, etc.). See [the related documentation page](/data/data-versioning/metadata-overview/custom-properties.md) for the file-level metadata system.
* **Dataset properties** describe stable attributes of a dataset that hold across all of its versions.
* **Dataset version properties** describe metadata specific to one version of a dataset.

A property is a key-value pair where the value can be a string, number, boolean, or nested JSON object — the same shape as datum metadata.
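
As a rough illustration, the value shapes listed above can be checked locally before properties are written. This is a minimal sketch, not part of the Valohai tooling: the helper names `is_supported_property_value` and `find_unsupported_keys` are hypothetical, and the check is deliberately conservative (it rejects lists and `None` only because this page does not list them, not because the platform is known to refuse them).

```python
from numbers import Real


def is_supported_property_value(value):
    """Return True for a string, number, boolean, or nested object of those.

    Hypothetical helper: mirrors only the value shapes named on this page.
    """
    if isinstance(value, (str, bool)):
        return True
    if isinstance(value, Real):  # covers int and float
        return True
    if isinstance(value, dict):
        # Nested objects: string keys, values checked with the same rule.
        return all(
            isinstance(key, str) and is_supported_property_value(item)
            for key, item in value.items()
        )
    # Anything else (lists, None, sets, ...) is not listed on this page.
    return False


def find_unsupported_keys(properties):
    """Return the sorted top-level keys whose values fail the check above."""
    return sorted(
        key
        for key, value in properties.items()
        if not is_supported_property_value(value)
    )
```

Calling `find_unsupported_keys(properties)` before writing metadata gives a quick list of keys worth double-checking.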

***

### When to Use Dataset vs. Dataset Version Properties

Use **dataset properties** for attributes that don't change between versions: the subject domain, the source equipment, the species being studied, the project owner. If you'd answer the same way regardless of which version someone is looking at, it belongs on the dataset.

```
{
  "domain": "manufacturing",
  "machine": "press-04",
  "material": "stainless-steel-304",
  "project_owner": "alice@example.com"
}
```

Use **dataset version properties** for attributes that are specific to one snapshot: when the data was captured, how many samples it contains, which preprocessing pipeline produced it, what the validation results were.

```
{
  "capture_date": "2026-01-15",
  "sample_count": 4700,
  "preprocessing_version": "3.2",
  "shift": "night",
  "defect_rate": 0.002
}
```

***

### How to Add Properties

As with datum properties, you can add properties to datasets and dataset versions programmatically during executions or through the Valohai API. You can also manage them in the web UI. Existing properties can be edited through the API and in the web UI.

> 💡 Some properties are managed automatically by the platform and use the `valohai.` prefix. For example, `valohai.last-used` records when a dataset was last referenced, which is useful for identifying stale or unused datasets.

#### In the Web Application

Open any dataset or dataset version detail page to see its **Properties** section. Click the **Edit properties** button to add, edit, and remove properties directly in the UI. Each entry is a key-value pair, and values accept strings, numbers, or booleans. You can also edit the entries as JSON.

<figure><img src="/files/GoiIHbUJeCrbyoS6tyAm" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/PseHOFy0G1qqSP108zGe" alt=""><figcaption></figcaption></figure>

#### During Execution

Properties for dataset versions can be set from your execution code through the `valohai.metadata.jsonl` sidecar file, the same mechanism used for datum metadata. Properties written this way are staged during the execution and applied automatically when the dataset version finishes building, so metadata isn't lost if the version completes while a write is still in progress.

```python
import json

# Define the properties for the dataset
# Make sure to replace <dataset-name> with the actual name
dataset_metadata = {
    "dataset://<dataset-name>": {
        "domain": "manufacturing",
        "machine": "press-04",
        "material": "stainless-steel-304",
        "project_owner": "alice@example.com"
    }
}

# Define the properties for the dataset version
# Make sure to replace <dataset-name> and <dataset-version-name> with the actual names
dataset_version_metadata = {
    "dataset://<dataset-name>/<dataset-version-name>": {
        "capture_date": "2026-01-15",
        "sample_count": 4700,
        "preprocessing_version": "3.2",
        "shift": "night",
        "defect_rate": 0.002
    }
}

# Write one JSON object per line to valohai.metadata.jsonl
metadata_path = "/valohai/outputs/valohai.metadata.jsonl"
with open(metadata_path, "w") as outfile:
    for dataset_name, ds_metadata in dataset_metadata.items():
        json.dump({"dataset": dataset_name, "metadata": ds_metadata}, outfile)
        outfile.write("\n")
    for dataset_version_name, dsv_metadata in dataset_version_metadata.items():
        json.dump({"dataset-version": dataset_version_name, "metadata": dsv_metadata}, outfile)
        outfile.write("\n")
```

> 💡 For details on the sidecar file format, see [Custom Properties](/data/data-versioning/metadata-overview/custom-properties.md).

#### Via the API

Use the properties API endpoints to set, update, or remove properties on existing datasets and dataset versions. This is useful for backfilling metadata, editing properties outside an execution, or integrating with external systems.

To add or edit dataset properties, send a POST request to the `/api/v0/datasets/<dataset-id>/metadata/` endpoint. The properties are defined in the payload:

```
{
  "domain": "manufacturing",
  "machine": "press-04",
  "material": "stainless-steel-304",
  "project_owner": "alice@example.com"
}
```

Similarly, to add or edit dataset version properties, send a POST request to `/api/v0/dataset-versions/<dataset-version-id>/metadata/`.
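
The two POST requests described above could be assembled in Python roughly as follows. This is a sketch under stated assumptions rather than an official client: the base URL `https://app.valohai.com`, the `Authorization: Token ...` header format, and the helper name `build_properties_request` are not taken from this page, so verify them against your installation. The function only builds the request and does not send it.

```python
import json
import urllib.request

# Assumption: hosted Valohai; change this for a self-hosted installation.
API_BASE = "https://app.valohai.com"


def build_properties_request(resource, resource_id, properties, token):
    """Build (without sending) a POST request that sets properties.

    `resource` is "datasets" or "dataset-versions", matching the two
    endpoints described on this page. Hypothetical helper.
    """
    if resource not in ("datasets", "dataset-versions"):
        raise ValueError(f"unsupported resource: {resource!r}")
    url = f"{API_BASE}/api/v0/{resource}/{resource_id}/metadata/"
    return urllib.request.Request(
        url,
        data=json.dumps(properties).encode("utf-8"),
        headers={
            # Assumption: token-based authentication header.
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen`, ideally inside a `try` block that handles `urllib.error.HTTPError`.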

> 💡 Made a mistake and want to remove properties? Set their values to `null` in the payload. The following would delete all four properties from the dataset:
>
> ```
> {
>   "domain": null,
>   "machine": null,
>   "material": null,
>   "project_owner": null
> }
> ```
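
When the list of keys to remove is already available in code, the `null` payload can be generated instead of written by hand. The snippet below is a minimal sketch; `build_removal_payload` is a hypothetical helper name. It relies on the fact that Python's `json` module serializes `None` as `null`.

```python
import json


def build_removal_payload(keys):
    """Map each property key to None, which json.dumps writes as null."""
    return {key: None for key in keys}


payload = build_removal_payload(["domain", "machine"])
body = json.dumps(payload)
print(body)  # {"domain": null, "machine": null}
```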

***

### Filtering and Search

Because properties are structured, you can filter and search datasets by property values rather than relying on naming conventions or tag matching. This makes it practical to find all datasets matching a specific attribute, or all versions whose metadata meets a numeric threshold.

<figure><img src="/files/T4G6ky8AmK2v1GBqXjOx" alt=""><figcaption></figcaption></figure>

> 💡 Querying is not currently available for dataset versions. If you need this feature, discuss it with your Valohai contact.
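
Until querying is available for dataset versions, one possible workaround is to filter on the client side once the version properties are in memory. The sketch below assumes a hypothetical list of records, each with a `name` and a `properties` dict. That record shape and the helper `filter_by_property` are illustrative only and do not reflect an API response format.

```python
def filter_by_property(records, key, predicate):
    """Return records that have `key` in their properties and pass `predicate`."""
    return [
        record
        for record in records
        if key in record["properties"] and predicate(record["properties"][key])
    ]


# Hypothetical in-memory data; in practice this would come from your own tooling.
versions = [
    {"name": "v1", "properties": {"sample_count": 4700, "shift": "night"}},
    {"name": "v2", "properties": {"sample_count": 1200, "shift": "day"}},
    {"name": "v3", "properties": {"shift": "night"}},
]

# Keep versions with at least 2000 samples; records without the key are skipped.
large = filter_by_property(versions, "sample_count", lambda count: count >= 2000)
```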

***

### Properties vs. Tags

Tags are flat labels — fine for simple categorization, but awkward once you need typed values, distinct keys, or filtering by specific fields. If you've been encoding metadata into dataset names or piling it into tags, properties are the structured replacement.

***

### Related Pages

* [Custom Properties](/data/data-versioning/metadata-overview/custom-properties.md) — Properties on individual datums (files)
* [Add Context to Your Data Files](/data/data-versioning/metadata-overview.md) — Overview of the metadata system
* [Organize Files with Tags](/data/data-versioning/metadata-overview/tags.md) — How to label datums with tags

