# Import Existing Cloud Files

Import files that already exist in your cloud storage into Valohai's data catalog. This creates `datum://` links for external files without moving or re-uploading data.

***

### When to Use This

Import existing cloud files when you need to:

* **Track legacy data** — Bring pre-existing datasets into Valohai's lineage system
* **Use external datasets** — Access data uploaded by other teams or processes
* **Avoid re-uploading** — Create datum links for large files already in your cloud storage
* **Migrate to Valohai** — Import historical data when adopting Valohai for existing projects
* **Share team data** — Make files uploaded directly to cloud storage discoverable in Valohai

***

### What This Does

**"Adopting" files** creates Valohai datum records pointing to your existing cloud storage files.

> 💡 **Important:** Files remain in your cloud storage. Nothing is copied or moved. Valohai creates tracking metadata so you can use these files like any other Valohai data.

**After importing, you can:**

* Use files as inputs in executions
* View and search files in the Valohai UI
* Add tags, aliases, and properties
* Track lineage and usage
* Create datasets from imported files

***

### Requirements

Before importing files:

#### Files Must Be Accessible

The files must exist in a data store that's already configured in your Valohai project. Valohai uses the store's credentials to verify file existence.

**Learn more:** [Configure Data Stores](/data/configure-data-stores.md)

#### You Need Project Permissions

You must have permission to add data to the project.

#### Files Must Use Correct URL Format

Use your cloud provider's native URL format:

* AWS S3: `s3://bucket-name/path/to/file.ext`
* Google Cloud Storage: `gs://bucket-name/path/to/file.ext`
* Azure Blob Storage: `azure://account/container/path/to/file.ext`
* OpenStack Swift: `swift://project/container/path/to/file.ext`
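
A quick local check can catch malformed URLs before you submit them. The sketch below is illustrative only: the scheme list is taken from the formats above, while the helper name `check_url_format` and its rules (a scheme, a bucket or account part, and a non-empty path) are assumptions, not Valohai's own validation.

```python
from urllib.parse import urlparse

# Schemes from the list above; the checks themselves are an illustrative assumption.
SUPPORTED_SCHEMES = {"s3", "gs", "azure", "swift"}


def check_url_format(url: str) -> bool:
    """Return True if the URL has a supported scheme, a bucket/account part, and a file path."""
    parsed = urlparse(url.strip())
    has_path = len(parsed.path.strip("/")) > 0
    return parsed.scheme in SUPPORTED_SCHEMES and bool(parsed.netloc) and has_path


urls = [
    "s3://my-bucket/datasets/training-data-v1.csv",
    "https://my-bucket.s3.amazonaws.com/datasets/training-data-v1.csv",
    "gs://bucket-name/",
]
# Collect the URLs that would need fixing before import
invalid = [u for u in urls if not check_url_format(u)]
print(invalid)
```

This only checks the shape of the URL. Whether the file actually exists is still verified by Valohai during import.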

***

### Import via Web UI

1. Open your project
2. Navigate to the **Data** tab
3. Click the **Adopt** tab
4. Select the **Destination store** from the dropdown menu
5. Enter file URLs to import (one per line)
6. Click **Adopt selected files**

Valohai verifies each file exists and creates datum records.

> ⚠️ **Performance warning:** Importing many files (100+) via the web UI can be slow and may be interrupted by network issues or browser timeouts. For bulk imports, use the API with retry logic instead.

**Example input:**

```
s3://my-bucket/datasets/training-data-v1.csv
s3://my-bucket/datasets/training-data-v2.csv
s3://my-bucket/models/baseline-model.pkl
```

***

### Import via API

Use the Valohai API for bulk imports or automated workflows.

#### Prerequisites

1. **Get your API token** from Valohai account settings
2. **Get your project ID** from Project → Settings in the Valohai UI
3. **Get your data store ID:**

```python
import os
import requests

response = requests.get(
    "https://app.valohai.com/api/v0/stores/",
    headers={"Authorization": "Token " + os.getenv("VH_TOKEN")},
)

stores = response.json()
# Find your store ID in the response
```
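
To pick the store ID out of the response, something like the following may help. It is a sketch under assumptions: the helper name `list_store_ids` is hypothetical, and it assumes the decoded response is either a plain list of stores or a paginated object with a `results` list, where each store has `id` and `name` keys. Inspect your actual response to confirm its shape.

```python
def list_store_ids(payload):
    """Return (name, id) pairs from a decoded /stores/ response.

    Assumption: the payload is either a list of stores or a dict with a
    "results" list, and each store carries "id" and "name" keys.
    """
    stores = payload.get("results", []) if isinstance(payload, dict) else payload
    return [(store.get("name"), store.get("id")) for store in stores]


# Hypothetical payload for illustration only, not real API output
sample = {"results": [{"id": "store-id-1", "name": "my-bucket"}]}
for name, store_id in list_store_ids(sample):
    print(f"{name}: {store_id}")
```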

#### Import Single File

```python
import os
import requests

store_id = "your-store-id-here"
project_id = "your-project-id-here"

payload = {
    "url": "s3://my-bucket/datasets/training-data.csv",
    "root_path": "s3://my-bucket/",
    "project": project_id,
}

response = requests.post(
    f"https://app.valohai.com/api/v0/stores/{store_id}/adopt/",
    json=payload,
    headers={
        "Authorization": "Token " + os.getenv("VH_TOKEN"),
        "Content-Type": "application/json",
    },
)

# Handle response
if response.ok:
    result = response.json()
    if result.get("ok"):
        # "created" maps each imported URL to its new datum ID
        print(f"Success! Created: {result['created']}")
    else:
        print(f"Error: {result['message']}")
else:
    print(f"HTTP Error: {response.status_code}")
    print(response.text)
```

#### Import Multiple Files with Error Handling

```python
import os
import time

import requests

token = os.getenv("VH_TOKEN")

store_id = "your-store-id-here"
project_id = "your-project-id-here"
files_to_import = [
    "s3://my-bucket/datasets/train.csv",
    "s3://my-bucket/datasets/validation.csv",
    "s3://my-bucket/models/baseline.pkl",
]
root_path = "s3://my-bucket/"


def adopt_files(urls, root_path, store_id, project_id, token: str, retries: int = 3):
    headers = {"Authorization": f"Token {token}"}

    payload = {
        "urls": urls,
        "root_path": root_path,
        "project": project_id,
    }

    url = f"https://app.valohai.com/api/v0/stores/{store_id}/adopt/"

    for attempt in range(retries):
        try:
            res = requests.post(url, json=payload, headers=headers, timeout=30)

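            result = res.json()
            if res.ok and isinstance(result, dict) and result.get("ok"):
                return {"success": True, "datums": list(result["created"].values())}
            # Error payloads may be a dict or a list of dicts; normalize to one dict
            error = result[0] if isinstance(result, list) and result else result
            if not isinstance(error, dict):
                error = {"message": str(error)}
            return {"success": False, **error}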
        except requests.exceptions.RequestException as e:
            if attempt < retries - 1:
                wait_time = 2**attempt  # Exponential backoff
                print(f"Retry {attempt + 1}/{retries} for {url} after {wait_time}s...")
                time.sleep(wait_time)
            else:
                return {"success": False, "error": f"Request failed: {str(e)}", "code": -1}


adopt_result = adopt_files(
    urls=files_to_import,
    root_path=root_path,
    store_id=store_id,
    project_id=project_id,
    token=token,
)

if adopt_result["success"]:
    print(f"  ✓ Success: {adopt_result['datums']}")
else:
    # API errors carry "message"; request failures carry "error"
    print(f"  ✗ Error: {adopt_result.get('error') or adopt_result.get('message')}")
```

***

#### ROOT\_PATH

Although `root_path` resembles the `Upload path` used in [manual file uploads](/data/data-versioning/upload-files-via-web-ui.md), it works the opposite way.

`root_path` is an optional common prefix that is removed from every URL to produce the path under which each file is imported. Every URL must start with `root_path`; otherwise the request fails with `invalid_root_path`.

Example:

{% code overflow="wrap" %}

```
urls = [
    "s3://some-bucket/dir_1/dir_2/file_1.txt",
    "s3://some-bucket/dir_1/dir_2/file_2.txt",
    "s3://some-bucket/dir_1/dir_3/file_2.txt"
]

┌─ root_path = s3://some-bucket
│  └─→ dir_1/dir_2/file_1.txt
│  └─→ dir_1/dir_2/file_2.txt
│  └─→ dir_1/dir_3/file_2.txt
│
┌─ root_path = s3://some-bucket/dir_1
│  └─→ dir_2/file_1.txt
│  └─→ dir_2/file_2.txt
│  └─→ dir_3/file_2.txt
│
┌─ root_path = s3://some-bucket/dir_1/dir_2
│  └─→ Error: {'success': False, 'message': 'All adopted URLs must start with the given root path', 'code': 'invalid_root_path'}


```

{% endcode %}
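
The same behavior can be previewed locally before calling the API. This is a minimal sketch that mimics the example above, not Valohai's implementation: the helper name `relative_paths` is hypothetical, and the trailing-slash handling is an assumption.

```python
def relative_paths(urls, root_path):
    """Preview the paths produced by removing root_path from each URL (local illustration)."""
    # Assumption: a trailing slash on root_path makes no difference
    prefix = root_path.rstrip("/") + "/"
    for url in urls:
        if not url.startswith(prefix):
            raise ValueError("All adopted URLs must start with the given root path")
    return [url[len(prefix):] for url in urls]


urls = [
    "s3://some-bucket/dir_1/dir_2/file_1.txt",
    "s3://some-bucket/dir_1/dir_2/file_2.txt",
    "s3://some-bucket/dir_1/dir_3/file_2.txt",
]
print(relative_paths(urls, "s3://some-bucket/dir_1"))
```

Running the preview first makes it easier to spot a `root_path` that does not cover every URL in the batch.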

***

### API Responses

#### Success

File imported successfully:

```json
{
  "ok": true,
  "created": {
    "s3://my-bucket/my-file.txt": "017a515f-30a4-d0f1-d37a-53ffc38e90c7"
  }
}
```

**What to do:** Save the datum ID (`017a515f-...`) to use as `datum://017a515f-...` in your pipelines.
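
As a small illustration, the `created` mapping can be turned into `datum://` references in one step. The helper name `to_datum_urls` is hypothetical; the response shape is the one shown above.

```python
def to_datum_urls(result):
    """Map each adopted file URL to its datum:// reference."""
    return {url: f"datum://{datum_id}" for url, datum_id in result.get("created", {}).items()}


# Same shape as the success response shown above
result = {
    "ok": True,
    "created": {"s3://my-bucket/my-file.txt": "017a515f-30a4-d0f1-d37a-53ffc38e90c7"},
}
print(to_datum_urls(result))
```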

***

#### Already Exists

File was previously imported:

```json
{
  "message": "s3://my-bucket/my-file.txt already exists in my-bucket",
  "code": "adopt_already_exists"
}
```

**What to do:**

* The file is already tracked in Valohai
* Find it in the Data → Browse tab
* No action needed unless you want to update metadata

***

#### Not Found

File doesn't exist in cloud storage:

```json
{
  "message": "Not found in my-bucket: 's3://my-bucket/my-file.txt'",
  "code": "adoptable_file_not_found"
}
```

**What to do:**

* Verify the file URL is correct (check spelling, path, bucket name)
* Ensure the file exists in your cloud storage
* Check that Valohai has access to the bucket/container
* Verify you selected the correct data store
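
In a script, the response shapes documented above can be told apart by their `code` field. The following is a sketch, assuming the codes shown on this page; the helper name `classify_adopt_response` and the returned labels are hypothetical, and the list unwrapping is an assumption in case errors arrive wrapped in a list.

```python
# Codes taken from the responses documented above
KNOWN_CODES = {
    "adopt_already_exists": "already_exists",
    "adoptable_file_not_found": "not_found",
}


def classify_adopt_response(payload):
    """Return a short label describing a decoded adopt response."""
    # Assumption: some error payloads may be wrapped in a list; use the first item
    if isinstance(payload, list):
        payload = payload[0] if payload else {}
    if not isinstance(payload, dict):
        return "other_error"
    if payload.get("ok"):
        return "created"
    return KNOWN_CODES.get(payload.get("code"), "other_error")


print(classify_adopt_response({"message": "...", "code": "adopt_already_exists"}))
```

Treating `already_exists` as a non-fatal outcome lets a bulk script continue instead of stopping on files that are already tracked.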

***

### Common Issues & Fixes

#### File Not Found Error

**Symptom:** `adoptable_file_not_found` error during import

**Causes & Fixes:**

* Typo in file URL → Double-check bucket name, path, and filename (case-sensitive)
* File doesn't exist → Verify file exists in your cloud storage console
* Wrong data store selected → Ensure you selected the correct destination store
* Wrong cloud region → Check that the data store is configured for the correct region
* File in different bucket → Verify the bucket name matches your data store configuration

***

#### Permission Denied

**Symptom:** Import fails with access or permission error

**Causes & Fixes:**

* Data store credentials invalid → Verify data store configuration and credentials
* Bucket policy blocks access → Check cloud storage IAM/permissions allow Valohai to read files
* File is private/encrypted → Ensure Valohai's service account has read access
* Cross-region access issues → Verify data store configuration matches file location

***

#### Already Exists

**Symptom:** File shows as already existing in Valohai

**Causes & Fixes:**

* File was previously imported → Find it in the Data → Browse tab (not an error)
* Trying to import duplicate → Use the existing datum ID instead of re-importing

**To find existing datum:**

1. Go to **Data → Browse**
2. Search by filename
3. Copy the datum ID

***

#### Bulk Import Interrupted

**Symptom:** Web UI import stops partway through large file list

**Causes & Fixes:**

* Browser timeout → Use API with retry logic for bulk imports (>50 files)
* Network interruption → Import in smaller batches via UI
* Too many files → Use the API script with error handling (see above)
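
One way to keep a large import manageable is to split the URL list into batches and submit each batch separately, so a failure affects only one batch. This is a minimal sketch: the helper name `batches` is hypothetical, and the batch size of 50 is an arbitrary assumption rather than a documented limit.

```python
def batches(urls, size=50):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Placeholder URLs for illustration only
all_urls = [f"s3://my-bucket/datasets/part-{i}.csv" for i in range(120)]
for batch in batches(all_urls):
    # Each batch could be passed to an adoption helper such as adopt_files()
    print(f"Would import {len(batch)} files")
```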

***

### Best Practices

#### Organize Before Importing

Plan your import strategy:

* Group related files
* Use consistent naming
* Document file sources
* Tag immediately after import

#### Use API for Bulk Operations

* Web UI: Good for <50 files
* API: Recommended for 50+ files

#### Add Metadata Immediately

After importing, add context:

```python
metadata = {
    "valohai.tags": ["imported-2024-01", "legacy-data"],
    "valohai.alias": "training-data-v1",
    "source": "s3-legacy-bucket",
    "import_date": "2024-01-15",
    "original_owner": "data-team",
}
```

#### Create Aliases for Key Files

Make frequently-used imports easy to reference:

```
legacy-training-data → datum://abc123...
baseline-model → datum://def456...
validation-set-fixed → datum://ghi789...
```

#### Verify After Import

Check that files are accessible:

1. Find the imported file in Data → Browse
2. Create a test execution using the file as input
3. Verify file downloads and opens correctly

***

### Related Pages

* [Load Data in Jobs](/data/data-versioning/load-files-in-jobs.md) — Use imported files as execution inputs
* [Add Context to Your Data Files](/data/data-versioning/metadata-overview.md) — Tag and organize imported files
* [Configure Data Stores](/data/configure-data-stores.md) — Set up cloud storage access
* [Upload Files via Web UI](/data/data-versioning/upload-files-via-web-ui.md) — Alternative for small files

***

### Next Steps

* Import a test file and verify it appears in the Data tab
* Add [tags and aliases](/data/data-versioning/metadata-overview.md) to your imported files
* [Create an execution](/executions.md) using an imported file [as input](/data/data-versioning/load-files-in-jobs.md)
* Set up automated imports using the [API](/automation-overview/rest-api.md) for new files

