Restrict Data Stores

Data stores connect Valohai to cloud object storage (AWS S3, Azure Blob Storage, GCP Storage, MinIO). You can share data stores across your organization or restrict them to specific teams.

This lets you separate sensitive production data from development datasets, or give different teams access to different cloud accounts.

Data Store Scopes

Organization-level: Defined in organization settings. Can be shared with everyone or limited to specific teams.

Project-level: Defined in project settings. Only accessible within that project.

This guide covers organization-level data stores. For project-specific stores, configure them in your project's data store settings.

Share Data Store with Specific Teams

  1. Click Hi, <username> in the top-right corner

  2. Select Manage <organization>

  3. Open the Data Stores tab

  4. Find the data store you want to configure

  5. Click the ... menu at the end of the row and select Edit

  6. Choose a sharing option:

    • Share with specific teams: Select teams from the dropdown

    • Share with everyone: Make accessible to all organization members

  7. Click Save

Sharing Options

Share with Everyone

Effect: All users in the organization can read from and write to this data store in their executions.

Use cases:

  • Shared datasets for exploration (ImageNet, COCO)

  • Organization-wide artifact storage

  • Public reference data

Example: A data store containing public datasets that all teams use for benchmarking.

Share with Specific Teams

Effect: Only users in the selected teams can access this data store.

Use cases:

  • Production data limited to the production team

  • Sensitive customer data restricted to specific authorized teams

  • Department-specific cloud accounts

Example: Production S3 bucket accessible only to "ml-production" and "ops" teams.
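The two organization-level sharing options can be pictured as a simple membership check. The sketch below is only an illustrative model of the rule described above, not Valohai's implementation; the DataStore class, its fields, and the can_access function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataStore:
    """Hypothetical model of an organization-level data store."""
    name: str
    shared_with_everyone: bool = False
    teams: frozenset = field(default_factory=frozenset)


def can_access(store: DataStore, user_teams: set) -> bool:
    """Return True if a user who belongs to `user_teams` may use `store`."""
    if store.shared_with_everyone:
        return True
    # Team-restricted: the user must belong to at least one allowed team.
    return bool(store.teams & set(user_teams))


public = DataStore("public-datasets", shared_with_everyone=True)
production = DataStore("production-s3", teams=frozenset({"ml-production", "ops"}))

print(can_access(public, {"ml-research"}))              # True
print(can_access(production, {"ml-research"}))          # False
print(can_access(production, {"ops", "ml-research"}))   # True
```

A user who belongs to several teams gets access as soon as any one of those teams is on the store's list.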

No Sharing (Project-Level)

Effect: Define data stores at the project level instead of organization level. Only project members can access them.

Use case: Isolate data access to a single project, regardless of team membership.

Setup: Go to project Settings → Data Stores instead of organization settings.

Common Data Store Patterns

Separate Production and Development

Production data store:

Name: production-s3
Bucket: s3://company-ml-production
Teams: ml-production, ops

Development data store:

Name: staging-s3
Bucket: s3://company-ml-staging
Teams: ml-research, ml-engineering, data-science

The production team can't accidentally use staging data, and developers can't access production customer data.
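This separation only holds while no team appears on both lists. A minimal sketch of that check follows, assuming the team lists have been copied by hand from the organization settings into a plain dictionary; the dictionary layout and the overlapping_teams helper are hypothetical and not part of any Valohai API.

```python
# Team lists copied from the two data store definitions above.
stores = {
    "production-s3": {
        "bucket": "s3://company-ml-production",
        "teams": {"ml-production", "ops"},
    },
    "staging-s3": {
        "bucket": "s3://company-ml-staging",
        "teams": {"ml-research", "ml-engineering", "data-science"},
    },
}


def overlapping_teams(stores: dict, first: str, second: str) -> list:
    """Return the teams that can reach both data stores, sorted by name."""
    return sorted(stores[first]["teams"] & stores[second]["teams"])


overlap = overlapping_teams(stores, "production-s3", "staging-s3")
print(overlap)  # [] means no team can reach both stores
```

An empty result confirms the isolation described above. Any team name that shows up is a team that can reach both environments.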

Department-Based Cloud Accounts

Research team cloud account:

Name: research-gcs
Bucket: gs://research-experiments
Teams: ml-research

Analytics team cloud account:

Name: analytics-gcs
Bucket: gs://analytics-datasets
Teams: analytics, data-engineering

Each department has separate GCP projects and billing, managed through team-restricted data stores.

Compliance-Driven Access

HIPAA-compliant data:

Name: healthcare-data
Bucket: s3://healthcare-phi-data
Teams: healthcare-ml (members have HIPAA training)

Public datasets:

Name: public-datasets
Bucket: s3://public-ml-datasets
Teams: (everyone)

Restrict sensitive data to trained team members while keeping public data widely available.

Project-Level Data Stores

For even tighter access control, define data stores at the project level:

  1. Open your project

  2. Go to Settings → Data Stores

  3. Click Add Data Store

  4. Configure cloud credentials

  5. Save

Advantages:

  • Access limited to project members only (regardless of team)

  • Credentials scoped to single project

  • Useful for client-specific projects or consulting work

Disadvantages:

  • Can't share across projects

  • Must configure separately for each project

Authentication and Credentials

Data stores require cloud credentials to access storage:

AWS S3: Access Key ID and Secret Access Key, or IAM role

Azure Blob: Connection string or SAS token

GCP Storage: Service account JSON key

MinIO: Access Key and Secret Key

Security: Credentials are encrypted and only accessible to executions in projects with data store access.
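Before filling in the data store form, it can help to confirm that you have a complete set of credentials for your provider. The sketch below encodes the combinations listed above. The provider keys and field names (such as access_key_id or sas_token) are hypothetical labels chosen for this example and are not the field names used by Valohai.

```python
# Accepted credential combinations per provider, as listed above.
# Each inner set is one complete way to authenticate.
ACCEPTED = {
    "aws_s3": [{"access_key_id", "secret_access_key"}, {"iam_role"}],
    "azure_blob": [{"connection_string"}, {"sas_token"}],
    "gcp_storage": [{"service_account_json"}],
    "minio": [{"access_key", "secret_key"}],
}


def has_complete_credentials(provider: str, supplied: dict) -> bool:
    """Return True if `supplied` contains at least one complete combination."""
    # Ignore fields that are present but empty.
    present = {name for name, value in supplied.items() if value}
    return any(combo <= present for combo in ACCEPTED[provider])


print(has_complete_credentials("aws_s3", {"iam_role": "example-role"}))      # True
print(has_complete_credentials("aws_s3", {"access_key_id": "example-id"}))   # False
print(has_complete_credentials("azure_blob", {"sas_token": "example-token"}))  # True
```

The check only looks at which fields are filled in. It does not contact the cloud provider, so it cannot tell whether the credentials are valid.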

For provider-specific steps, see the cloud-specific setup guides.

Troubleshooting

User Can't See Data Store

Cause: User's team doesn't have access to the data store.

Fix:

  1. Go to organization Data Stores

  2. Click the ... menu and select Manage sharing

  3. Add user's team to the allowed teams list

  4. Save changes

Execution Fails with Access Denied

Cause: Data store credentials invalid or expired.

Fix:

  1. Go to organization Data Stores

  2. Click the ... menu and select Edit

  3. Update cloud credentials

  4. Test connection

  5. Save changes

Wrong Team Has Access

Cause: Data store shared too broadly.

Fix:

  1. Go to Data Stores

  2. Click the ... menu and select Manage sharing

  3. Change from "Everyone" to "Specific teams"

  4. Select only authorized teams

  5. Save
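When reviewing a data store that is shared with specific teams, comparing the current team list with the list you intend makes it clear which teams to remove. The sketch below assumes both lists are typed in by hand from the sharing dialog. The unexpected_teams helper is a hypothetical name and does not call any Valohai API.

```python
def unexpected_teams(actual, intended) -> list:
    """Return teams that currently have access but are not on the intended list."""
    return sorted(set(actual) - set(intended))


current = {"ml-production", "ops", "ml-research"}
authorized = {"ml-production", "ops"}

print(unexpected_teams(current, authorized))  # ['ml-research']
```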
