Snowflake
Query Snowflake from Valohai executions and save snapshots for reproducible ML pipelines.
Overview
Snowflake is a cloud data warehouse that you can query directly from Valohai executions. This guide shows you how to:
Store Snowflake credentials securely
Query Snowflake from your code
Save snapshots for reproducibility
Use Snowflake Time Travel features
Prerequisites
Before you begin:
Existing Snowflake account with a database containing your data
Snowflake credentials (username, password, account identifier)
Database access for the user account
Store Credentials in Valohai
Authenticate to Snowflake using username, password, and account identifier stored as environment variables.
Step 1: Find Your Snowflake Account Identifier
Your account identifier format depends on your Snowflake deployment:
Format examples:
xy12345.us-east-1 (AWS)
xy12345.us-central1.gcp (GCP)
xy12345.east-us-2.azure (Azure)
Find it in your Snowflake URL: https://<account_identifier>.snowflakecomputing.com
Step 2: Add Environment Variables
Open your project in Valohai
Go to Settings → Env Variables
Add the following variables:
| Variable | Description | Secret |
| --- | --- | --- |
| SNOWFLAKE_USER | Your Snowflake username | No |
| SNOWFLAKE_PASSWORD | Your Snowflake password | Yes |
| SNOWFLAKE_ACCOUNT | Account identifier (e.g., xy12345.us-east-1) | No |
| SNOWFLAKE_WAREHOUSE | Warehouse name (e.g., COMPUTE_WH) | No |
| SNOWFLAKE_DATABASE | Database name | No |
| SNOWFLAKE_SCHEMA | Schema name (e.g., PUBLIC) | No |
💡 Environment Variable Groups: Organization admins can create shared credential groups under Organization Settings → Environment Variable Groups instead of configuring each project separately.
Install Snowflake Connector
The Snowflake Python connector requires Python 3.8+. Install the connector and its dependencies in your execution.
Option 1: Install in Command (Recommended)
valohai.yaml:
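Installing in the command can be sketched like this; the step name, image, and script name below are placeholders, not values from this guide:

```yaml
# A minimal sketch — step name, image, and script name are assumptions
- step:
    name: query-snowflake
    image: python:3.11
    command:
      - pip install snowflake-connector-python
      - python query_snowflake.py
```

Installing at execution start keeps the Docker image generic, at the cost of a few seconds per run.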
Option 2: Include in Docker Image
Dockerfile:
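Baking the connector into the image might look like this; the base image and the decision to pin no version are assumptions:

```dockerfile
# A minimal sketch — base image is an assumption
FROM python:3.11-slim
RUN pip install --no-cache-dir snowflake-connector-python
```

Build and push this image to a registry your Valohai environment can pull from, then reference it in the step's image field.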
Query Snowflake
Basic Query Example
query_snowflake.py:
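A basic query script could look like the sketch below. It reads the environment variables configured earlier; the `events` table is a hypothetical example:

```python
"""query_snowflake.py — a minimal sketch; the queried table is an assumption."""
import os


def connection_params() -> dict:
    """Collect Snowflake connection settings from the environment variables
    configured in the project settings."""
    return {
        "user": os.environ["SNOWFLAKE_USER"],
        "password": os.environ["SNOWFLAKE_PASSWORD"],
        "account": os.environ["SNOWFLAKE_ACCOUNT"],
        "warehouse": os.environ["SNOWFLAKE_WAREHOUSE"],
        "database": os.environ["SNOWFLAKE_DATABASE"],
        "schema": os.environ["SNOWFLAKE_SCHEMA"],
    }


def main() -> None:
    # Imported here so the module loads even where the connector is not installed.
    import snowflake.connector

    with snowflake.connector.connect(**connection_params()) as conn:
        with conn.cursor() as cursor:
            cursor.execute("SELECT COUNT(*) FROM events")  # hypothetical table
            print("Row count:", cursor.fetchone()[0])


# Run only when credentials are configured, so the file can also be imported safely.
if __name__ == "__main__" and os.environ.get("SNOWFLAKE_USER"):
    main()
```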
Complete Workflow: Query → Snapshot → Train
Step 1: Query and Save Snapshot
fetch_data.py:
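The fetch step can be sketched as below. It writes the query result as a CSV to /valohai/outputs/ together with a metadata sidecar that files the output into a dataset version; the query, the `snowflake-snapshots` dataset name, and the `v1` version label are assumptions for illustration:

```python
"""fetch_data.py — a minimal sketch; query, dataset name, and version label
are assumptions."""
import csv
import json
import os
from pathlib import Path


def dataset_metadata(dataset: str, version: str) -> dict:
    """Sidecar metadata that tells Valohai to add the file to a dataset version."""
    return {"valohai.dataset-versions": [f"dataset://{dataset}/{version}"]}


def save_snapshot(rows, header, out_dir="/valohai/outputs") -> Path:
    """Write rows to CSV plus a <name>.metadata.json sidecar next to it."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    csv_path = out / "snapshot.csv"
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    meta_path = out / "snapshot.csv.metadata.json"
    # "v1" is a hypothetical version label; pick something meaningful per run.
    meta_path.write_text(json.dumps(dataset_metadata("snowflake-snapshots", "v1")))
    return csv_path


def main() -> None:
    import snowflake.connector  # imported lazily; installed in the execution

    with snowflake.connector.connect(
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
        database=os.environ["SNOWFLAKE_DATABASE"],
        schema=os.environ["SNOWFLAKE_SCHEMA"],
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute("SELECT user_id, amount FROM transactions")  # hypothetical
            header = [col[0] for col in cursor.description]
            save_snapshot(cursor.fetchall(), header)


if __name__ == "__main__" and os.environ.get("SNOWFLAKE_USER"):
    main()
```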
Step 2: Train on Snapshot
train.py:
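The training step then reads the snapshot from its input directory instead of querying Snowflake. In this sketch the input name `dataset` and the CSV layout are assumptions:

```python
"""train.py — a minimal sketch; the input name and CSV layout are assumptions."""
import csv
from pathlib import Path


def load_rows(csv_path):
    """Read the snapshot produced by the fetch step into a list of dicts."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))


def main() -> None:
    # Valohai mounts each input under /valohai/inputs/<input-name>/
    inputs = Path("/valohai/inputs/dataset")
    csv_path = next(inputs.glob("*.csv"))
    rows = load_rows(csv_path)
    print(f"Training on {len(rows)} rows from {csv_path.name}")
    # ... model training goes here ...


if __name__ == "__main__" and Path("/valohai/inputs/dataset").exists():
    main()
```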
Step 3: Pipeline Configuration
valohai.yaml:
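Wiring the two steps together can be sketched as a pipeline like the one below; it assumes steps named fetch-data and train-model exist, with a snapshot.csv output and a dataset input respectively:

```yaml
# A minimal sketch — step, node, and input/output names are assumptions
- pipeline:
    name: snowflake-training
    nodes:
      - name: fetch
        type: execution
        step: fetch-data
      - name: train
        type: execution
        step: train-model
    edges:
      - [fetch.output.snapshot.csv, train.input.dataset]
```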
Maintaining Reproducibility
⚠️ Critical: Snowflake data changes continuously. Query results today differ from results tomorrow.
The problem: a training execution that queries Snowflake directly cannot be reproduced later, because the underlying tables will have changed by the time you rerun it.
The solution: separate querying from training. Query once, save the results to /valohai/outputs/, and version them as a dataset that training executions consume.
Best practices:
Query once — Run the query in a dedicated execution
Snapshot immediately — Save the results to /valohai/outputs/
Version snapshots — Create dataset versions from the saved files
Train on snapshots — Use the dataset as an input; never query directly in training
Use Time Travel for debugging — But snapshot for reproducibility
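For debugging, Time Travel lets you re-run a query against the table as it existed at an earlier point. A sketch, with a hypothetical table and timestamp:

```sql
-- Query the table as of a specific timestamp (hypothetical values)
SELECT * FROM transactions
  AT(TIMESTAMP => '2024-01-15 08:00:00'::TIMESTAMP_TZ);

-- Or relative to now, e.g. one hour ago (offset in seconds)
SELECT * FROM transactions AT(OFFSET => -3600);
```

Time Travel is bounded by your account's retention period, so it complements snapshots rather than replacing them.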
See: Databases for complete reproducibility patterns.
Common Issues & Fixes
Connection Failed
Symptom: snowflake.connector.errors.DatabaseError: 250001
Causes & Fixes:
Wrong account identifier → Verify format (include region and cloud)
Wrong username/password → Check credentials in Snowflake UI
Network connectivity → Check firewall/VPN settings
Warehouse Not Running
Symptom: SQL compilation error: Object does not exist
Causes & Fixes:
Warehouse suspended → Resume the warehouse: ALTER WAREHOUSE COMPUTE_WH RESUME
Wrong warehouse name → Verify the warehouse exists: SHOW WAREHOUSES
Insufficient Privileges
Symptom: SQL access control error: Insufficient privileges
Causes & Fixes:
User missing permissions → Grant the necessary privileges: GRANT SELECT ON ALL TABLES IN DATABASE analytics TO ROLE ml_role
Wrong role active → Switch to the correct role: cursor.execute("USE ROLE ml_role")
Python Version Incompatibility
Symptom: Import errors or dependency conflicts
Causes & Fixes:
Wrong requirements file → Use the version matching your Python (e.g., requirements_39.reqs for Python 3.9)
Missing dependencies → Install the tested requirements before the connector
Related Pages
Databases — Overview and reproducibility patterns
AWS Redshift — Connect to Amazon Redshift
BigQuery — Connect to Google BigQuery
Create and Manage Datasets — Version your snapshots
Next Steps
Store Snowflake credentials in Valohai
Create a test query execution
Save your first snapshot as a dataset version