Before you begin, make sure you have an existing Snowflake database.
To access Snowflake from Valohai executions, follow these steps:
Store Your Snowflake Credentials in Valohai
Authenticate to your Snowflake database using a username, password, and account identifier.
Set up the connection details as environment variables:
- Open your project in the Valohai web app.
- Go to the project Settings and open the Env Variables tab.
- Create the necessary environment variables, marking the password as secret.
| Name | Value | Secret |
|---|---|---|
| USERNAME | Your Snowflake username | No |
| PASSWORD | Your Snowflake password | Yes |
| ACCOUNT_IDENTIFIER | Your Snowflake account identifier | No |
These credentials will be available for all executions within your project, allowing authorized users to run queries.
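Inside an execution these values appear as ordinary environment variables. As a minimal sanity check (the variable names match the table above):

```python
import os

# Read the credentials defined in the project's Env Variables tab.
account = os.environ['ACCOUNT_IDENTIFIER']
user = os.environ['USERNAME']

# Avoid printing the secret PASSWORD value into execution logs.
print(f"Will connect to Snowflake account {account} as user {user}")
```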
Environment Variable Groups
An organisation admin can also define an Environment Variable Group under the organisation settings. Your executions can then inherit those values, so you don't have to define the connection details separately for each project.
Python example
To run queries on your Snowflake database, you need the connector installed. You can install it via pip in the command section of your valohai.yaml, or alternatively add it to your custom Docker image.
The Snowflake Python connector requires Python 3.7 or newer.
In addition to the connector, you should also install the corresponding dependencies recommended by Snowflake; these depend on the Python version you're using.
The example below installs the dependencies recommended for Python 3.9 and then installs connector version 2.8.1 in the container.
```yaml
- step:
    name: snowflake-connect
    image: python:3.9
    command:
      - pip install -r https://raw.githubusercontent.com/snowflakedb/snowflake-connector-python/v2.8.1/tested_requirements/requirements_39.reqs
      - pip install snowflake-connector-python==2.8.1
      - python validate.py
```
Connect to Snowflake and run queries in your executions
To connect to your Snowflake database, import snowflake.connector. You'll also need the standard-library os module to read the environment variables.
The example below uses an existing warehouse, creates an example database and a table with data in it, and finally reads back the contents of the table. If you already have a database, feel free to run queries on it instead.
```python
import snowflake.connector
import os

# Create the connection using the credentials stored in the environment variables
conn = snowflake.connector.connect(
    user=os.environ['USERNAME'],
    password=os.environ['PASSWORD'],
    account=os.environ['ACCOUNT_IDENTIFIER']
)

cs = conn.cursor()
try:
    # Use an existing warehouse
    cs.execute("USE WAREHOUSE tiny_warehouse_mg")

    # Create a new database, if it doesn't exist yet
    cs.execute("CREATE DATABASE IF NOT EXISTS mydatabase")

    # Use the database (here "mydatabase")
    cs.execute("USE DATABASE mydatabase")

    # Create example_table and insert data
    cs.execute(
        "CREATE OR REPLACE TABLE "
        "example_table(col1 integer, col2 string)")
    cs.execute(
        "INSERT INTO example_table(col1, col2) VALUES "
        "(123, 'test string1'), "
        "(456, 'test string2')")

    # Read the table
    cs.execute("SELECT * FROM example_table")
    result = cs.fetchall()
    print("Contents of example_table:")
    for row in result:
        print(row)
finally:
    cs.close()
    conn.close()
```
Time Travel
Snowflake has its own Time Travel feature that allows you to run historical queries, up to 90 days back, based on timestamps, offsets, or statement IDs. After connecting to the database you can use the related AT | BEFORE clauses in your Valohai executions in the same way you would from your local machine, as in the sketch below. Even so, you might still consider creating snapshots of the preprocessed datasets to make sure they remain available after the Time Travel retention period expires.
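For example, a minimal sketch of a historical query, reusing the cursor from the example above (the table name and the one-hour offset are just illustrative):

```python
# Query example_table as it existed one hour ago.
# AT(OFFSET => ...) takes seconds relative to the current time.
cs.execute("SELECT * FROM example_table AT(OFFSET => -3600)")
print(cs.fetchall())
```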