Before you begin, make sure you have an existing Snowflake database.
To access Snowflake from Valohai executions, follow these steps:
Store Your Snowflake Credentials in Valohai
Authenticate to your Snowflake database using a username, password, and account identifier.
Set up the connection details as environment variables:
- Open your project in the Valohai web app.
- Go to project Settings and open the Env Variables tab.
- Create the necessary environment variables, marking the password as secret.
| Name | Value | Secret |
|------|-------|--------|
| `USERNAME` | Your Snowflake username | No |
| `PASSWORD` | Your Snowflake password | Yes |
| `ACCOUNT_IDENTIFIER` | Your Snowflake account identifier | No |
These credentials will be available for all executions within your project, allowing authorized users to run queries.
Environment Variable Groups
An organisation admin can also define an Environment Variable Group under organisation settings. Your executions can then inherit those values, so you don't have to define the connection details separately for each project.
To run queries on your Snowflake database, you will need to have the connector installed. You can do that with pip install in the command section of your valohai.yaml, or alternatively add it to your custom Docker image.
The Snowflake Python connector requires Python 3.7 or later.
In addition to the connector, you should also install the corresponding dependencies, as recommended by Snowflake. These depend on the Python version you’re using.
The example below will install the dependencies recommended for Python 3.9 and then install the connector version 2.8.1 in the container.
```yaml
- step:
    name: snowflake-connect
    image: python:3.9
    command:
      - pip install -r https://raw.githubusercontent.com/snowflakedb/snowflake-connector-python/v2.8.1/tested_requirements/requirements_39.reqs
      - pip install snowflake-connector-python==2.8.1
      - python validate.py
```
Connect to Snowflake and run queries in your executions
To connect to your Snowflake database, import the snowflake.connector module. You will also need the os module for reading the environment variables.
The example below creates an example database in an existing warehouse, creates an example table with data in it, and finally reads the contents of the table. If you have an existing database, feel free to run queries on it instead.
```python
import os

import snowflake.connector

# Create the connector by using the credentials stored in the envvars
conn = snowflake.connector.connect(
    user=os.environ['USERNAME'],
    password=os.environ['PASSWORD'],
    account=os.environ['ACCOUNT_IDENTIFIER'],
)

cs = conn.cursor()
try:
    # Use an existing warehouse
    cs.execute("USE WAREHOUSE tiny_warehouse_mg")

    # Create a new database in the warehouse
    cs.execute("CREATE DATABASE IF NOT EXISTS mydatabase")

    # Use the database (here "mydatabase")
    cs.execute("USE DATABASE mydatabase")

    # Create example_table and insert data
    cs.execute(
        "CREATE OR REPLACE TABLE "
        "example_table(col1 integer, col2 string)")
    cs.execute(
        "INSERT INTO example_table(col1, col2) VALUES "
        "(123, 'test string1'), "
        "(456, 'test string2')")

    # Read the table
    cs.execute("SELECT * FROM example_table")
    result = cs.fetchall()
    print("Contents of example_table:")
    for row in result:
        print(row)
finally:
    cs.close()
    conn.close()
```
Snowflake has its own Time Travel feature that allows you to run queries against historical data up to 90 days old, based on timestamps. After connecting to the database, you can use the related AT | BEFORE clauses in your Valohai executions just as you would from your local machine. Even so, consider creating snapshots of your preprocessed datasets to make sure they remain available after the Time Travel retention period expires.