The Valohai Ecosystem database connectors allow you to easily run database queries on AWS Redshift, GCP BigQuery and Snowflake.
You can define the query in the Valohai user interface, and it will be saved with the other details related to the execution. The query results will be saved as execution outputs and automatically uploaded to your data store, ready to be used in other jobs you run in Valohai. Like any other execution output, the result file will get a datum URL, which allows you to attach a datum alias to it (e.g. datum://latest-model).
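A downstream execution can then read the query results through an input that points to such an alias. Here is a minimal sketch, assuming a step with an input named dataset mapped to the alias; the input name and the pandas usage are illustrative, not part of the connector:

```python
# Hypothetical downstream step: Valohai downloads each input to
# /valohai/inputs/<input-name>/ before the command runs.
import pandas as pd

# "dataset" is an assumed input name; results.csv is the connector's
# default output file name.
df = pd.read_csv("/valohai/inputs/dataset/results.csv")
print(df.head())
```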
You can authenticate to the database using environment variables in Valohai. For Redshift and BigQuery it is also possible to use machine identity (IAM roles) to authenticate the database connection.
Below, you’ll find instructions on how to connect to your database on Snowflake.
Requirements
- An organization and account in Snowflake.
- An existing database with data.
- Firewall settings that allow Valohai workers to access the database.
- A user with permissions to query the database.
Add environment variables
You will need to define the following environment variables for the execution:
- SNOWSQL_ACCOUNT, your Snowflake account identifier in the form organization-account_name.
- SNOWSQL_WAREHOUSE, the name of your Snowflake warehouse.
- SNOWSQL_DATABASE, the name of your Snowflake database.
- SNOWSQL_SCHEMA, the name of the schema (optional, default value PUBLIC).
- SNOWSQL_USER, your Snowflake username.
- SNOWSQL_PWD, your Snowflake password.
Make sure you save the password (SNOWSQL_PWD) as a secret by clicking on the checkbox in the Valohai user interface.
It is possible to add the environment variables separately to each execution while creating it in the user interface, but we recommend saving them either under the project Settings or as an organization-level environment variable group (ask your organization admin for help with this).
If needed, you can edit the environment variables in the user interface before starting the execution.
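For reference, these variables map one-to-one to the connection parameters of Snowflake's Python connector. Here is a minimal sketch of such a connection, assuming the snowflake-connector-python package; the ecosystem step's actual implementation may differ:

```python
import os

import snowflake.connector  # assumed dependency: snowflake-connector-python

# Read the connection details from the environment variables defined above.
conn = snowflake.connector.connect(
    account=os.environ["SNOWSQL_ACCOUNT"],
    user=os.environ["SNOWSQL_USER"],
    password=os.environ["SNOWSQL_PWD"],
    warehouse=os.environ["SNOWSQL_WAREHOUSE"],
    database=os.environ["SNOWSQL_DATABASE"],
    schema=os.environ.get("SNOWSQL_SCHEMA", "PUBLIC"),  # PUBLIC is the default
)
```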
Create an execution
To create an execution with the Snowflake connector, follow the instructions below.
- Open your project
- Click on the Create Execution button under the Executions tab.
- Expand the step library by clicking on the plus sign next to valohai-ecosystem in the left side menu.
- Choose the snowflake-query step under Connectors.
- (Optional) Change settings such as the environment or Docker image under the Runtime section.
- Write your SQL query into the field under the Parameters section.
- By default the query results will be saved as results.csv, but you can also define another path for the output.
- (Optional) You can give the output file a datum alias, e.g. snowflake-query, to easily refer to it in your other executions with the input datum://snowflake-query (see the sketch after these steps).
- If you did not save the environment variables under the project Settings or as an organization-level environment variable group, add them under the Environment Variables section.
- You can edit and/or add new environment variables if needed.
- Click on Create Execution.
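To illustrate what the execution does, here is a minimal sketch of the query-and-save flow, continuing from the connection sketch above; anything written to /valohai/outputs/ is uploaded to your data store as an execution output. The query and output path below stand in for the parameters you set in the UI, and the step's actual code may differ:

```python
import csv

query = "SELECT * FROM my_table LIMIT 100"    # illustrative query
output_path = "/valohai/outputs/results.csv"  # the default output path

# Run the query and stream the rows into a CSV execution output.
with conn.cursor() as cur:
    cur.execute(query)
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # header row
        writer.writerows(cur)  # the cursor iterates over result rows
```

If you gave the output a datum alias such as snowflake-query, other executions can then reference the file as datum://snowflake-query.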