AWS Redshift Connector
Run SQL queries on AWS Redshift and save results to your data store.
Why use this connector?
Query directly from Valohai: No need to export data manually. Write SQL, run an execution, and get CSV output.
Version your queries: Every query is saved with the execution. Reproduce results months later by checking which query ran when.
Feed downstream jobs: Query outputs get datum URLs. Use them as inputs in other executions or pipelines.
Requirements
A Redshift cluster in your AWS account
Cluster security group allows connections from valohai-sg-workers
Authentication via IAM role or username/password
Authentication options
Option 1: IAM role (recommended)
If your Valohai workers run on AWS with IAM roles:
Attach the policy below to ValohaiWorkerRole (or your worker role):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GetRedshiftCredentials",
            "Effect": "Allow",
            "Action": "redshift:GetClusterCredentials",
            "Resource": "*"
        }
    ]
}

Set environment variables:
RSCLUSTERIDENTIFIER: Redshift cluster identifier
RSDATABASE: Database name
RSHOST: Cluster endpoint (e.g., my-cluster.abc123.us-east-1.redshift.amazonaws.com)
RSREGION: AWS region (e.g., us-east-1)
RSIAM: Set to 1
RSPORT: (Optional) Default is 5439
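
Put together, an IAM-based configuration might look like this (all values below are placeholders; substitute your own cluster details):

RSCLUSTERIDENTIFIER=my-cluster
RSDATABASE=analytics
RSHOST=my-cluster.abc123.us-east-1.redshift.amazonaws.com
RSREGION=us-east-1
RSIAM=1
RSPORT=5439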
Option 2: Username and password
If not using IAM roles:
Set environment variables:
RSCLUSTERIDENTIFIER: Redshift cluster identifier
RSDATABASE: Database name
RSHOST: Cluster endpoint
RSREGION: AWS region
RSIAM: Set to 0
RSUSER: Redshift username
RSPASSWORD: Redshift password (mark as secret)
RSPORT: (Optional) Default is 5439
Add environment variables
Environment variables can be added:
Project-wide: Project Settings → Environment Variables
Organization-wide: Admin users can create environment variable groups that can be shared with multiple projects.
Per-execution: Set when creating the execution
We recommend project or organization settings for credentials.
Run a query
Open your project
Click Create Execution
Expand valohai-ecosystem → Select redshift-query
Configure parameters (see the filled-in example after these steps):
query: Your SQL query
output-path: (Optional) Output filename; default is results.csv
datum-alias: (Optional) Alias for easy reference, e.g., latest-orders
Verify environment variables are set
Click Create Execution
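
For example, a filled-in parameter set might look like this (the table, columns, and output filename are illustrative):

query: SELECT customer_id, order_total FROM orders LIMIT 100
output-path: orders-sample.csv
datum-alias: latest-orders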
Example query
SELECT
    customer_id,
    product_category,
    SUM(order_total) AS total_spent,
    COUNT(*) AS order_count
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id, product_category
ORDER BY total_spent DESC
LIMIT 1000

Results are saved as results.csv (or your custom output path) and uploaded to your data store.
Use query results
The output of the execution gets a datum URL. Reference it in other executions directly by URL, or by the datum alias as shown in the example below:
- step:
    name: train-model
    image: python:3.11
    command:
      - python train.py
    inputs:
      - name: training-data
        default: datum://latest-orders

Or use it in a pipeline by passing the execution output to the next node, as in the sketch below.
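
As a minimal sketch, a valohai.yaml pipeline might wire the two together like this — it assumes both the redshift-query step and a train-model step like the one above are defined in the same project, and connects the query's results.csv output to the training input:

- pipeline:
    name: query-and-train
    nodes:
      - name: query
        type: execution
        step: redshift-query
      - name: train
        type: execution
        step: train-model
    edges:
      - [query.output.results.csv, train.input.training-data]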
Troubleshooting
Connection refused
Check:
Redshift cluster security group allows connections from valohai-sg-workers
RSHOST includes the full cluster endpoint (not just the identifier)
RSPORT is correct (default: 5439)
Authentication fails
If using IAM (RSIAM=1):
Verify ValohaiWorkerRole has the redshift:GetClusterCredentials permission
Check that the worker role is properly attached to your workers
If using username/password (RSIAM=0):
Verify RSUSER and RSPASSWORD are correct
Ensure the password is marked as a secret in Valohai
Query returns no results
Redshift queries run successfully even if they return zero rows. Check your WHERE clauses and table names.
Next steps
Other database connectors:
Build your own: