AWS Redshift Connector

Run SQL queries on AWS Redshift and save results to your data store.

Why use this connector?

Query directly from Valohai: No need to export data manually. Write SQL, run an execution, get CSV output.

Version your queries: Every query is saved with the execution. Reproduce results months later by checking which query ran when.

Feed downstream jobs: Query outputs get datum URLs. Use them as inputs in other executions or pipelines.

Requirements

  • Redshift cluster on your AWS account

  • Cluster security group allows connections from valohai-sg-workers

  • Authentication via IAM role or username/password

Authentication options

Option 1: IAM role

If your Valohai workers run on AWS with IAM roles:

  1. Attach the policy below to ValohaiWorkerRole (or your worker role):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GetRedshiftCredentials",
            "Effect": "Allow",
            "Action": "redshift:GetClusterCredentials",
            "Resource": "*"
        }
    ]
}
  2. Set environment variables (a connection sketch using these variables follows the list):

    • RSCLUSTERIDENTIFIER: Redshift cluster identifier

    • RSDATABASE: Database name

    • RSHOST: Cluster endpoint (e.g., my-cluster.abc123.us-east-1.redshift.amazonaws.com)

    • RSREGION: AWS region (e.g., us-east-1)

    • RSIAM: Set to 1

    • RSPORT: (Optional) Default is 5439
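For illustration, here is a minimal sketch of what IAM-based credential retrieval with these variables can look like, assuming boto3 and the open-source redshift_connector driver (the DbUser value is a placeholder, not something the connector requires):

import os

import boto3
import redshift_connector

# Exchange the worker's IAM role for temporary database credentials.
client = boto3.client("redshift", region_name=os.environ["RSREGION"])
creds = client.get_cluster_credentials(
    DbUser="valohai",  # placeholder database user; adjust to your setup
    DbName=os.environ["RSDATABASE"],
    ClusterIdentifier=os.environ["RSCLUSTERIDENTIFIER"],
    AutoCreate=False,
)

# Connect using the short-lived credentials returned by AWS.
conn = redshift_connector.connect(
    host=os.environ["RSHOST"],
    port=int(os.environ.get("RSPORT", "5439")),
    database=os.environ["RSDATABASE"],
    user=creds["DbUser"],
    password=creds["DbPassword"],
)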

Option 2: Username and password

If not using IAM roles:

  1. Set environment variables (see the sketch after this list):

    • RSCLUSTERIDENTIFIER: Redshift cluster identifier

    • RSDATABASE: Database name

    • RSHOST: Cluster endpoint

    • RSREGION: AWS region

    • RSIAM: Set to 0

    • RSUSER: Redshift username

    • RSPASSWORD: Redshift password (mark as secret)

    • RSPORT: (Optional) Default is 5439
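For completeness, the password-based connection is the same idea without the credential exchange (again assuming the redshift_connector driver):

import os

import redshift_connector

conn = redshift_connector.connect(
    host=os.environ["RSHOST"],
    port=int(os.environ.get("RSPORT", "5439")),
    database=os.environ["RSDATABASE"],
    user=os.environ["RSUSER"],
    password=os.environ["RSPASSWORD"],
)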

Add environment variables

Environment variables can be added:

  • Project-wide: Project Settings → Environment Variables

  • Organization-wide: Admin users can create environment variable groups and share them across multiple projects.

  • Per-execution: Set when creating the execution

We recommend project or organization settings for credentials.

Run a query

  1. Open your project

  2. Click Create Execution

  3. Expand valohai-ecosystem → Select redshift-query

  4. Configure parameters:

    • query: Your SQL query

    • output-path: (Optional) Output filename, default is results.csv

    • datum-alias: (Optional) Alias for easy reference, e.g., latest-orders

  5. Verify environment variables are set

  6. Click Create Execution

Example query

SELECT
    customer_id,
    product_category,
    SUM(order_total) AS total_spent,
    COUNT(*) AS order_count
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id, product_category
ORDER BY total_spent DESC
LIMIT 1000

Results are saved as results.csv (or your custom output path) and uploaded to your data store.
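The upload itself relies on the standard Valohai convention: anything written to /valohai/outputs/ is uploaded to your data store when the execution finishes. A rough sketch of the write step (variable names are illustrative, not the connector's actual internals; conn is a connection as in the sketches above):

import csv

# Files in /valohai/outputs/ are uploaded automatically after the run.
output_path = "/valohai/outputs/results.csv"

cursor = conn.cursor()
cursor.execute("SELECT * FROM orders LIMIT 10")  # your query parameter

# Write a header row from the cursor metadata, then the data rows.
with open(output_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor.fetchall())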

Use query results

Each execution output gets a datum URL. Reference it in other executions either by that URL directly or through the datum alias, as in the valohai.yaml example below:

- step:
    name: train-model
    image: python:3.11
    command:
      - python train.py
    inputs:
      - name: training-data
        default: datum://latest-orders

Or use it in a pipeline by passing the execution output to the next node.
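Inside train.py, the file shows up under Valohai's input directory convention. A minimal sketch, assuming pandas and the default path /valohai/inputs/<input-name>/:

import glob

import pandas as pd

# Valohai downloads each input into /valohai/inputs/<input-name>/.
csv_path = glob.glob("/valohai/inputs/training-data/*")[0]
df = pd.read_csv(csv_path)
print(df.head())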

Troubleshooting

Connection refused

Check the following (a connectivity probe sketch follows the list):

  • Redshift cluster security group allows connections from valohai-sg-workers

  • RSHOST includes the full cluster endpoint (not just the identifier)

  • RSPORT is correct (default: 5439)
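To separate networking problems from authentication problems, a plain TCP probe from inside an execution is often enough (a sketch, nothing Redshift-specific):

import os
import socket

# If this raises, the problem is networking (security group, host, or
# port), not authentication.
socket.create_connection(
    (os.environ["RSHOST"], int(os.environ.get("RSPORT", "5439"))),
    timeout=5,
)
print("TCP connection OK")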

Authentication fails

If using IAM (RSIAM=1), check the points below; an identity-check sketch follows the list:

  • Verify ValohaiWorkerRole has redshift:GetClusterCredentials permission

  • Check that the worker role is properly attached to your workers
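To confirm which IAM identity an execution actually runs as, printing the caller identity is usually enough (a sketch assuming boto3 is available):

import boto3

# The ARN should point at ValohaiWorkerRole (or your custom worker role).
print(boto3.client("sts").get_caller_identity()["Arn"])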

If using username/password (RSIAM=0):

  • Verify RSUSER and RSPASSWORD are correct

  • Ensure password is marked as a secret in Valohai

Query returns no results

Redshift queries run successfully even if they return zero rows. Check your WHERE clauses and table names.

Next steps

Explore the other database connectors in the valohai-ecosystem, or build your own connector.
