AWS Redshift Connector

Run SQL queries on AWS Redshift and save results to your data store.

Why use this connector?

Query directly from Valohai: No need to export data manually. Write SQL, run an execution, get CSV output.

Version your queries: Every query is saved with the execution. Reproduce results months later by checking which query ran when.

Feed downstream jobs: Query outputs get datum URLs. Use them as inputs in other executions or pipelines.

Requirements

  • Redshift cluster on your AWS account

  • Cluster security group allows connections from valohai-sg-workers

  • Authentication via IAM role or username/password

Authentication options

Option 1: IAM role

If your Valohai workers run on AWS with IAM roles:

  1. Attach the policy below to ValohaiWorkerRole (or your worker role):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GetRedshiftCredentials",
            "Effect": "Allow",
            "Action": "redshift:GetClusterCredentials",
            "Resource": "*"
        }
    ]
}
  2. Set environment variables (a connection sketch using these variables follows the list):

    • RSCLUSTERIDENTIFIER: Redshift cluster identifier

    • RSDATABASE: Database name

    • RSHOST: Cluster endpoint (e.g., my-cluster.abc123.us-east-1.redshift.amazonaws.com)

    • RSREGION: AWS region (e.g., us-east-1)

    • RSIAM: Set to 1

    • RSPORT: (Optional) Default is 5439
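For illustration, here is a minimal sketch of what IAM-based credential retrieval with these variables can look like, assuming boto3 and the open-source redshift_connector driver (the DbUser value is a placeholder, not something the connector requires):

import os

import boto3
import redshift_connector

# Exchange the worker's IAM role for temporary database credentials.
client = boto3.client("redshift", region_name=os.environ["RSREGION"])
creds = client.get_cluster_credentials(
    DbUser="valohai",  # placeholder database user; adjust to your setup
    DbName=os.environ["RSDATABASE"],
    ClusterIdentifier=os.environ["RSCLUSTERIDENTIFIER"],
    AutoCreate=False,
)

# Connect using the short-lived credentials returned by AWS.
conn = redshift_connector.connect(
    host=os.environ["RSHOST"],
    port=int(os.environ.get("RSPORT", "5439")),
    database=os.environ["RSDATABASE"],
    user=creds["DbUser"],
    password=creds["DbPassword"],
)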

Option 2: Username and password

If not using IAM roles:

  1. Set environment variables (see the sketch after this list):

    • RSCLUSTERIDENTIFIER: Redshift cluster identifier

    • RSDATABASE: Database name

    • RSHOST: Cluster endpoint

    • RSREGION: AWS region

    • RSIAM: Set to 0

    • RSUSER: Redshift username

    • RSPASSWORD: Redshift password (mark as secret)

    • RSPORT: (Optional) Default is 5439
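For completeness, the password-based connection is the same idea without the credential exchange (again assuming the redshift_connector driver):

import os

import redshift_connector

conn = redshift_connector.connect(
    host=os.environ["RSHOST"],
    port=int(os.environ.get("RSPORT", "5439")),
    database=os.environ["RSDATABASE"],
    user=os.environ["RSUSER"],
    password=os.environ["RSPASSWORD"],
)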

Add environment variables

Environment variables can be added:

  • Project-wide: Project Settings → Environment Variables

  • Organization-wide: Admin users can create environment variable groups and share them across multiple projects.

  • Per-execution: Set when creating the execution

We recommend project or organization settings for credentials.

Run a query

  1. Open your project

  2. Click Create Execution

  3. Expand valohai-ecosystem → Select redshift-query

  4. Configure parameters:

    • query: Your SQL query

    • output-path: (Optional) Output filename, default is results.csv

    • datum-alias: (Optional) Alias for easy reference, e.g., latest-orders

  5. Verify environment variables are set

  6. Click Create Execution

Example query

SELECT
    customer_id,
    product_category,
    SUM(order_total) AS total_spent,
    COUNT(*) AS order_count
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id, product_category
ORDER BY total_spent DESC
LIMIT 1000

Results are saved as results.csv (or your custom output path) and uploaded to your data store.
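The upload itself relies on the standard Valohai convention: anything written to /valohai/outputs/ is uploaded to your data store when the execution finishes. A rough sketch of the write step (variable names are illustrative, not the connector's actual internals; conn is a connection as in the sketches above):

import csv

# Files in /valohai/outputs/ are uploaded automatically after the run.
output_path = "/valohai/outputs/results.csv"

cursor = conn.cursor()
cursor.execute("SELECT * FROM orders LIMIT 10")  # your query parameter

# Write a header row from the cursor metadata, then the data rows.
with open(output_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor.fetchall())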

Use query results

Each execution output gets a datum URL. Reference it in other executions either by that URL directly or through the datum alias, as in the valohai.yaml example below:

- step:
    name: train-model
    image: python:3.11
    command:
      - python train.py
    inputs:
      - name: training-data
        default: datum://latest-orders

Or use it in a pipeline by passing the execution output to the next node.
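Inside train.py, the file shows up under Valohai's input directory convention. A minimal sketch, assuming pandas and the default path /valohai/inputs/<input-name>/:

import glob

import pandas as pd

# Valohai downloads each input into /valohai/inputs/<input-name>/.
csv_path = glob.glob("/valohai/inputs/training-data/*")[0]
df = pd.read_csv(csv_path)
print(df.head())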

Troubleshooting

Connection refused

Check the following (a connectivity probe sketch follows the list):

  • Redshift cluster security group allows connections from valohai-sg-workers

  • RSHOST includes the full cluster endpoint (not just the identifier)

  • RSPORT is correct (default: 5439)
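To separate networking problems from authentication problems, a plain TCP probe from inside an execution is often enough (a sketch, nothing Redshift-specific):

import os
import socket

# If this raises, the problem is networking (security group, host, or
# port), not authentication.
socket.create_connection(
    (os.environ["RSHOST"], int(os.environ.get("RSPORT", "5439"))),
    timeout=5,
)
print("TCP connection OK")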

Authentication fails

If using IAM (RSIAM=1), check the points below; an identity-check sketch follows the list:

  • Verify ValohaiWorkerRole has redshift:GetClusterCredentials permission

  • Check that the worker role is properly attached to your workers
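To confirm which IAM identity an execution actually runs as, printing the caller identity is usually enough (a sketch assuming boto3 is available):

import boto3

# The ARN should point at ValohaiWorkerRole (or your custom worker role).
print(boto3.client("sts").get_caller_identity()["Arn"])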

If using username/password (RSIAM=0):

  • Verify RSUSER and RSPASSWORD are correct

  • Ensure password is marked as a secret in Valohai

Query returns no results

Redshift queries run successfully even if they return zero rows. Check your WHERE clauses and table names.

Next steps

Explore the other database connectors in the valohai-ecosystem, or build your own connector.
