AWS Redshift Connector

Run SQL queries on AWS Redshift and save results to your data store.

Why use this connector?

  • Query directly from Valohai: no need to export data manually. Write SQL, run an execution, get CSV output.

  • Version your queries: every query is saved with its execution, so you can reproduce results months later by checking which query ran when.

  • Feed downstream jobs: query outputs get datum URLs. Use them as inputs in other executions or pipelines.

Requirements

  • Redshift cluster on your AWS account

  • Cluster security group allows inbound connections from the valohai-sg-workers security group

  • Authentication via IAM role or username/password

Authentication options

Option 1: IAM role

If your Valohai workers run on AWS with IAM roles:

  1. Attach the policy below to ValohaiWorkerRole (or your worker role):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GetRedshiftCredentials",
      "Effect": "Allow",
      "Action": "redshift:GetClusterCredentials",
      "Resource": "*"
    }
  ]
}
  2. Set environment variables:

    • RSCLUSTERIDENTIFIER: Redshift cluster identifier

    • RSDATABASE: Database name

    • RSHOST: Cluster endpoint (e.g., my-cluster.abc123.us-east-1.redshift.amazonaws.com)

    • RSREGION: AWS region (e.g., us-east-1)

    • RSIAM: Set to 1

    • RSPORT: (Optional) Default is 5439
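For reference, the IAM-based configuration looks like this when exported in a shell (all values here are illustrative placeholders, not real cluster details):

```shell
# Illustrative placeholder values - substitute your own cluster details.
export RSCLUSTERIDENTIFIER="my-cluster"
export RSDATABASE="analytics"
export RSHOST="my-cluster.abc123.us-east-1.redshift.amazonaws.com"
export RSREGION="us-east-1"
export RSIAM="1"
export RSPORT="5439"  # optional; 5439 is the Redshift default

echo "Connecting to ${RSHOST}:${RSPORT} (IAM auth: ${RSIAM})"
```

In Valohai you would normally set these in the project or organization settings rather than a shell, but the names and values are the same.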

Option 2: Username and password

If not using IAM roles:

  1. Set environment variables:

    • RSCLUSTERIDENTIFIER: Redshift cluster identifier

    • RSDATABASE: Database name

    • RSHOST: Cluster endpoint

    • RSREGION: AWS region

    • RSIAM: Set to 0

    • RSUSER: Redshift username

    • RSPASSWORD: Redshift password (mark as secret)

    • RSPORT: (Optional) Default is 5439

Add environment variables

Environment variables can be added:

  • Project-wide: Project Settings → Environment Variables

  • Organization-wide: admin users can create environment variable groups and share them across multiple projects

  • Per-execution: Set when creating the execution

We recommend project or organization settings for credentials.

Run a query

  1. Open your project

  2. Click Create Execution

  3. Expand valohai-ecosystem → Select redshift-query

  4. Configure parameters:

    • query: Your SQL query

    • output-path: (Optional) Output filename, default is results.csv

    • datum-alias: (Optional) Alias for easy reference, e.g., latest-orders

  5. Verify environment variables are set

  6. Click Create Execution

Example query
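A query passed in the query parameter might look like this (the table and column names are illustrative, not part of the connector):

```sql
-- Illustrative: fetch the last 30 days of orders for a downstream job
SELECT order_id, customer_id, total_amount, created_at
FROM orders
WHERE created_at >= DATEADD(day, -30, GETDATE())
ORDER BY created_at DESC;
```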

Results are saved as results.csv (or your custom output path) and uploaded to your data store.

Use query results

The output of the execution gets a datum URL. Reference it in other executions either directly by that URL, or through the datum alias if you set one.
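For instance, assuming you set datum-alias to latest-orders, a step in valohai.yaml could consume the query result like this (the step and input names are illustrative):

```yaml
- step:
    name: train-model
    image: python:3.12
    command: python train.py
    inputs:
      - name: dataset
        default: datum://latest-orders
```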

Or use it in a pipeline by passing the execution output to the next node.
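A minimal pipeline sketch, assuming a downstream step named train-model with a dataset input, could wire the query output to the next node like this:

```yaml
- pipeline:
    name: query-and-train
    nodes:
      - name: query
        type: execution
        step: redshift-query
      - name: train
        type: execution
        step: train-model
    edges:
      - [query.output.results.csv, train.input.dataset]
```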

Troubleshooting

Connection refused

Check:

  • Redshift cluster security group allows connections from valohai-sg-workers

  • RSHOST includes the full cluster endpoint (not just the identifier)

  • RSPORT is correct (default: 5439)

Authentication fails

If using IAM (RSIAM=1):

  • Verify ValohaiWorkerRole has redshift:GetClusterCredentials permission

  • Check that worker role is properly attached to your workers

If using username/password (RSIAM=0):

  • Verify RSUSER and RSPASSWORD are correct

  • Ensure password is marked as a secret in Valohai

Query returns no results

Redshift queries run successfully even if they return zero rows. Check your WHERE clauses and table names.

