# AWS Redshift Connector

Run SQL queries on AWS Redshift and save results to your data store.

## Why use this connector?

**Query directly from Valohai:** No need to export data manually. Write SQL, run execution, get CSV output.

**Version your queries:** Every query is saved with the execution. Reproduce results months later by checking which query ran when.

**Feed downstream jobs:** Query outputs get datum URLs. Use them as inputs in other executions or pipelines.

## Requirements

* Redshift cluster on your AWS account
* Cluster security group allows connections from `valohai-sg-workers`
* Authentication via IAM role or username/password

## Authentication options

### Option 1: IAM role (recommended)

If your Valohai workers run on AWS with IAM roles:

1. Attach the policy below to `ValohaiWorkerRole` (or your worker role):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GetRedshiftCredentials",
      "Effect": "Allow",
      "Action": "redshift:GetClusterCredentials",
      "Resource": "*"
    }
  ]
}
```

2. Set environment variables:
   * `RSCLUSTERIDENTIFIER`: Redshift cluster identifier
   * `RSDATABASE`: Database name
   * `RSHOST`: Cluster endpoint (e.g., `my-cluster.abc123.us-east-1.redshift.amazonaws.com`)
   * `RSREGION`: AWS region (e.g., `us-east-1`)
   * `RSIAM`: Set to `1`
   * `RSPORT`: (Optional) Default is `5439`

### Option 2: Username and password

If not using IAM roles:

1. Set environment variables:
   * `RSCLUSTERIDENTIFIER`: Redshift cluster identifier
   * `RSDATABASE`: Database name
   * `RSHOST`: Cluster endpoint
   * `RSREGION`: AWS region
   * `RSIAM`: Set to `0`
   * `RSUSER`: Redshift username
   * `RSPASSWORD`: Redshift password (**mark as secret**)
   * `RSPORT`: (Optional) Default is `5439`

## Add environment variables

Environment variables can be added:

* **Project-wide**: Project Settings → Environment Variables
* **Organization-wide**: Admin users can create environment variable groups that can be passed to several projects.
* **Per-execution**: Set when creating the execution

We recommend project or organization settings for credentials.

## Run a query

1. Open your project
2. Click **Create Execution**
3. Expand **valohai-ecosystem** → Select `redshift-query`
4. Configure parameters:
   * **query**: Your SQL query
   * **output-path**: (Optional) Output filename, default is `results.csv`
   * **datum-alias**: (Optional) Alias for easy reference, e.g., `latest-orders`
5. Verify environment variables are set
6. Click **Create Execution**

### Example query

```sql
SELECT
    customer_id,
    product_category,
    SUM(order_total) as total_spent,
    COUNT(*) as order_count
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id, product_category
ORDER BY total_spent DESC
LIMIT 1000
```

Results are saved as `results.csv` (or your custom output path) and uploaded to your data store.

## Use query results

The output of the execution gets a datum URL. Reference it in other executions by the URL directly or by using the datum alias shown in the example below:

```yaml
- step:
    name: train-model
    image: python:3.11
    command:
      - python train.py
    inputs:
      - name: training-data
        default: datum://latest-orders
```

Or use it in a pipeline by passing the execution output to the next node.

## Troubleshooting

### Connection refused

**Check:**

* Redshift cluster security group allows connections from `valohai-sg-workers`
* `RSHOST` includes the full cluster endpoint (not just the identifier)
* `RSPORT` is correct (default: `5439`)

### Authentication fails

**If using IAM (`RSIAM=1`):**

* Verify `ValohaiWorkerRole` has `redshift:GetClusterCredentials` permission
* Check that worker role is properly attached to your workers

**If using username/password (`RSIAM=0`):**

* Verify `RSUSER` and `RSPASSWORD` are correct
* Ensure password is marked as a secret in Valohai

### Query returns no results

Redshift queries run successfully even if they return zero rows. Check your `WHERE` clauses and table names.

## Next steps

**Other database connectors:**

* [BigQuery](/reusable-step-libraries/ecosystem-libraries/bigquery.md)
* [Snowflake](/reusable-step-libraries/ecosystem-libraries/snowflake.md)

**Build your own:**

* [Create custom library steps](/reusable-step-libraries/build-your-own-library.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.valohai.com/reusable-step-libraries/ecosystem-libraries/redshift.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
