Before you begin, make sure you have an existing Redshift cluster with an inbound rule allowing connections from the Valohai workers’ security group.
You have two options for authenticating with Redshift:
- Using EC2 machine identity (Instance Profile)
- Using your authentication credentials
Using Machine Identity for Authentication
Your AWS account has an IAM Role for EC2 instances used by Valohai. You can grant this role access to Redshift. You can create multiple roles for specific Valohai environments, restricting access to certain teams in your Valohai organization.
Example IAM Policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "redshift:GetClusterCredentials",
            "Resource": [
                "arn:aws:redshift:<region>:<account-ID>:dbuser:<cluster-identifier>/<username>",
                "arn:aws:redshift:*:<account-ID>:dbgroup:*/*",
                "arn:aws:redshift:<region>:<account-ID>:dbname:<cluster-identifier>/<database>"
            ]
        }
    ]
}
Example Python script
import redshift_connector
import requests
import json

# Fetch temporary credentials from the machine's ValohaiWorkerRole via the EC2 instance metadata service
aws_metadata_ip = '169.254.169.254'
response = requests.get(f'http://{aws_metadata_ip}/latest/meta-data/iam/security-credentials/ValohaiWorkerRole')

# Parse the JSON response
credentials = json.loads(response.text)

# Fill in your cluster details in these variables
host = '<cluster-identifier>.xxxxxxxxx.xx-xxxx-x.redshift.amazonaws.com'
database = 'XXX'
db_user = 'XXX'
cluster_identifier = '<cluster-identifier>'

# Connect to the Redshift cluster using the temporary AWS credentials
conn = redshift_connector.connect(
    iam=True,
    host=host,
    database=database,
    db_user=db_user,
    cluster_identifier=cluster_identifier,
    access_key_id=credentials["AccessKeyId"],
    secret_access_key=credentials["SecretAccessKey"],
    session_token=credentials["Token"],
    region='<region>'
)
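Once the connection is open, you can run queries through the standard cursor interface. The snippet below is a minimal sketch; my_table is a placeholder table name.

# Query the cluster over the IAM-authenticated connection (my_table is a placeholder)
cursor = conn.cursor()
cursor.execute("SELECT * FROM my_table LIMIT 10")
rows = cursor.fetchall()
print(rows)

cursor.close()
conn.close()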
redshift_connector
Make sure you include redshift_connector in your Docker image or install it inside your step.command:
pip install redshift_connector
Authenticate with Custom Credentials
- Open your Valohai project settings.
- Go to the Env Variables tab.
- Create the necessary environment variables.
These variables will be accessible in all executions within the project.
Here are two examples demonstrating how to use these environment variables during a Valohai execution.
Environment Variable Groups
An organization admin can also define an Environment Variable Group under the organization settings. Your executions can then inherit those values, so you don't have to repeat the connection details in each project.
Name | Value | Secret |
---|---|---|
dbname | The name of your database | No |
redshift_host | e.g. redshift-valohai-sample.xxxxxxxxx.xx-xxxx-x.redshift.amazonaws.com | No |
port | Database port, for example 5439 | No |
user | Database username | No |
PGPASSWORD | The password of the user | Yes |
Examples
Python
import os

import psycopg2

# Read the connection details from the project's environment variables
con = psycopg2.connect(
    dbname=os.getenv('dbname'),
    host=os.getenv('redshift_host'),
    port=os.getenv('port'),
    user=os.getenv('user'),
    password=os.getenv('PGPASSWORD')
)
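With the connection established, you can, for example, run a query and write the results to Valohai outputs. The snippet below is a minimal sketch; my_table is a placeholder table name.

import csv

# Query the cluster and save the results as a CSV file to Valohai outputs
# (my_table is a placeholder; replace it with your own table)
cur = con.cursor()
cur.execute("SELECT * FROM my_table LIMIT 100")
rows = cur.fetchall()

with open('/valohai/outputs/redshift_output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cur.description])  # header row from column names
    writer.writerows(rows)

cur.close()
con.close()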
psycopg2
Make sure you include psycopg2 in your Docker image or install it inside your step.command:
pip install psycopg2
psql
You can run the command-line client psql directly in the step.command.
The example below executes the query from query.sql (which is part of the repository) and then outputs the results as a CSV file to Valohai outputs.
- step:
    name: Output from Redshift
    image: myorg/redshift-image:latest
    command:
      - psql -h $redshift_host -d $dbname -U $user -p $port -A -f query.sql -F ',' -o /valohai/outputs/redshift_output.csv
psql
Make sure you include psql in your Docker image or install it inside your step.command. On Debian/Ubuntu-based images the client is provided by the postgresql-client package:
apt-get update && apt-get install -y postgresql-client