In this guide, we’ll link a private AWS S3 bucket to a Valohai organization or project.
Requirements
- An AWS S3 bucket with “Block all public access” enabled
- A Valohai organization or project to link the S3 bucket to
Bucket CORS Settings
If you want to upload files to the store through the app.valohai.com web UI, you need to add a CORS policy to the S3 bucket.
- Open the S3 bucket you created in the AWS console.
- Go to the Permissions tab and scroll down to Cross-origin resource sharing (CORS).
- Click Edit and add the rules below:
[
  {
    "AllowedHeaders": ["Authorization"],
    "AllowedMethods": ["GET"],
    "AllowedOrigins": ["*"],
    "ExposeHeaders": [],
    "MaxAgeSeconds": 3000
  },
  {
    "AllowedHeaders": ["Authorization"],
    "AllowedMethods": ["POST"],
    "AllowedOrigins": ["https://app.valohai.com"],
    "ExposeHeaders": [],
    "MaxAgeSeconds": 3000
  }
]
Your bucket now accepts POST requests (uploads) from the Valohai web UI at https://app.valohai.com, and authenticated GET requests from any origin.
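If you prefer to script this step, the sketch below builds the same two rules and applies them with the S3 `put_bucket_cors` call. It assumes a boto3 S3 client is passed in (the module does not import boto3 itself, so any object with the same method works); the function names are hypothetical.

```python
import json
from typing import Any


def valohai_cors_configuration(app_origin: str = "https://app.valohai.com") -> dict[str, Any]:
    """Return the two CORS rules above, wrapped the way boto3's put_bucket_cors expects."""
    return {
        "CORSRules": [
            {
                # Rule 1: authenticated GET requests from any origin.
                "AllowedHeaders": ["Authorization"],
                "AllowedMethods": ["GET"],
                "AllowedOrigins": ["*"],
                "ExposeHeaders": [],
                "MaxAgeSeconds": 3000,
            },
            {
                # Rule 2: POST (upload) requests only from the Valohai web UI.
                "AllowedHeaders": ["Authorization"],
                "AllowedMethods": ["POST"],
                "AllowedOrigins": [app_origin],
                "ExposeHeaders": [],
                "MaxAgeSeconds": 3000,
            },
        ]
    }


def apply_cors(s3_client: Any, bucket: str) -> None:
    """Set the bucket's CORS rules; s3_client is assumed to be a boto3 S3 client."""
    s3_client.put_bucket_cors(Bucket=bucket, CORSConfiguration=valohai_cors_configuration())


if __name__ == "__main__":
    # Print just the rule list, which is the format the console's CORS editor takes.
    print(json.dumps(valohai_cors_configuration()["CORSRules"], indent=2))
```

For example, `apply_cors(boto3.client("s3"), "my-valohai-bucket")` would set the rules, assuming boto3 is installed and your AWS credentials are configured. Note that `put_bucket_cors` replaces any CORS configuration the bucket already has.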
Create an IAM user
In the AWS console, create a new IAM user that authenticates with programmatic access credentials (an access key ID and a secret access key).
- The username can be anything; choose something descriptive.
- Skip the permissions configuration for now; we will add permissions later.
- After creating the user, open its Security credentials tab and click “Create access key”.
Save your keys
Download the CSV, or copy the access key ID and secret access key somewhere safe. AWS shows the secret access key only once, when the key is created.
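The user and key can also be created programmatically. This is a minimal sketch, assuming a boto3 IAM client called with administrator credentials; `create_user_with_key`, `save_keys_csv`, and the CSV layout are illustrative and not necessarily the format of the console's CSV download.

```python
import csv
import os
from typing import Any


def create_user_with_key(iam_client: Any, user_name: str) -> tuple[str, str]:
    """Create an IAM user with no permissions plus one access key; return (key ID, secret)."""
    iam_client.create_user(UserName=user_name)
    key = iam_client.create_access_key(UserName=user_name)["AccessKey"]
    return key["AccessKeyId"], key["SecretAccessKey"]


def save_keys_csv(path: str, access_key_id: str, secret_access_key: str) -> None:
    """Write the key pair to a new CSV file that only the current user can read (on POSIX)."""
    # O_EXCL refuses to overwrite an existing file; 0o600 means owner read/write only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Access key ID", "Secret access key"])
        writer.writerow([access_key_id, secret_access_key])
```

Because the secret is only returned by `create_access_key`, save it right away, e.g. `save_keys_csv("valohai-store-keys.csv", *create_user_with_key(boto3.client("iam"), "valohai-store"))`, where `valohai-store` is a placeholder username.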
Allow the IAM user to access the bucket
Now we have a user without any permissions; let’s allow the user to access our new bucket.
- Find and open the user you created in the previous section.
- Add a new inline policy. Any other IAM policy mechanism (for example, a managed policy) works just as well, but an inline policy is the easiest way to get started.
The user needs full access to the S3 bucket; a suitable policy document is below. Make sure to replace the resource name my-valohai-bucket with your own bucket name!
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-valohai-bucket",
        "arn:aws:s3:::my-valohai-bucket/*"
      ]
    }
  ]
}
Give your policy a descriptive name, and we are done with the mandatory AWS setup!
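The same inline policy can be generated for any bucket name and attached with IAM's `put_user_policy` call. This is a sketch assuming a boto3 IAM client; the function names and the policy name in the usage note are hypothetical.

```python
import json
from typing import Any


def bucket_access_policy(bucket: str) -> dict[str, Any]:
    """The full-access policy above, for any bucket name."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # the bucket itself (e.g. listing objects)
                    f"arn:aws:s3:::{bucket}/*",  # every object in it (e.g. reading and writing)
                ],
            }
        ],
    }


def attach_inline_policy(iam_client: Any, user_name: str, bucket: str, policy_name: str) -> None:
    """Add (or overwrite) a named inline policy on the user; boto3 takes the document as JSON text."""
    iam_client.put_user_policy(
        UserName=user_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(bucket_access_policy(bucket)),
    )
```

For example, `attach_inline_policy(boto3.client("iam"), "valohai-store", "my-valohai-bucket", "valohai-bucket-access")`, with all three names replaced by your own.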
Large file upload
If executions need to upload outputs larger than 5 GB, additional setup is needed. This step is optional and only required for large outputs.
Large outputs are uploaded with Amazon’s multipart upload API. When required, the worker machines are given temporary credentials for a dedicated AWS IAM role, which you create below.
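Valohai handles the multipart upload itself, so nothing here is required; this is only an illustration of the S3 limits involved, not Valohai's implementation. A multipart upload splits an object into at most 10,000 parts, each between 5 MiB and 5 GiB (only the last part may be smaller than 5 MiB). The hypothetical `plan_parts` picks a part size within those limits, starting from an assumed preferred size of 100 MiB.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART_SIZE = 5 * MIB  # S3 minimum for every part except the last
MAX_PART_SIZE = 5 * GIB  # S3 maximum for any part
MAX_PARTS = 10_000       # S3 maximum number of parts in one upload


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, exact even for very large byte counts."""
    return (a + b - 1) // b


def plan_parts(object_size: int, preferred_part_size: int = 100 * MIB) -> tuple[int, int]:
    """Return (part_size, part_count) for a multipart upload of object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Never go below the S3 minimum, and grow the parts if the preferred
    # size would need more than MAX_PARTS parts.
    part_size = max(preferred_part_size, MIN_PART_SIZE, ceil_div(object_size, MAX_PARTS))
    if part_size > MAX_PART_SIZE:
        raise ValueError("object is too large for one multipart upload")
    return part_size, ceil_div(object_size, part_size)


# A 20 GiB output with 100 MiB parts needs 205 parts (the last one is smaller).
print(plan_parts(20 * GIB))  # (104857600, 205)
```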
Be sure to replace the following placeholders in the policy examples below!
- my-valohai-bucket: the target S3 bucket
- ARN: the ARN of the IAM user you created above
You can find the user’s ARN, which contains your account number and the username, by going to IAM > Users and selecting the user you just created.
- Copy the ARN of the IAM user you created above
- Select the Roles tab and create a new AWS IAM Role.
- Select Custom trust policy.
The Custom trust policy document should look like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "ARN"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Replace ARN with the IAM user ARN you copied previously, then click Next.
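The trust policy and role can also be created in code. The sketch assumes boto3 STS and IAM clients; `caller_arn` returns the ARN of whoever the STS client's credentials belong to, so calling it with the IAM user's keys is one way to get the user ARN. All function names, and the role name in the usage note, are hypothetical.

```python
import json
from typing import Any


def caller_arn(sts_client: Any) -> str:
    """ARN of the identity behind the client's credentials; run it with the IAM user's keys."""
    return sts_client.get_caller_identity()["Arn"]


def trust_policy(user_arn: str) -> dict[str, Any]:
    """The custom trust policy above: only user_arn may assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_multipart_role(iam_client: Any, role_name: str, user_arn: str) -> str:
    """Create the role with that trust policy and return the new role's ARN."""
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy(user_arn)),
    )
    return response["Role"]["Arn"]
```

For example, `create_multipart_role(boto3.client("iam"), "valohai-multipart", user_arn)` returns the role ARN that the data store settings ask for; `valohai-multipart` is a placeholder role name.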
Create policy for the role
On the permissions step, click Create policy. A new tab opens; select JSON.
The policy JSON should look like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MultipartAccess",
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions",
        "s3:ListMultipartUploadParts",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-valohai-bucket",
        "arn:aws:s3:::my-valohai-bucket/*"
      ]
    }
  ]
}
Make sure to change the resource name my-valohai-bucket to your own bucket name.
- Click Next: Tags, then Next: Review.
- Name your policy ValohaiMultipartRole and click Create policy.
- Go back to the tab where the role creation is open and refresh the page.
- Select the policy you just created by clicking the checkbox in front of it.
- Click Next at the bottom of the page.
- Give your role a descriptive name and click Create role at the bottom of the page.
- Take note of the role’s ARN (arn:aws:...); you will enter it when you link the store to Valohai.
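The permission policy can likewise be created and attached in code. This is a minimal sketch, assuming a boto3 IAM client and a role that already exists; the helper names are hypothetical, and the default policy name follows the ValohaiMultipartRole name used in the steps above.

```python
import json
from typing import Any

# The actions from the MultipartAccess policy above.
MULTIPART_ACTIONS = [
    "s3:AbortMultipartUpload",
    "s3:GetBucketLocation",
    "s3:GetObject",
    "s3:ListBucket",
    "s3:ListBucketMultipartUploads",
    "s3:ListBucketVersions",
    "s3:ListMultipartUploadParts",
    "s3:PutObject",
]


def multipart_policy(bucket: str) -> dict[str, Any]:
    """The MultipartAccess policy for the given bucket name."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "MultipartAccess",
                "Effect": "Allow",
                "Action": MULTIPART_ACTIONS,
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


def attach_multipart_policy(iam_client: Any, role_name: str, bucket: str,
                            policy_name: str = "ValohaiMultipartRole") -> str:
    """Create the managed policy, attach it to the role, and return the policy's ARN."""
    created = iam_client.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(multipart_policy(bucket)),
    )
    policy_arn = created["Policy"]["Arn"]
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return policy_arn
```

The function returns the policy's ARN. The value Valohai needs is the role's ARN, which you can read from the role's page in the IAM console.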
Link the store to Valohai
You can connect this data store to a single project or create it at the organization level. We recommend creating it under the organization, because data stores added there are available to all projects in the organization.
Link to a Valohai organization
- Log in at https://app.valohai.com
- Navigate to Hi, <name> (the top-right menu) > Manage <organization>.
- Open the Data Stores tab.
- Click on Amazon S3 to add a new S3 Data Store
The data store can be shared with everyone in the organization, or you can expose the data store only to certain team(s).
- Name: The name of the store in Valohai. This can be the same as the bucket name.
- Bucket Name: The name of the S3 bucket.
- IAM Access Key and IAM Secret Access Key: The credentials of the IAM user you created.
- Region: The AWS region of your bucket (e.g. us-east-1).
- Multipart Upload IAM Role ARN: The ARN of the role you created for large file uploads (the role that has the ValohaiMultipartRole policy attached).
When you create the store, the credentials provided will be checked by creating a small test file in the bucket. The test file is automatically removed after the connection is verified.
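Before entering the keys in Valohai, you can run a similar check yourself. This is only a rough local equivalent, not Valohai's actual check: it assumes a boto3 S3 client created with the IAM user's keys, and the test object key is a hypothetical name.

```python
from typing import Any


def verify_store_credentials(s3_client: Any, bucket: str,
                             key: str = "valohai-credential-check.txt") -> None:
    """Write, list, and delete a tiny test object; any denied call raises an exception."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=b"ok")
    try:
        # Listing needs s3:ListBucket on the bucket ARN, the usual cause of ListObjects errors.
        s3_client.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    finally:
        # Clean up the test object even if the listing fails.
        s3_client.delete_object(Bucket=bucket, Key=key)
```

For example, pass `boto3.client("s3", aws_access_key_id=..., aws_secret_access_key=..., region_name="us-east-1")`. With boto3, a denied call raises `botocore.exceptions.ClientError`, whose message names the failing operation.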
Organization or Project Level
This data store is now available to every project in the organization. You can also define project-level data stores by configuring the data store in each project’s settings.
ListObjects error
A common error is the ListObjects error, which happens when the credentials can’t be used to list objects in the S3 bucket. Check that your user’s policy document is correct and that its Resource list has both items: arn:aws:s3:::my-valohai-bucket (the bucket itself, which listing requires) and arn:aws:s3:::my-valohai-bucket/* (the objects in it).
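To check a policy document for the two required entries, something like the sketch below can help. It assumes the policy JSON is pasted in as a string, and `missing_bucket_resources` is a hypothetical helper. It only does exact string matching, so broader patterns such as `arn:aws:s3:::*` are reported as missing even though they would grant access, and Deny statements and conditions are not evaluated.

```python
import json


def missing_bucket_resources(policy_json: str, bucket: str) -> list[str]:
    """Return the required bucket ARNs that no Allow statement lists (exact string match)."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    listed: set[str] = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        resources = statement.get("Resource", [])
        if isinstance(resources, str):  # a single resource may be a plain string
            resources = [resources]
        listed.update(resources)
    required = [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return [arn for arn in required if arn not in listed]
```

An empty result means both entries are present; a policy that lists only `arn:aws:s3:::my-valohai-bucket/*` would return the bucket ARN, which matches the ListObjects failure described above.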