Apache Spark examples on AWS & Valohai

Run Apache Spark applications on AWS EMR clusters using Valohai automation.


Overview

This example demonstrates:

  • Launching AWS EMR clusters from Valohai

  • Running Spark batch jobs remotely

  • Managing EMR configuration via Valohai parameters


Steps

1

Set up AWS IAM users

Create a new IAM role to access EMR and S3.
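As a rough illustration of what "access to EMR and S3" can look like, the sketch below builds an IAM policy document as a plain Python dictionary. The function name `build_emr_s3_policy`, the bucket name, and the exact list of actions are assumptions made for this example and are not taken from the repository; trim or extend the actions to match what your jobs need.

```python
import json


def build_emr_s3_policy(bucket: str) -> dict:
    """Return an IAM policy document granting basic EMR and S3 access.

    The action lists are illustrative assumptions, not the repository's policy.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Launch, inspect, and terminate EMR clusters and their steps.
                "Sid": "EmrClusterLifecycle",
                "Effect": "Allow",
                "Action": [
                    "elasticmapreduce:RunJobFlow",
                    "elasticmapreduce:DescribeCluster",
                    "elasticmapreduce:AddJobFlowSteps",
                    "elasticmapreduce:DescribeStep",
                    "elasticmapreduce:TerminateJobFlows",
                ],
                "Resource": "*",
            },
            {
                # Read and write job scripts, data, and logs in one bucket.
                "Sid": "S3BucketAccess",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            },
        ],
    }


if __name__ == "__main__":
    # "my-spark-bucket" is a placeholder bucket name.
    print(json.dumps(build_emr_s3_policy("my-spark-bucket"), indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy. Launching clusters usually also requires `iam:PassRole` on the EMR service and instance roles, which is left out here for brevity.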

2

Import and run the examples on Valohai

Start by running the run-debug-with-minimal-configuration example step.

3

Run your own Spark applications

The valohai.yaml in the project includes a minimal configuration example as well as a maximal one. The maximal example should cover most of the configuration options.
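To make the minimal versus maximal idea concrete, here is a hedged sketch of how a handful of parameters could be turned into an EMR job-flow request. It is not the repository's implementation: the function `build_job_flow_request`, the parameter names, and the default values are hypothetical. The request keys follow the shape accepted by boto3's EMR `run_job_flow` call, and the role names are the AWS default EMR roles.

```python
from typing import Any, Dict, Optional


def build_job_flow_request(
    script_uri: str,
    log_uri: str,
    overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for an EMR job flow that runs one Spark script.

    With no overrides this is the "minimal" case; passing overrides moves
    towards the "maximal" case. All defaults below are assumptions.
    """
    params: Dict[str, Any] = {
        "cluster_name": "valohai-spark-example",
        "release_label": "emr-6.15.0",
        "instance_type": "m5.xlarge",
        "instance_count": 3,
    }
    params.update(overrides or {})

    return {
        "Name": params["cluster_name"],
        "LogUri": log_uri,
        "ReleaseLabel": params["release_label"],
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": params["instance_type"],
            "SlaveInstanceType": params["instance_type"],
            "InstanceCount": params["instance_count"],
            # Shut the cluster down once the batch step has finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "spark-batch-job",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With boto3 installed and credentials configured, the result could be submitted with `boto3.client("emr").run_job_flow(**request)`. Keeping the request builder free of AWS calls makes it easy to inspect what a given set of parameters would launch before any cluster is created.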


GitHub Repository

The repository walks you through the steps above:
