Steps

Import the project into IntelliJ IDEA

Build and run your Spark job on a Spark cluster

We use sbt-assembly to bundle the application into a fat JAR, ready to be submitted to a Spark cluster. The JAR must not include the Spark components (spark-core, spark-sql, hadoop-client, etc.) or their dependencies, since the cluster already provides them at runtime.
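One common way to do this (a sketch, not necessarily this project's exact build definition; the version numbers are placeholders) is to mark those artifacts as `"provided"` in `build.sbt`, so sbt-assembly leaves them out of the fat JAR:

```scala
// Sketch of the relevant build.sbt lines; versions are placeholders.
// "provided" keeps these artifacts (and their transitive dependencies)
// out of the assembly, since the Spark cluster supplies them at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % "1.1.0" % "provided",
  "org.apache.spark"  %% "spark-sql"     % "1.1.0" % "provided",
  "org.apache.hadoop" %  "hadoop-client" % "2.4.0" % "provided"
)
```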

To build the JAR:

TODO: try to remove the manual part of editing build.sbt.
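With sbt-assembly set up, the build itself is a single command; the exact output path depends on the project name, version, and Scala version:

```sh
sbt assembly
# produces something like:
#   target/scala-2.10/<project-name>-assembly-<version>.jar
```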

To submit the JAR:
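A typical invocation looks like the following (the main class, master URL, and JAR path are placeholders to adapt to your project):

```sh
spark-submit \
  --class com.example.MySparkJob \
  --master spark://<master-host>:7077 \
  target/scala-2.10/<project-name>-assembly-<version>.jar
```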

Treats

Starting a Spark Cluster on EC2

TODO: write this paragraph

TODO: By default, spark-ec2 runs the cluster with hadoop-client 1.0.4.
  The cluster can also be run on 2.0.x with `--hadoop-major-version=2`,
  but that is an alpha version (see http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client).
  spark-ec2 does not provide a way to use the stable 2.4.x; it would be nice
  to find a way to run spark-ec2 with hadoop-client 2.4.x
  (see https://groups.google.com/d/msg/spark-users/pHaF01sPwBo/faHr-fEAFbYJ).
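In the meantime, a minimal launch sketch using the spark-ec2 script shipped in the `ec2/` directory of the Spark distribution; the key pair, identity file, slave count, and cluster name below are placeholders:

```sh
# Launch from the ec2/ directory of the Spark distribution.
# --hadoop-major-version=2 is optional (see the TODO above).
cd $SPARK_HOME/ec2
./spark-ec2 \
  --key-pair=<your-keypair> \
  --identity-file=<path-to-keypair.pem> \
  --slaves=2 \
  --hadoop-major-version=2 \
  launch my-spark-cluster

# Log in to the master, and destroy the cluster when done:
# ./spark-ec2 --key-pair=<your-keypair> --identity-file=<path-to-keypair.pem> login my-spark-cluster
# ./spark-ec2 destroy my-spark-cluster
```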