Install Spark on Ubuntu (2): Standalone Cluster Mode

In the previous post, I set up Spark in local mode for testing purposes. In this post, I will set up Spark in the standalone cluster mode.

1. Prepare VMs

Create 3 identical VMs by following the previous local mode setup (or create 2 more if one is already created). The Spark directory needs to be in the same location (/usr/local/spark/ in this post) on all nodes.

2. Set up Password-less SSH

Because the master node uses SSH to start the slave nodes, it is convenient to set up password-less SSH. Create a key pair on the master node and copy the public key to each slave node:

ssh-keygen -t rsa
ssh-copy-id user@<slave1-ip>
ssh-copy-id user@<slave2-ip>

<slave1-ip> and <slave2-ip> are the IP addresses of my two newly created VMs. By default, the IPs are public and no firewall is set up.
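With more than a couple of slaves, copying the key to each node one by one gets tedious. The loop below is only a sketch and a dry run: it prints (and saves) the ssh-copy-id commands instead of running them, and the user name and TEST-NET example IPs are placeholders, not values from this setup.

```shell
#!/bin/sh
# Dry-run sketch: print and save the ssh-copy-id command for each slave.
# Replace the placeholder user and IPs with your own before running for real.
SLAVE_IPS="192.0.2.11 192.0.2.12"   # example addresses only
USER_NAME="user"

: > setup-ssh.txt
for ip in $SLAVE_IPS; do
  cmd="ssh-copy-id ${USER_NAME}@${ip}"
  echo "$cmd"                 # show what would be run
  echo "$cmd" >> setup-ssh.txt
done
```

Dropping the echoes and running the commands directly requires typing each slave's password once; after that, SSH from the master is password-less.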

3. Config Master Node

Give the two slave nodes hostnames by appending the following lines to /etc/hosts (replace the placeholders with your slave IPs):

<slave1-ip> slave1
<slave2-ip> slave2
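The /etc/hosts edit can be scripted as well. The sketch below writes to a local copy (hosts.sample) instead of the real /etc/hosts so it is safe to experiment with, and the IPs are placeholders:

```shell
#!/bin/sh
# Append slave entries to a hosts file, skipping entries that already exist.
# Writes to hosts.sample here; point HOSTS_FILE at /etc/hosts (with sudo) for real use.
HOSTS_FILE="hosts.sample"
printf '127.0.0.1 localhost\n' > "$HOSTS_FILE"   # stand-in for the existing file

add_host() {
  ip="$1"; name="$2"
  # Only append if no line already ends with this hostname (idempotent).
  grep -q " $name\$" "$HOSTS_FILE" || echo "$ip $name" >> "$HOSTS_FILE"
}

add_host 192.0.2.11 slave1   # example addresses only
add_host 192.0.2.12 slave2
cat "$HOSTS_FILE"
```

Running it twice leaves the file unchanged, which makes it safe to rerun when adding nodes later.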

Define the slave nodes in /usr/local/spark/conf/slaves. There is a slaves.template file under /usr/local/spark/conf. You can start with that file by copying it to "slaves":

$ cd /usr/local/spark/conf
$ cp slaves.template slaves

Add the list of worker hostnames, each on a separate line. The slaves file looks like the following:

slave1
slave2

When a slave node connects to the master, it needs the address of the master node along with the port number. By default, the master advertises its hostname, which the slave nodes cannot resolve to reach the master. The hostname needs to be changed to the IP address so that the slave nodes can connect.

This can be configured in the $SPARK_HOME/conf/spark-env.sh file. There is a spark-env.sh.template file under the same directory. You can copy that file to "spark-env.sh".

Edit the spark-env.sh file and set the SPARK_MASTER_HOST variable by adding the following line at the end of the file. Replace the IP address with your own, which you can view using the hostname -I command.

SPARK_MASTER_HOST=<your-master-ip>
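The two configuration files from this section can be generated with a short script. This sketch writes them into a local conf-sketch/ directory rather than $SPARK_HOME/conf so it can be tried anywhere; the IP is a placeholder:

```shell
#!/bin/sh
# Generate the slaves and spark-env.sh files used in this section.
# Writes to ./conf-sketch; copy the results into $SPARK_HOME/conf on the master.
CONF_DIR="conf-sketch"
MASTER_IP="192.0.2.10"   # example address -- use `hostname -I` on the master
mkdir -p "$CONF_DIR"

# One worker hostname per line, matching the /etc/hosts entries.
cat > "$CONF_DIR/slaves" <<EOF
slave1
slave2
EOF

# Bind the master to an address the slaves can reach.
cat > "$CONF_DIR/spark-env.sh" <<EOF
SPARK_MASTER_HOST=$MASTER_IP
EOF
```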


4. Start Spark Cluster

Start Spark Master.

$ $SPARK_HOME/sbin/start-master.sh

Once the master is started, Spark serves a master web UI. Its default URL is http://<your-master-ip>:8080.

Now start the slaves.

$ $SPARK_HOME/sbin/start-slaves.sh

The master web UI shows the master and the workers. I have 2 worker nodes as expected.

5. Run Spark Shell in the standalone cluster mode

The standalone cluster is now ready to accept jobs. We can test it using spark-shell.

$ spark-shell --master spark://<your-master-ip>:7077

The --master option specifies the master URL for a distributed cluster. The Spark web UI (http://<your-master-ip>:8080) shows the environment details of the running application.
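The URL passed to --master can be derived from the SPARK_MASTER_HOST value set earlier plus the default master port, 7077. A small sketch, run here against a sample spark-env.sh rather than the real one:

```shell
#!/bin/sh
# Build the spark:// master URL from SPARK_MASTER_HOST in spark-env.sh.
# A sample file is created here; use $SPARK_HOME/conf/spark-env.sh in practice.
CONF="spark-env.sh.sample"
printf 'SPARK_MASTER_HOST=192.0.2.10\n' > "$CONF"   # example address only

master_ip=$(grep '^SPARK_MASTER_HOST=' "$CONF" | cut -d= -f2)
echo "spark://${master_ip}:7077" > master-url.txt
cat master-url.txt
```

This keeps the IP in one place (spark-env.sh) instead of being retyped for every spark-shell or spark-submit invocation.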

6. Add more slave nodes

If more worker nodes are needed, a new node can be added to the cluster by running:

$ $SPARK_HOME/sbin/start-slave.sh spark://<your-master-ip>:7077

The start-slave.sh script starts a slave node against the given master URL. It needs to be run on the slave node that is to be added to the cluster (not on the master node, unless you want the master to also be a slave node).

I ran the command on the master node, so the master also becomes a slave node, as the Spark Master UI (http://<your-master-ip>:8080) confirms.

There are 3 worker nodes now in the cluster.

