Install Hadoop and HBase on Ubuntu

HBase is an open-source, distributed, non-relational database written in Java, and it is widely used in big data systems. It is developed as part of the Apache Software Foundation's Apache Hadoop project and runs on top of HDFS.

This post demonstrates how to set up Hadoop and HBase to run on a single machine. Running HDFS, YARN and HBase on a single machine is a great way to learn these systems. I will install on a brand new virtual machine, so that every step is covered and you can easily reproduce the installation process. There are a number of different ways to build a Hadoop cluster, such as using Apache tarballs, using Linux packages (e.g. RPM and Debian), and using Hadoop cluster management tools (e.g. Apache Ambari). This post shows how to install using the Apache tarballs, which is the best way to show the details and meaning of each step.

1. Prepare a Virtual Machine Environment

To start from a clean Linux installation, I will create a new VM. The Linux version used is Ubuntu 16.04.4 x64 and the VM has 4 GB of memory.

It is recommended to create a user for Hadoop to isolate the Hadoop file system from the Unix file system. So create a user `hduser`.

$ addgroup hadoop
$ adduser --ingroup hadoop hduser
$ usermod -aG sudo hduser

Hadoop requires SSH access to manage its nodes. Therefore, we need to configure SSH access to localhost for the hduser user we created.

$ su hduser
$ ssh-keygen -t rsa -P ""

The second line creates an RSA key pair with an empty passphrase, so that we don't have to enter a passphrase every time Hadoop interacts with its nodes.

The following commands enable SSH access to the local machine with this newly created key.

$ cat ~/.ssh/ >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys

You can test the password-less login with:

$ ssh localhost
$ exit

2. Install Java

Install Oracle's JDK version "1.8.0_181".

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer

Verify the installation of Java.

$ java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

3. Install Hadoop

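Hadoop releases are available from the Apache Hadoop download page. This tutorial uses version 3.0.3, installed under /usr/local, so run the following commands from that directory.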

$ sudo wget
$ sudo tar xzvf hadoop-3.0.3.tar.gz
$ sudo mv hadoop-3.0.3 hadoop
$ sudo chown -R hduser:hadoop hadoop

Edit /home/hduser/.profile to include the required environment variables. You may find the Java installation directory using this command:

$ ls -al /etc/alternatives/java
/etc/alternatives/java -> /usr/lib/jvm/java-8-oracle/jre/bin/java
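
The symlink target above points at the java binary, while JAVA_HOME must name the JDK directory that contains it. As a minimal sketch, the hypothetical helper below (not part of Hadoop or the JDK) strips the trailing /bin/java, and a nested /jre if present, from a resolved binary path:

```shell
# Derive a JAVA_HOME candidate from the resolved path of the java binary.
# java_home_from_binary is a hypothetical helper name used for illustration.
java_home_from_binary() {
  bin_path="$1"
  home="${bin_path%/bin/java}"   # drop the trailing /bin/java
  home="${home%/jre}"            # drop a trailing /jre when the JRE sits inside a JDK
  printf '%s\n' "$home"
}

# Usage: resolve all symlinks first, then derive the directory.
# java_home_from_binary "$(readlink -f "$(command -v java)")"
```

For the path shown above, this yields /usr/lib/jvm/java-8-oracle, which is the value used for JAVA_HOME in the rest of this post.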

Change the .profile file to the following:

# $HOME/.profile
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Add Hadoop bin/ and sbin/ directories to PATH
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
# Add Java bin/ directory to PATH
export PATH=$PATH:$JAVA_HOME/bin
# Convenient aliases and functions for running Hadoop-related commands
# (portable redirection, so the file also works when read by a plain sh)
unalias fs > /dev/null 2>&1
alias fs="hadoop fs"
unalias hls > /dev/null 2>&1
alias hls="fs -ls"

Edit the Hadoop environment script (under /usr/local/hadoop/etc/hadoop/).

The only required environment variable we have to configure for Hadoop in this tutorial is JAVA_HOME. Open the script in the editor of your choice and add the following line to set the JAVA_HOME environment variable to the Oracle JDK 8 directory.

export JAVA_HOME=/usr/lib/jvm/java-8-oracle

Test the installation with the example MapReduce job. In this example, we count the occurrences of strings matching the regular expression "file[.]*" in the Hadoop configuration files.

$ mkdir ~/input
$ cp /usr/local/hadoop/etc/hadoop/*.xml ~/input
$ /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar grep ~/input ~/count_example 'file[.]*'
$ cat ~/count_example/*
12	file
10	file.

Make sure HADOOP_HOME and PATH are set in the .profile file as shown below, then re-source it.

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

$ . ~/.profile

Edit core-site.xml (under /usr/local/hadoop/etc/hadoop).

The core-site.xml file contains common settings, such as the port number used by the Hadoop instance, the memory allocated for the file system, the memory limit for storing data, and the size of the read/write buffers.
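
As a minimal sketch of a single-node core-site.xml, the function below writes a file that sets only fs.defaultFS. The address hdfs://localhost:9000 and the helper name write_core_site are assumptions for illustration, not values taken from this post.

```shell
# Write a minimal core-site.xml into the given configuration directory.
# Assumption: the NameNode listens on localhost:9000 (single-node setup).
write_core_site() {
  conf_dir="$1"
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}

# Usage with the install path from this post (overwrites the existing file):
# write_core_site /usr/local/hadoop/etc/hadoop
```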


Edit hdfs-site.xml (under /usr/local/hadoop/etc/hadoop).

The hdfs-site.xml file contains settings such as the name node path, the secondary name node path, and the data node path on your local file system, where you want to store the Hadoop infrastructure.
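
A single-node hdfs-site.xml might look like the sketch below. The property names (dfs.replication, dfs.namenode.name.dir, dfs.datanode.data.dir) are standard HDFS settings, but the storage root passed in and the helper name write_hdfs_site are hypothetical; pick a directory that hduser can write to.

```shell
# Write a minimal hdfs-site.xml.
#   $1: Hadoop configuration directory
#   $2: local directory that will hold NameNode and DataNode data (assumed layout)
write_hdfs_site() {
  conf_dir="$1"
  data_root="$2"
  cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- One copy of each block is enough on a single machine -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://$data_root/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://$data_root/datanode</value>
  </property>
</configuration>
EOF
}

# Usage (the data directory is an example, not a path from this post):
# write_hdfs_site /usr/local/hadoop/etc/hadoop /home/hduser/hdfs
```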


Edit yarn-site.xml (under /usr/local/hadoop/etc/hadoop).

The yarn-site.xml file configures settings for YARN daemons: the resource manager, the web app proxy server, and the node managers.
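
For a single-node cluster, a common minimal yarn-site.xml only enables the shuffle service that MapReduce jobs need. The sketch below assumes that is sufficient here; write_yarn_site is a hypothetical helper name.

```shell
# Write a minimal yarn-site.xml that enables the MapReduce shuffle auxiliary service.
write_yarn_site() {
  conf_dir="$1"
  cat > "$conf_dir/yarn-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}

# Usage with the install path from this post (overwrites the existing file):
# write_yarn_site /usr/local/hadoop/etc/hadoop
```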


Edit mapred-site.xml (under /usr/local/hadoop/etc/hadoop).

This file is used to specify which MapReduce framework we are using. By default, Hadoop contains a template of mapred-site.xml.
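
To run MapReduce jobs on YARN, the framework name is set to yarn. The following is a minimal sketch under that assumption; some Hadoop 3 releases need additional classpath properties that are not shown, and write_mapred_site is a hypothetical helper name.

```shell
# Write a minimal mapred-site.xml that selects YARN as the MapReduce framework.
write_mapred_site() {
  conf_dir="$1"
  cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
}

# Usage with the install path from this post (overwrites the existing file):
# write_mapred_site /usr/local/hadoop/etc/hadoop
```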


Format the HDFS filesystem.

The first step in starting up Hadoop is formatting the Hadoop file system, which is implemented on top of the local file system of the cluster (only the local machine in this tutorial). This should be done only once, when you first set up a Hadoop cluster. Do not format a running Hadoop file system, as you will lose all the data currently in the cluster! Note that data nodes are not involved in the formatting process, because the name node manages all of the file system's metadata and data nodes can join and leave the cluster on the fly. Formatting the file system initializes the name node storage directory configured in hdfs-site.xml:

$ hdfs namenode -format

The last line of the output is:

SHUTDOWN_MSG: Shutting down NameNode at ubuntu-s-2vcpu-4gb-sfo2-01/
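
Because formatting destroys existing metadata, it can help to guard the command. A formatted name node directory contains a current/VERSION file, so the sketch below checks for it before formatting. The function names are hypothetical, and the directory you pass must match the dfs.namenode.name.dir value in your hdfs-site.xml.

```shell
# Succeed if the given NameNode directory already holds a formatted file system.
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Format only when no existing metadata is found in the given directory.
format_if_needed() {
  name_dir="$1"
  if namenode_is_formatted "$name_dir"; then
    echo "NameNode directory $name_dir is already formatted; skipping."
  else
    hdfs namenode -format
  fi
}

# Usage (example path, adjust to your dfs.namenode.name.dir):
# format_if_needed /home/hduser/hdfs/namenode
```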

Start dfs. The output looks like this:

Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [ubuntu-s-2vcpu-4gb-sfo2-01]
ubuntu-s-2vcpu-4gb-sfo2-01: Warning: Permanently added 'ubuntu-s-2vcpu-4gb-sfo2-01' (ECDSA) to the list of known hosts.

Start yarn. The output looks like this:

Starting resourcemanager
Starting nodemanagers

Check DFS Health using portal at http://your_ip_address:9870/dfshealth.html.

Check Hadoop Cluster Overview at http://your_ip_address:8088/cluster/cluster.
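
Besides the web portals, you can confirm from the command line that the daemons are up after running Hadoop's start scripts (start-dfs.sh and start-yarn.sh in the sbin directory). The JDK's jps tool lists running Java processes. The sketch below, with the hypothetical helper missing_daemons, reads jps output and prints every expected daemon that is absent.

```shell
# Read `jps` output on stdin and print each expected Hadoop daemon that is not running.
missing_daemons() {
  jps_output="$(cat)"
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the process-name column exactly, so SecondaryNameNode
    # is not mistaken for NameNode.
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
    fi
  done
}

# Usage: prints nothing when all five daemons are running.
# jps | missing_daemons
```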

Once you have a Hadoop cluster set up and running, you can work with its directories. For example, list the root directory:

$ hadoop fs -ls /

Create a directory "test" under the root:

$ hadoop fs -mkdir /test

4. Install HBase

HBase releases are available from the Apache HBase download page. As with Hadoop, run the following commands from /usr/local.

$ sudo wget
$ sudo tar xzvf hbase-
$ sudo mv hbase- hbase
$ sudo chown -R hduser:hadoop hbase

Add the following line to ~/.profile and re-source it.

export HBASE_HOME=/usr/local/hbase

Set JAVA_HOME in the environment shell script under the folder /usr/local/hbase/conf/.

export JAVA_HOME=/usr/lib/jvm/java-8-oracle

Edit hbase-site.xml (under /usr/local/hbase/conf).
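
As a rough sketch of an hbase-site.xml that stores HBase data in the HDFS instance set up earlier, the function below sets hbase.rootdir and a ZooKeeper data directory. The HDFS address (hdfs://localhost:9000, which must match fs.defaultFS in core-site.xml), the ZooKeeper path, and the helper name write_hbase_site are all assumptions. Depending on the HBase version, further properties may be required.

```shell
# Write a minimal hbase-site.xml.
#   $1: HBase configuration directory
#   $2: local directory for ZooKeeper data (assumed, any writable path works)
write_hbase_site() {
  conf_dir="$1"
  zk_dir="$2"
  cat > "$conf_dir/hbase-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Assumption: must match fs.defaultFS in Hadoop's core-site.xml -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>$zk_dir</value>
  </property>
</configuration>
EOF
}

# Usage (the ZooKeeper directory is an example, not a path from this post):
# write_hbase_site /usr/local/hbase/conf /home/hduser/zookeeper
```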


Now start hbase, and check master status at: http://your_ip_address:16010/master-status.

starting master, logging to /usr/local/hbase/logs/hbase-hduser-master-ubuntu-s-2vcpu-4gb-sfo2-01.out

You can now play with HBase by using the HBase shell:

$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version, rUnknown, Sun Jun  3 23:19:26 CDT 2018

The following commands show how to create a table "test" with one column family "info", add a row with two columns, and read it back.

hbase(main):002:0> create 'test', 'info'
0 row(s) in 2.3660 seconds

=> Hbase::Table - test
hbase(main):003:0> put 'test', 'row1', 'info:name','programcreek'
0 row(s) in 0.3760 seconds

hbase(main):004:0> put 'test', 'row1', 'info:type','blog'
0 row(s) in 0.0210 seconds

hbase(main):005:0> get 'test', 'row1'
COLUMN                       CELL                                                                              
 info:name                   timestamp=1533271202442, value=programcreek                                       
 info:type                   timestamp=1533271213622, value=blog                                               
2 row(s) in 0.1030 seconds

hbase(main):007:0> scan 'test'
ROW                          COLUMN+CELL                                                                       
 row1                        column=info:name, timestamp=1533271202442, value=programcreek                     
 row1                        column=info:type, timestamp=1533271213622, value=blog                             
1 row(s) in 0.0540 seconds

If you want to delete the table, you must first disable it before dropping it:

hbase(main):001:0> disable 'test'
0 row(s) in 3.1120 seconds

hbase(main):002:0> drop 'test'
0 row(s) in 1.2890 seconds

hbase(main):003:0> list
7 row(s) in 0.0310 seconds


If you are interested, you may also check out some code examples that show how to use HBase from Java.
