Docker image to run Apache Hive on Tez

This repository contains a Dockerfile to build a Docker image that runs Apache Hive on Tez. The Dockerfile depends on my other repositories containing the docker-tez and docker-hadoop base images.

Current Version

Running on Mac OS X

This step is required only for Mac OS X, since Docker is not natively supported on Mac OS X. To run Docker on Mac OS X we need Boot2Docker. Boot2Docker installs a headless VirtualBox VM, runs a lightweight Linux distribution inside it, and sets it up to run the Docker daemon.

Setting up docker

NOTE: Docker 1.3.0 requires --tls to be passed to all docker commands.
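
If you have not set up Boot2Docker yet, the usual sequence looks roughly like this (a sketch; the exact commands and the environment variables printed depend on your Boot2Docker version):

# download and start the Boot2Docker VM
boot2docker init
boot2docker up

# export DOCKER_HOST and the TLS settings printed by shellinit
$(boot2docker shellinit)

# verify the docker client can reach the daemon in the VM
docker --tls version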

Pull the image

You can either pull the pre-built image from Docker Hub or build the image locally (refer to the next section).

docker --tls pull prasanthj/docker-hive-on-tez
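
Once the pull completes, you can confirm the image is available locally (the output format varies with the Docker version):

docker --tls images prasanthj/docker-hive-on-tez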

Building the image

If you do not want to pull the image from Docker Hub, you can build it locally with the following command:

docker --tls build -t local-hive-on-tez .
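
The build must be run from the root of this repository, where the Dockerfile lives. If you have not checked it out yet, something like the following should work (the GitHub URL is assumed to mirror the Docker Hub image name):

git clone https://github.com/prasanthj/docker-hive-on-tez.git
cd docker-hive-on-tez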

NOTE: If the above step fails with an error like the following:

The PostgreSQL server failed to start. Please check the log output:
2014-12-10 00:26:07 UTC FATAL: could not access private key file "/etc/ssl/private/ssl-cert-snakeoil.key": Permission denied ...fail!

then build the image with the --no-cache option to invalidate the Docker cache:

docker --tls build --no-cache -t local-hive-on-tez .

Running the image

If you pulled the pre-built image from Docker Hub instead of building locally, substitute prasanthj/docker-hive-on-tez for local-hive-on-tez in the command below.

docker --tls run -i -t -P local-hive-on-tez /etc/hive-bootstrap.sh -bash
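
From another terminal you can check that the container is running and see which host ports -P assigned (the container id and port numbers will differ on your machine):

# list running containers and their port mappings
docker --tls ps

# show the host ports mapped for a specific container
docker --tls port <container-id>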

Testing Hive on Tez

After launching the container using the command from the "Running the image" section, a bash shell is started inside the container. At the bash prompt, type the following to run a sample Hive query:

hive -f /opt/files/store_sales.sql

Running the above command should show output like the following after successful execution:

Status: Running (Executing on YARN cluster with App id application_1415171696020_0001)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
Reducer 3 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 1.65 s     
--------------------------------------------------------------------------------
OK
2452143 30
2451524 25
2452274 25
2452187 20
2450952 16
2451942 16
2451083 15
2451390 15
2451415 15
2452181 15
Time taken: 2.566 seconds, Fetched: 10 row(s)
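
You can also open an interactive Hive CLI session inside the container and run ad-hoc statements. The second query assumes the sample data loaded by the bootstrap script includes a store_sales table; adjust it to whatever show tables reports:

hive
hive> show tables;
hive> select count(*) from store_sales;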

Testing Hive on MapReduce v2 (YARN)

Run the same example as above with the following additional Hive configuration:

hive -f /opt/files/store_sales.sql -hiveconf hive.execution.engine=mr -hiveconf mapreduce.framework.name=yarn -hiveconf yarn.resourcemanager.address=localhost:8032

Running the above command should show output like the following after successful execution:

MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.17 sec   HDFS Read: 36073 HDFS Write: 1830 SUCCESS
Stage-Stage-2: Map: 1  Reduce: 1   Cumulative CPU: 33.47 sec   HDFS Read: 2234 HDFS Write: 110 SUCCESS
Total MapReduce CPU Time Spent: 36 seconds 640 msec
OK
2452143 30
2451524 25
2452274 25
2452187 20
2450952 16
2451942 16
2451083 15
2451390 15
2451415 15
2452181 15
Time taken: 53.967 seconds, Fetched: 10 row(s)
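
The execution engine can also be switched from inside a Hive session instead of on the command line; hive.execution.engine accepts either tez or mr:

set hive.execution.engine=mr;
-- run your queries on MapReduce, then switch back
set hive.execution.engine=tez;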

Viewing Web UI

If you are running Docker via Boot2Docker, perform the following steps.
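
A sketch of the usual approach: find the Boot2Docker VM's IP address and the host ports that -P mapped, then browse to them. The port numbers mentioned below are the stock Hadoop defaults (8088 for the YARN ResourceManager, 50070 for the HDFS NameNode) and are assumptions about what this image exposes:

# IP address of the Boot2Docker VM
boot2docker ip

# host ports that -P mapped for the container
docker --tls port <container-id>

# then open, for example, http://<boot2docker-ip>:<mapped-port-for-8088>/ in a browser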