This repository contains a Dockerfile for building a Docker image that runs Apache Hive on Tez. The Dockerfile depends on the docker-tez and docker-hadoop base images from my other repositories.
This step is required only on Mac OS X, where Docker is not natively supported. To run Docker on Mac OS X we need Boot2Docker, which installs a headless VirtualBox VM, runs a lightweight Linux distribution in it, and sets it up to run the Docker daemon.
Run boot2docker init to initialize boot2docker.
Run boot2docker start to start boot2docker, then export DOCKER_HOST and DOCKER_CERT_PATH as shown at the end of the command's output.
After setting DOCKER_HOST and DOCKER_CERT_PATH we can run docker commands.
NOTE: docker version 1.3.0 requires --tls to be passed to every docker command.
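As a rough sketch of the environment setup described above, the snippet below exports the two variables and defines a small wrapper that adds --tls to every call. The port and certificate path are assumed example values (the IP address is the one used later in this README); use whatever boot2docker start prints on your machine. The dtls function name is hypothetical.

```shell
# Assumed example values; copy the real ones printed by `boot2docker start`.
export DOCKER_HOST="tcp://192.168.59.103:2376"
export DOCKER_CERT_PATH="$HOME/.boot2docker/certs/boot2docker-vm"

# Hypothetical convenience wrapper: passes --tls on every docker call.
dtls() {
  docker --tls "$@"
}
```

With this in place, dtls ps is equivalent to docker --tls ps.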
You can either pull the pre-built image from Docker Hub or build the image locally (see the next section).
docker --tls pull prasanthj/docker-hive-on-tez
If you do not want to pull the image from Docker Hub, you can build it locally using the following steps:
git clone https://github.com/prasanthj/docker-hive-on-tez.git
cd docker-hive-on-tez
docker --tls build -t local-hive-on-tez .
NOTE: If the above step fails with the following exception:
The PostgreSQL server failed to start. Please check the log output: 2014-12-10 00:26:07 UTC FATAL: could not access private key file "/etc/ssl/private/ssl-cert-snakeoil.key": Permission denied ...fail!
then build the image with the --no-cache option to invalidate the docker cache:
docker --tls build --no-cache -t local-hive-on-tez .
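The build-then-retry advice above could be wrapped in a small function. This is only a sketch: build_hive_on_tez is a hypothetical name, and the function retries with --no-cache after any build failure, not just the PostgreSQL key error quoted above.

```shell
# Hypothetical helper: build the image, retrying once without the cache.
# Note: it retries on ANY build failure, not only the PostgreSQL error.
build_hive_on_tez() {
  tag="${1:-local-hive-on-tez}"
  if docker --tls build -t "$tag" .; then
    return 0
  fi
  echo "build failed, retrying with --no-cache" >&2
  docker --tls build --no-cache -t "$tag" .
}
```

Run it from inside the cloned docker-hive-on-tez directory, for example build_hive_on_tez local-hive-on-tez.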
docker --tls run -i -t -P local-hive-on-tez /etc/hive-bootstrap.sh -bash
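The run command above uses the locally built image name. A minimal sketch of a wrapper that also accepts the image pulled from Docker Hub is shown below; run_hive_on_tez is a hypothetical name, and the comments summarize what the three flags do.

```shell
# Hypothetical helper: start an interactive container from the given image.
#   -i  keep STDIN open
#   -t  allocate a terminal
#   -P  publish all exposed ports to random host ports
run_hive_on_tez() {
  image="${1:-local-hive-on-tez}"
  docker --tls run -i -t -P "$image" /etc/hive-bootstrap.sh -bash
}
```

For the pre-built image, call run_hive_on_tez prasanthj/docker-hive-on-tez; with no argument it falls back to local-hive-on-tez.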
After launching the container with the command from the "Running the image" section, a bash shell starts. At the bash prompt, type the following to run a sample Hive query:
hive -f /opt/files/store_sales.sql
After successful execution, the above command should print output like the following:
Status: Running (Executing on YARN cluster with App id application_1415171696020_0001)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
Reducer 3 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 1.65 s
--------------------------------------------------------------------------------
OK
2452143 30
2451524 25
2452274 25
2452187 20
2450952 16
2451942 16
2451083 15
2451390 15
2451415 15
2452181 15
Time taken: 2.566 seconds, Fetched: 10 row(s)
To run the same example on MapReduce instead of Tez, add the following Hive configuration options:
hive -f /opt/files/store_sales.sql -hiveconf hive.execution.engine=mr -hiveconf mapreduce.framework.name=yarn -hiveconf yarn.resourcemanager.address=localhost:8032
After successful execution, the above command should print output like the following:
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.17 sec HDFS Read: 36073 HDFS Write: 1830 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 33.47 sec HDFS Read: 2234 HDFS Write: 110 SUCCESS
Total MapReduce CPU Time Spent: 36 seconds 640 msec
OK
2452143 30
2451524 25
2452274 25
2452187 20
2450952 16
2451942 16
2451083 15
2451390 15
2451415 15
2452181 15
Time taken: 53.967 seconds, Fetched: 10 row(s)
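To put the two sample runs side by side, the sketch below extracts the elapsed seconds from each "Time taken" line and computes their ratio. The two lines are copied from the outputs above; seconds_of is a hypothetical helper name, and a single run of a small query is only a rough indication, not a benchmark.

```shell
# Elapsed times copied from the "Time taken" lines of the two sample runs.
tez_line="Time taken: 2.566 seconds, Fetched: 10 row(s)"
mr_line="Time taken: 53.967 seconds, Fetched: 10 row(s)"

# Hypothetical helper: print the number of seconds from a "Time taken" line.
seconds_of() {
  printf '%s\n' "$1" | awk '/^Time taken:/ { print $3 }'
}

tez_s=$(seconds_of "$tez_line")
mr_s=$(seconds_of "$mr_line")

# Ratio of MapReduce time to Tez time, rounded to one decimal place.
speedup=$(awk -v mr="$mr_s" -v tez="$tez_s" 'BEGIN { printf "%.1f", mr / tez }')
echo "Tez was about ${speedup}x faster on this run"
```

For these two runs the ratio is 53.967 / 2.566, or roughly 21.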
If you are running docker using Boot2Docker, do the following steps:
Set up routing on the host machine (Mac OS X) using the following command:
sudo route add -net 172.17.0.0/16 192.168.59.103
NOTE: 172.17.0.X is usually the IP address of the docker container. 192.168.59.103 is the IP address exported in DOCKER_HOST.
Get the container's IP address:
docker --tls ps
docker --tls inspect -f="{{.NetworkSettings.IPAddress}}" CONTAINER_ID
Launch a web browser and open http://<container-ip-address>:8088 to view the Hadoop cluster web UI.
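The last two steps can be combined into one small function that looks up the container's IP address and prints the web UI address. This is a sketch only; hadoop_ui_url is a hypothetical name, and it assumes the routing step above has already been done so the container IP is reachable from the host.

```shell
# Hypothetical helper: print the Hadoop web UI URL for a running container.
hadoop_ui_url() {
  container_id="$1"
  # Ask docker for the container's IP address on the docker bridge network.
  ip=$(docker --tls inspect -f '{{.NetworkSettings.IPAddress}}' "$container_id")
  if [ -z "$ip" ]; then
    echo "no IP address found for $container_id" >&2
    return 1
  fi
  echo "http://$ip:8088"
}
```

Pass it the container ID shown by docker --tls ps, for example hadoop_ui_url CONTAINER_ID, and open the printed URL in a browser.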