MongoDB-Hadoop Workshop Exercises

MongoDB powers applications as an operational database, and Hadoop delivers intelligence with its powerful analytical infrastructure. In this workshop, we'll start by learning how these technologies fit together through the MongoDB Connector for Hadoop. Then we'll cover reading and writing MongoDB data using MapReduce, Pig, Hive, and Spark. Finally, we'll discuss the broader data ecosystem and operational considerations.

Data

Prior to running any of the exercises, load the sample dataset into MongoDB.

Download MongoDB
Install MongoDB
Download the MovieLens 10M archive and unzip it

Finally, load the dataset:

$ python dataset/movielens.py [/path/to/movies.dat] [/path/to/ratings.dat]

For more information refer to the dataset README.
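
To give a feel for what the load step involves, here is a minimal sketch of parsing the two MovieLens 10M files and inserting them in batches. It is not the contents of dataset/movielens.py, which may work differently. The file layouts (`MovieID::Title::Genres` and `UserID::MovieID::Rating::Timestamp`) come from the MovieLens 10M distribution; the document field names, the `batches` and `load` helpers, and the `movielens` database name in the usage comment are assumptions chosen for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def parse_movie(line: str) -> Dict:
    """Parse one movies.dat line: MovieID::Title::Genres ('|'-separated genres)."""
    movie_id, rest = line.rstrip("\n").split("::", 1)
    # Split the genres off from the right so the title is kept intact.
    title, genres = rest.rsplit("::", 1)
    # Field names are illustrative, not necessarily those used by the workshop.
    return {"movieid": int(movie_id), "title": title, "genres": genres.split("|")}


def parse_rating(line: str) -> Dict:
    """Parse one ratings.dat line: UserID::MovieID::Rating::Timestamp."""
    user_id, movie_id, rating, timestamp = line.rstrip("\n").split("::")
    return {
        "userid": int(user_id),
        "movieid": int(movie_id),
        "rating": float(rating),  # 10M ratings use half-star increments
        "ts": int(timestamp),
    }


def batches(docs: Iterable, size: int) -> Iterator[List]:
    """Group documents into lists of at most `size` items."""
    batch: List = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def load(path: str, parse: Callable[[str], Dict], collection, batch_size: int = 1000) -> None:
    """Insert every parsed line of `path` into a PyMongo collection, batch by batch."""
    with open(path, encoding="utf-8") as handle:
        docs = (parse(line) for line in handle if line.strip())
        for batch in batches(docs, batch_size):
            collection.insert_many(batch)


# Hypothetical usage (requires pymongo and a running mongod):
#   from pymongo import MongoClient
#   db = MongoClient()["movielens"]  # database name is an assumption
#   load("/path/to/movies.dat", parse_movie, db["movies"])
#   load("/path/to/ratings.dat", parse_rating, db["ratings"])
```

Batching the inserts keeps memory use flat for the ratings file, which holds roughly ten million lines, and avoids one round trip per document.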

Exercises

Refer to the individual READMEs for steps on building and deploying each exercise.

MapReduce
Pig
Hive
Spark