distributed-graph-analytics

Distributed Graph Analytics (DGA) is a compendium of graph analytics written for Bulk-Synchronous-Parallel (BSP) processing frameworks such as Giraph and GraphX.

Currently, DGA supports the following analytics:

Giraph
GraphX
dga-giraph

dga-giraph is the project that contains our Giraph implementation of DGA. For more information, go here: dga-giraph README.md

documentation

http://sotera.github.io/distributed-graph-analytics

Steps to run Louvain GraphX

Download The CentOS VM

https://github.com/Sotera/seam-team-6-vm

Required Scala Version

Scala version: 2.11.1 Scala installation location: /opt/scala

wget http://downloads.typesafe.com/scala/2.11.1/scala-2.11.1.tgz tar xvf scala-2.11.1.tgz sudo mv scala-2.11.1 /opt/scala

Required Spark Core and GraphX

Spark Core and GraphX version: 1.3.0 Spark installation location: /opt/spark

Start Spark with one master and four worker instances

cd /usr/lib/spark/sbin ./start-master.sh echo "export SPARK_WORKER_INSTANCES=4" >> spark-env.sh ./start-slaves.sh

Spark Web UI: Browse to the master web UI to make sure

the master and all the workers are started correctly

http://localhost:8080/

Copy the example data to HDFS:

wget http://sotera.github.io/distributed-graph-analytics/data/example.csv hdfs dfs -put example.csv hdfs://localhost:8020/tmp/dga/louvain/input/

Clone the repository and build the code

cd /vagrant git clone https://github.com/Sotera/distributed-graph-analytics cd distributed-graph-analytics gradle clean dist cd /vagrant/distributed-graph-analytics/dga-graphx/build/dist ./run.sh

You can find the output in:

hdfs://localhost:8020/tmp/dga/louvain/output