spark-cassandra-collabfiltering

This code goes with my Datanami article.

It illustrates MLlib on Spark using an example based on collaborative filtering of employee ratings for companies.

It shows the exact same Spark client functionality written in Java 7 and Java 8. The new Java 8 features make Spark's functional style much easier.
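To give a rough feel for the Java 7 versus Java 8 difference, here is a minimal sketch that does not use Spark at all. `MapFunction` is a hypothetical stand-in for the kind of single-method function interface a Spark client passes to operations like `map`, and `LambdaComparison`, `scaleJava7`, and `scaleJava8` are names made up for this illustration, not code from this project. The maximum rating of 5 is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LambdaComparison {

    // Hypothetical stand-in for a single-method function interface
    // (not Spark's real API; defined here so the sketch is self-contained).
    interface MapFunction<T, R> {
        R call(T input);
    }

    // Applies the function to every element, in order.
    static <T, R> List<R> map(List<T> items, MapFunction<T, R> f) {
        List<R> out = new ArrayList<>();
        for (T item : items) {
            out.add(f.call(item));
        }
        return out;
    }

    // Java 7 style: the function is an anonymous inner class.
    static List<Double> scaleJava7(List<Integer> ratings) {
        return map(ratings, new MapFunction<Integer, Double>() {
            @Override
            public Double call(Integer rating) {
                return rating / 5.0; // assumes ratings run from 1 to 5
            }
        });
    }

    // Java 8 style: the same function as a lambda.
    static List<Double> scaleJava8(List<Integer> ratings) {
        return map(ratings, rating -> rating / 5.0);
    }

    public static void main(String[] args) {
        List<Integer> ratings = Arrays.asList(1, 3, 5);
        System.out.println(scaleJava7(ratings));
        System.out.println(scaleJava8(ratings));
    }
}
```

Both methods return the same list; the only change is how much ceremony surrounds the one line that does the work.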

I use Cassandra to provide the data to Spark, and there's a synthesized training/validation set with an accompanying spreadsheet to let you tweak parameters.

Here's how to get it working:

To set up (tested on Ubuntu 14.04):

Get Eclipse:

Project

Dataset

Cassandra

Running tests:

More references: