Product Recommender based on Apache Spark and Elasticsearch

This repository is just a Proof of Concept (PoC) showing how to build a product recommender with Big Data technologies such as Apache Spark and Elasticsearch.

I strongly recommend reading my two articles about this PoC, which cover some of the theory behind recommenders and go into more technical detail:

Technical Requirements

To launch this PoC, you must have the following running:

How to compile it

Just run the following command:

mvn clean compile 

How to run it

This PoC is split into two main parts:

es.alvsanand.spark_recommender.RecommenderTrainerApp

This process is responsible for:

This PoC uses this Amazon dataset.

How to launch trainer

Just run the following command:

mvn exec:java -Dexec.mainClass="es.alvsanand.spark_recommender.RecommenderTrainerApp" -Dexec.args=""
Recommendation System Trainer
Usage: RecommenderTrainerApp [options]

  --spark.cores <value>
        Number of cores in the Spark cluster
  --spark.option spark.property1=value1,spark.property2=value2,...
        Spark Config Option
  --mongo.uri <value>
        Mongo uri (mongodb://db1.example.net,db2.example.net:27002,db3.example.net:27003/database)
  --mongo.db <value>
        Mongo Database
  --es.httpHosts <value>
        ElasicSearch HTTP Hosts (http://elastic:9200)
  --es.transportHosts <value>
        ElasicSearch Transport Hosts (http://elastic:9300)
  --es.index <value>
        ElasicSearch index
  --maxRecommendations <value>
        Maximum number of recommendations
  --help
        prints this usage text
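
As a minimal sketch of a full trainer invocation, the script below assembles the options from the usage text above into one command line. Every value in it (the local MongoDB and Elasticsearch addresses, the `recommender` database and index names, 2 cores, 10 recommendations) is a hypothetical placeholder, and the host formats simply follow the examples printed in the usage text. By default the script only prints the command; setting `RUN=1` executes it.

```shell
#!/bin/sh
# Hypothetical connection settings: replace them with your own environment's values.
MONGO_URI="mongodb://localhost:27017/recommender"
MONGO_DB="recommender"
ES_HTTP_HOSTS="http://localhost:9200"
ES_TRANSPORT_HOSTS="http://localhost:9300"
ES_INDEX="recommender"

# Build the argument string from the options listed in the trainer usage text.
trainer_args() {
  printf '%s' "--spark.cores 2 --mongo.uri $MONGO_URI --mongo.db $MONGO_DB --es.httpHosts $ES_HTTP_HOSTS --es.transportHosts $ES_TRANSPORT_HOSTS --es.index $ES_INDEX --maxRecommendations 10"
}

# Print the complete Maven command for the trainer.
trainer_cmd() {
  printf 'mvn exec:java -Dexec.mainClass="%s" -Dexec.args="%s"\n' \
    "es.alvsanand.spark_recommender.RecommenderTrainerApp" "$(trainer_args)"
}

# Always show the command; run it only when RUN=1 is set.
trainer_cmd
if [ "${RUN:-0}" = "1" ]; then
  eval "$(trainer_cmd)"
fi
```

Printing first makes it easy to check the assembled options before starting a training run.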

es.alvsanand.spark_recommender.RecommenderServerApp

This is a REST API server that returns product recommendations. The PoC can return the following types of recommendations:

How to launch the server

Just run the following command:

mvn exec:java -Dexec.mainClass="es.alvsanand.spark_recommender.RecommenderServerApp" -Dexec.args="--help"
Recommendation System Server
Usage: RecommenderServerApp [options]

  --server.port <value>
        HTTP server port
  --mongo.uri <value>
        Mongo uri (mongodb://db1.example.net,db2.example.net:27002,db3.example.net:27003/database)
  --mongo.db <value>
        Mongo Database
  --es.httpHosts <value>
        ElasicSearch HTTP Hosts (http://elastic:9200)
  --es.transportHosts <value>
        ElasicSearch Transport Hosts (http://elastic:9300)
  --es.index <value>
        ElasicSearch index
  --help
        prints this usage text
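
A server launch with real options, rather than `--help`, might look like the sketch below. The port `8080`, the local MongoDB and Elasticsearch addresses, and the `recommender` database and index names are all hypothetical placeholders; only the option names come from the usage text above. The script prints the command by default and executes it when `RUN=1` is set.

```shell
#!/bin/sh
# Hypothetical settings: adjust the port, hosts, and names to your environment.
SERVER_PORT="8080"
MONGO_URI="mongodb://localhost:27017/recommender"
MONGO_DB="recommender"
ES_HTTP_HOSTS="http://localhost:9200"
ES_TRANSPORT_HOSTS="http://localhost:9300"
ES_INDEX="recommender"

# Build the argument string from the options listed in the server usage text.
server_args() {
  printf '%s' "--server.port $SERVER_PORT --mongo.uri $MONGO_URI --mongo.db $MONGO_DB --es.httpHosts $ES_HTTP_HOSTS --es.transportHosts $ES_TRANSPORT_HOSTS --es.index $ES_INDEX"
}

# Print the complete Maven command for the server.
server_cmd() {
  printf 'mvn exec:java -Dexec.mainClass="%s" -Dexec.args="%s"\n' \
    "es.alvsanand.spark_recommender.RecommenderServerApp" "$(server_args)"
}

# Always show the command; run it only when RUN=1 is set.
server_cmd
if [ "${RUN:-0}" = "1" ]; then
  eval "$(server_cmd)"
fi
```

The server presumably needs to point at the same MongoDB database and Elasticsearch index that the trainer populated, so reusing the trainer's values for those options is the natural choice.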