spark-ranking-metrics

// hosted on Maven Central, for Scala 2.11 only at the moment
libraryDependencies += "com.github.jongwook" %% "spark-ranking-metrics" % "0.0.1"

Ranking is an important component for building recommender systems, and there are various methods to evaluate the offline performance of ranking algorithms [1].

Users are usually provided with a certain number of recommended items, so it is usual to measure the performance of only the top-k recommended items, and this is why the metrics are often suffixed by "@k".

Currently, Spark's implementations for ranking metrics are quite limited, and its NDCG implementation assumes that the relevance is always binary, producing different numbers. Meanwhile, RiVal aims to be a toolkit for reproducible recommender system evaluation and provides a robust implementation for a few ranking metrics, but it is written as a single-threaded application and therefore is not applicable to the scale which Spark users typically encounter.

To complement this, SparkRankingMetrics contains scalable implementations for NDCG, MAP, Precision, and Recall, using Spark's DataFrame/Dataset idioms.

[1] Gunawardana, A., & Shani, G. (2015). Evaluating recommendation systems. In Recommender systems handbook (pp. 265-308). Springer US.

Usage

import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD
import com.github.jongwook.SparkRankingMetrics

// A recommendation model obtained by Spark's ALS
val model = MatrixFactorizationModel.load(sc, "path/to/model")
val result: RDD[(Int, Array[Rating])] = model.recommendProductsForUsers(20)

// A flattened RDD that contains all Ratings in the result
val prediction: RDD[Rating] = result.values.flatMap(ratings => ratings)
val groundTruth: RDD[Rating] = /* the ground-truth dataset */

// create DataFrames using the RDDs above
val predictionDF = spark.createDataFrame(prediction)
val groundTruthDF = spark.createDataFrame(groundTruth)

// instantiate using either DataFrames or Datasets
val metrics = SparkRankingMetrics(predictionDF, groundTruthDF)

// override the non-default column names
metrics.setItemCol("product")
metrics.setPredictionCol("rating")

// print the metrics
println(metrics.ndcgAt(10))
println(metrics.mapAt(15))
println(metrics.precisionAt(5))
println(metrics.recallAt(20))

Validation

This repository contains a test case that checks the numbers produced by SparkRankingMetrics are identical to what RiVal's corresponding implementations produce, using the MovieLens 100K dataset. The result is: