Using PageRank to Find Anomalies in Healthcare

This repository is a simple demo of using Personalized PageRank to identify provider features that may be useful for anomaly detection in healthcare payment data.



Identifying fraud, waste, and abuse is a critical task for any insurance company, and a major focus for healthcare insurers around the world. Various approaches have been applied to this problem, including rules-based and machine-learning-based solutions.

Graph algorithms have the potential to improve the accuracy of such systems, and are thus of high interest. In this demo we show an approach to identifying anomalies in a real-world healthcare payment dataset, the Medicare-B dataset, using a variant of the personalized PageRank algorithm.

We use Apache Pig and SociaLite (an open-source graph analysis platform) to preprocess and analyze the data. The details of the algorithm are described in the Hortonworks blog (TBD link).
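
For readers new to personalized PageRank, the following is a minimal sketch of the standard algorithm in plain Python. It is an illustration only: it is not the variant used in this demo, and it does not use Pig or SociaLite. The function name, the toy graph, and the node labels are hypothetical and do not come from the Medicare-B data. The idea is that a random walk restarts at a chosen set of seed nodes, so scores measure closeness to those seeds rather than global importance.

```python
def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Standard personalized PageRank by power iteration (illustrative sketch).

    edges: list of (source, target) pairs in a directed graph
    seeds: nodes the random walk restarts at
    """
    seeds = set(seeds)
    nodes = sorted({n for edge in edges for n in edge} | seeds)
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Restart distribution: uniform over the seeds, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)

    for _ in range(iterations):
        # Every node first receives its share of the restart probability.
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            if out_links[n]:
                # Split this node's score evenly among its out-neighbors.
                share = damping * rank[n] / len(out_links[n])
                for dst in out_links[n]:
                    new_rank[dst] += share
            else:
                # Dangling node: send its score back to the seeds.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * restart[m]
        rank = new_rank
    return rank


# Hypothetical toy graph: a chain A <-> B <-> C <-> D, with A as the only seed.
toy_edges = [("A", "B"), ("B", "A"), ("B", "C"),
             ("C", "B"), ("C", "D"), ("D", "C")]
scores = personalized_pagerank(toy_edges, seeds={"A"})
for node in sorted(scores):
    print(node, round(scores[node], 4))
```

In this toy chain the node farthest from the seed ends up with the lowest score. A node whose score differs sharply from that of otherwise similar nodes is the kind of signal an anomaly detector could pick up, though how this demo turns scores into provider features is described in the blog post rather than here.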

Installation with Hortonworks Sandbox

To try this demo on the Hortonworks Sandbox, follow these steps:

Installation on a Hadoop cluster

If you have a Hadoop cluster you can use, follow these steps:

Running the demo

To run the demo code, follow these steps: