Apache Spark Development

Using Python and Scala

Compatible with Spark 2.0

Samples demonstrating Spark API usage in Scala and PySpark API usage in Python.

Also contains a base class for testing PySpark code using SparkSession and PyUnit.

This base class is a slightly modified version of the ReusedPySparkTestCase class in the pyspark.tests module.
Subclass your PyUnit test cases from the CustomPySparkTestCase class; it encapsulates a SparkSession, which can be used as the entry point instead of a SparkContext.
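
For reference, a minimal sketch of what such a base class might look like, following the setUpClass/tearDownClass pattern of ReusedPySparkTestCase (the actual class in this repository may differ; the local[4] master and the use of the class name as the app name are assumptions):

import unittest

from pyspark.sql import SparkSession

class CustomPySparkTestCase(unittest.TestCase):
    # Create one SparkSession per test class and share it
    # across all test methods, mirroring ReusedPySparkTestCase.
    @classmethod
    def setUpClass(cls):
        cls.spark = SparkSession.builder \
            .master('local[4]') \
            .appName(cls.__name__) \
            .getOrCreate()

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()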

Example:

from pysparktest import CustomPySparkTestCase
from wordcount import word_cnt  # function under test; adjust the import to your own module

class SampleTest(CustomPySparkTestCase):
    def test_word_cnt(self):
        rdd = self.spark.sparkContext.parallelize(['Hi there', 'Hi'])
        self.assertEqual(word_cnt(rdd).collectAsMap(), {'Hi': 2, 'there': 1})
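
The word_cnt function above is the code under test and is assumed to live in a module of your own (the wordcount module name in the import is illustrative). A minimal implementation consistent with the asserted result:

from operator import add

def word_cnt(rdd):
    # Split each line on whitespace, emit a (word, 1) pair per word,
    # and sum the counts for each word.
    return rdd.flatMap(lambda line: line.split()) \
              .map(lambda word: (word, 1)) \
              .reduceByKey(add)

Run the test the usual PyUnit way, e.g. python -m unittest discover, with PySpark available on the PYTHONPATH.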