Mastering Apache Spark 2.x - Second Edition

This is the code repository for Mastering Apache Spark 2.x - Second Edition, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.

About the Book

Apache Spark is an in-memory, cluster-based parallel processing system that provides a wide range of functionality, such as graph processing, machine learning, stream processing, and SQL. This book aims to take your limited knowledge of Spark to the next level by teaching you how to extend Spark's functionality and implement your data flows and machine learning and deep learning programs on top of the platform.

Instructions and Navigation

All of the code is organized into folders. Each folder starts with a number followed by the application name. For example, Chapter02.

The code will look like the following:

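// Entry point for core Spark (RDD) functionality
import org.apache.spark.SparkContext
// Implicit conversions; only required before Spark 1.3, harmless in 2.x
import org.apache.spark.SparkContext._
// Application configuration (application name, master URL, and so on)
import org.apache.spark.SparkConf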
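
As a rough illustration of how imports like these are typically used, the sketch below builds a SparkConf, creates a SparkContext from it, and runs a small job. It is not taken from the book: the object name WordLengths, the application name, the local[*] master URL, and the sample data are all assumptions chosen for illustration, and it presumes that the Spark 2.x core library is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; the name is not from the book.
object WordLengths {
  def main(args: Array[String]): Unit = {
    // Assumed settings: run locally, using all available cores.
    val conf = new SparkConf()
      .setAppName("WordLengths")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    try {
      // Distribute a small in-memory collection as an RDD.
      val words = sc.parallelize(Seq("spark", "graph", "stream"))
      // Map each word to its length, then sum the lengths.
      val totalLength = words.map(_.length).reduce(_ + _)
      println(s"Total characters: $totalLength")
    } finally {
      // Always release the context, even if the job fails.
      sc.stop()
    }
  }
}
```

When submitting to a real cluster, the master URL is usually supplied through spark-submit rather than hard-coded, so the setMaster call here is only a convenience for local experimentation.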

You will need the following to work with the examples in this book:

Related Products