Cloud Integration for Apache Spark

The cloud-integration repository provides modules to improve Apache Spark's integration with cloud infrastructures.

Module spark-cloud-integration

Classes and tools to make Spark work better in the cloud.

See Spark Cloud Integration

Module cloud-examples

This module runs the packaging and integration tests for Spark in the cloud against AWS, Azure and OpenStack.

These are basic tests of the core functionality of I/O and streaming, and they verify that the committers work in the presence of inconsistent object storage. As well as running as unit tests, they have CLI entry points which can be used for scalable functional testing.

Module minimal-integration-test

This is a minimal JAR for integration tests.

Usage

spark-submit --class com.cloudera.spark.cloud.integration.Generator \
--master yarn \
--num-executors 2 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
minimal-integration-test-1.0-SNAPSHOT.jar \
adl://example.azuredatalakestore.net/output/dest/1 \
2 2 15
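
When the same job is submitted against several destinations, it can help to wrap the command in a small function and print it before running it. The following is a minimal sketch, not part of this repository: the function name `build_submit_cmd` and the variables `JAR` and `DEST` are hypothetical, and the trailing arguments are passed through to the Generator class unchanged because their meaning is not described here.

```shell
#!/usr/bin/env bash
# Dry-run helper: prints the spark-submit command instead of executing it.
# build_submit_cmd, JAR and DEST are illustrative names, not part of the project.

build_submit_cmd() {
  local jar="$1" dest="$2"
  shift 2
  # Any remaining arguments are forwarded to the Generator class as-is.
  printf '%s ' spark-submit \
    --class com.cloudera.spark.cloud.integration.Generator \
    --master yarn \
    --num-executors 2 \
    --driver-memory 512m \
    --executor-memory 512m \
    --executor-cores 1 \
    "$jar" "$dest" "$@"
  printf '\n'
}

JAR="minimal-integration-test-1.0-SNAPSHOT.jar"
DEST="adl://example.azuredatalakestore.net/output/dest/1"

# Print the command for review; pipe the output to a shell to actually run it.
build_submit_cmd "$JAR" "$DEST" 2 2 15
```

Printing the command first makes it easy to check the destination URL and resource settings before anything is submitted to the cluster.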