spark-etl

Join the chat at https://gitter.im/vngrs/spark-etl

What is spark-etl?

The ETL (Extract-Transform-Load) process is a key component of many data management operations, including moving data and transforming it from one format to another. spark-etl provides a distributed solution to support these operations effectively.

spark-etl is a Scala-based project built on Spark, so it is scalable and distributed. spark-etl processes data from N sources to N targets. The project structure:

Extract

[Image: Extract stage]

Transform

[Image: Transform stage]

Load

[Image: Load stage]

Pros

Example Scenario

We want to get data from multiple sources such as MySQL and CSV. While extracting the data, we also want to filter and merge some fields/tables. In the transform layer, we want to run a SQL query. Then we want to write the transformed data to multiple targets such as S3 and Redshift.

[Image: etl]

spark-etl is the easiest way to implement this scenario!
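To make the scenario concrete, here is a rough sketch of the same extract, transform, and load steps written directly against Spark's DataFrame API. It is not spark-etl's own API, which this README does not show. The table names (`orders`, `customers`), column names, file path, bucket, hosts, and environment variables are all hypothetical placeholders. The sketch also assumes that `spark-sql`, the MySQL and Redshift JDBC drivers, and the `hadoop-aws` connector are on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object EtlScenarioSketch {

  // Transform: filter and merge the extracted tables, then run a SQL query.
  // Column names (customer_id, amount, country) are assumptions for illustration.
  def transform(spark: SparkSession, orders: DataFrame, customers: DataFrame): DataFrame = {
    val merged = orders
      .filter("amount > 0")                // drop non-positive amounts
      .join(customers, Seq("customer_id")) // merge the two sources on a shared key
    merged.createOrReplaceTempView("merged")
    spark.sql("SELECT country, SUM(amount) AS total FROM merged GROUP BY country")
  }

  def main(args: Array[String]): Unit = {
    // Master URL and other settings are expected to come from spark-submit.
    val spark = SparkSession.builder().appName("etl-scenario-sketch").getOrCreate()

    // Extract: a MySQL table over JDBC (hypothetical URL, table, and env vars;
    // sys.env throws if a variable is not set).
    val orders = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/shop")
      .option("dbtable", "orders")
      .option("user", sys.env("MYSQL_USER"))
      .option("password", sys.env("MYSQL_PASSWORD"))
      .load()

    // Extract: a CSV file with a header row (hypothetical path).
    val customers = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("data/customers.csv")

    val result = transform(spark, orders, customers)

    // Load: write Parquet files to S3 (hypothetical bucket).
    result.write.mode("overwrite").parquet("s3a://example-bucket/etl/totals")

    // Load: append to a Redshift table over JDBC (hypothetical host and table).
    result.write.format("jdbc")
      .option("url", "jdbc:redshift://example-host:5439/dev")
      .option("dbtable", "totals")
      .option("user", sys.env("REDSHIFT_USER"))
      .option("password", sys.env("REDSHIFT_PASSWORD"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```

Keeping the transform step in its own function means it can be exercised with small in-memory DataFrames, without a MySQL instance or S3 bucket.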

Tech

Installation

Prerequisites for building spark-etl:

How to become a committer

Want to contribute? Great! Say "Hello" to us on Gitter.

Todos

License

MIT License