kettle-beam

Kettle plugins for Apache Beam

First

build/install project kettle-beam-core

https://github.com/mattcasters/kettle-beam-core

Build

mvn clean install

Note you need the Pentaho settings.xml in your ~/.m2 : https://github.com/pentaho/maven-parent-poms/blob/master/maven-support-files/settings.xml

Install

Configure

File Definitions

Describe the file layout for the input and output of your pipeline using :

Spoon menu Beam / Create a file definition

Specify this file layout in your "Beam Input" and "Beam Output" steps. If you do not specify the file definition in the "Beam Output" step, all fields arriving at the step will be written with comma for separator and double quotes as enclosure. The formatting in the fields will be used.

Beam Job Configurations

A Beam Job configuration is needed to run your transformation on Apache Beam. Specify which Runner to use (Direct and Dataflow are supported).
You can use the variables to make your transformations completely generic. For example you can set an INPUT_LOCATION location variable

Supported

Runners

More information

http://diethardsteiner.github.io/pdi/2018/12/01/Kettle-Beam.html