Column filter plugin for Embulk


A filter plugin for Embulk to filter out columns.

Configuration

Example - columns

Say input.csv is as follows:

time,id,key,score
2015-07-13,0,Vqjht6YE,1370
2015-07-13,1,VmjbjAA0,3962
2015-07-13,2,C40P5H1W,7323

and the filter configuration is:

filters:
  - type: column
    columns:
      - {name: time, default: "2015-07-13", format: "%Y-%m-%d"}
      - {name: id}
      - {name: key, default: "foo"}

The filter reduces the columns to only time, id, and key:

time,id,key
2015-07-13,0,Vqjht6YE
2015-07-13,1,VmjbjAA0
2015-07-13,2,C40P5H1W

Note that column types are automatically retrieved from input data (inputSchema).
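To make the row-level effect of `columns` concrete, here is a minimal Python sketch that replays the example above on plain dicts. It is a hypothetical model, not the plugin's code (the plugin runs inside Embulk on the JVM); the names `apply_columns` and `to_csv` are made up for illustration. It assumes `default` stands in when a value is null or the column is absent, and it does not model `format` or type conversion, so every value stays a string.

```python
import csv
import io
from typing import Any

# Sample input.csv from the example above, inlined so the sketch runs on its own.
INPUT_CSV = """time,id,key,score
2015-07-13,0,Vqjht6YE,1370
2015-07-13,1,VmjbjAA0,3962
2015-07-13,2,C40P5H1W,7323
"""

Row = dict[str, Any]


def apply_columns(rows: list[Row], columns: list[Row]) -> list[Row]:
    """Hypothetical model of the `columns` option: keep only the listed
    columns, in the listed order.

    Assumption: an entry's `default` is used when the value is null (None)
    or the column is absent from the row. `format` is not modeled.
    """
    result = []
    for row in rows:
        kept: Row = {}
        for col in columns:
            value = row.get(col["name"])
            kept[col["name"]] = col.get("default") if value is None else value
        result.append(kept)
    return result


def to_csv(rows: list[Row]) -> str:
    """Serialize rows to CSV text, taking the header from the first row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = list(csv.DictReader(io.StringIO(INPUT_CSV)))
config = [
    {"name": "time", "default": "2015-07-13"},
    {"name": "id"},
    {"name": "key", "default": "foo"},
]
print(to_csv(apply_columns(rows, config)), end="")
```

Running it prints the same three-column CSV as the expected output above. The score column disappears simply because no entry names it.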

Example - add_columns

Say input.csv is as follows:

time,id,key,score
2015-07-13,0,Vqjht6YE,1370
2015-07-13,1,VmjbjAA0,3962
2015-07-13,2,C40P5H1W,7323

and the filter configuration is:

filters:
  - type: column
    add_columns:
      - {name: d, type: timestamp, default: "2015-07-13", format: "%Y-%m-%d"}
      - {name: copy_id, src: id}

The filter adds a d column, and a copy_id column that is a copy of the id column:

time,id,key,score,d,copy_id
2015-07-13,0,Vqjht6YE,1370,2015-07-13,0
2015-07-13,1,VmjbjAA0,3962,2015-07-13,1
2015-07-13,2,C40P5H1W,7323,2015-07-13,2
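The `add_columns` behaviour can be sketched the same way. In this hypothetical Python model (`apply_add_columns` is an illustrative name, not a plugin API), every existing column is kept in place and each new entry is appended at the end, either copied from its `src` input column or filled with its `default`. As an assumption of the sketch, `type: timestamp` and `format` are ignored, so d is the plain string "2015-07-13" rather than a parsed timestamp.

```python
import csv
import io
from typing import Any

# Sample input.csv from the example above.
INPUT_CSV = """time,id,key,score
2015-07-13,0,Vqjht6YE,1370
2015-07-13,1,VmjbjAA0,3962
2015-07-13,2,C40P5H1W,7323
"""

Row = dict[str, Any]


def apply_add_columns(rows: list[Row], add_columns: list[Row]) -> list[Row]:
    """Hypothetical model of `add_columns`: append each entry after the
    existing columns. `src` copies an input column; otherwise `default`
    is used as a constant. `type` and `format` are not modeled."""
    result = []
    for row in rows:
        extended = dict(row)  # keeps the input columns in their original order
        for col in add_columns:
            if "src" in col:
                extended[col["name"]] = row[col["src"]]
            else:
                extended[col["name"]] = col.get("default")
        result.append(extended)
    return result


def to_csv(rows: list[Row]) -> str:
    """Serialize rows to CSV text, taking the header from the first row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = list(csv.DictReader(io.StringIO(INPUT_CSV)))
config = [
    {"name": "d", "type": "timestamp", "default": "2015-07-13"},
    {"name": "copy_id", "src": "id"},
]
print(to_csv(apply_add_columns(rows, config)), end="")
```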

Example - drop_columns

Say input.csv is as follows:

time,id,key,score
2015-07-13,0,Vqjht6YE,1370
2015-07-13,1,VmjbjAA0,3962
2015-07-13,2,C40P5H1W,7323

and the filter configuration is:

filters:
  - type: column
    drop_columns:
      - {name: time}
      - {name: id}

The filter drops the time and id columns:

key,score
Vqjht6YE,1370
VmjbjAA0,3962
C40P5H1W,7323
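`drop_columns` is the inverse of `columns`: the listed names are removed and everything else is kept in its original order. A hypothetical Python model of that (`apply_drop_columns` is an illustrative name, not part of the plugin):

```python
import csv
import io
from typing import Any

# Sample input.csv from the example above.
INPUT_CSV = """time,id,key,score
2015-07-13,0,Vqjht6YE,1370
2015-07-13,1,VmjbjAA0,3962
2015-07-13,2,C40P5H1W,7323
"""

Row = dict[str, Any]


def apply_drop_columns(rows: list[Row], drop_columns: list[Row]) -> list[Row]:
    """Hypothetical model of `drop_columns`: remove the named columns and
    keep every other column in its original order."""
    dropped = {col["name"] for col in drop_columns}
    return [{k: v for k, v in row.items() if k not in dropped} for row in rows]


rows = list(csv.DictReader(io.StringIO(INPUT_CSV)))
out = apply_drop_columns(rows, [{"name": "time"}, {"name": "id"}])
print(",".join(out[0]))  # header: the remaining column names
for row in out:
    print(",".join(row.values()))
```

Choosing `drop_columns` over `columns` is mostly a matter of which list is shorter: listing two columns to drop is simpler than listing every column to keep.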

JSONPath

For a column of type: json, you can specify a JSONPath as the column name, as in:

- {name: $.payload.key1}
- {name: "$.payload.array[0]"}
- {name: "$.payload.array[*]"}
- {name: $['payload']['key1.key2']}
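To show what each of the four path forms above addresses inside a JSON value, here is a small hypothetical resolver in Python. It is an illustration only, not the JSONPath implementation the plugin uses; `parse_path` and `select` are made-up names, and the `record` contents are invented sample data. It understands only `.name`, `['quoted name']`, `[index]` and `[*]`, and raises `ValueError` for anything else.

```python
import re
from typing import Any

# One match per path step: .name | ['quoted name'] | [index] | [*]
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\['([^']*)'\]|\[(\d+)\]|\[(\*)\]")


def parse_path(path: str) -> list[tuple[str, Any]]:
    """Split a JSONPath into ("key", name), ("index", n) or ("all", None) steps.

    Only the four forms listed above are recognized; anything else raises
    ValueError in this sketch.
    """
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    steps: list[tuple[str, Any]] = []
    pos = 1
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at {path[pos:]!r}")
        name, quoted, index, _ = m.groups()
        if name is not None:
            steps.append(("key", name))
        elif quoted is not None:
            steps.append(("key", quoted))
        elif index is not None:
            steps.append(("index", int(index)))
        else:
            steps.append(("all", None))
        pos = m.end()
    return steps


def select(doc: Any, path: str) -> list[Any]:
    """Return every value in `doc` that `path` points at (possibly none)."""
    current = [doc]
    for kind, arg in parse_path(path):
        nxt: list[Any] = []
        for node in current:
            if kind == "key" and isinstance(node, dict) and arg in node:
                nxt.append(node[arg])
            elif kind == "index" and isinstance(node, list) and arg < len(node):
                nxt.append(node[arg])
            elif kind == "all" and isinstance(node, list):
                nxt.extend(node)
        current = nxt
    return current


record = {"payload": {"key1": "v1", "key1.key2": "dotted", "array": [10, 20, 30]}}
for p in ["$.payload.key1", "$.payload.array[0]", "$.payload.array[*]", "$['payload']['key1.key2']"]:
    print(p, "->", select(record, p))
```

The bracket form matters when a key itself contains a dot: in this sketch, `$.payload.key1.key2` would look for a key2 nested under key1, whereas `$['payload']['key1.key2']` reaches the single key named "key1.key2".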

EXAMPLE:

The following JSONPath operators are not supported:

Note that type: timestamp is not available for JSONPath entries in add_columns or columns, because Embulk's type: json cannot hold a timestamp value inside.

Also note that renaming or copying JSON paths with the src option is only partially supported so far: the parent JSON path must be the same, as in:

- {name: $.payload.foo.dest, src: $.payload.foo.src}

The following example, where the parent paths differ ($.payload.foo vs. $.payload.bar), does not work yet:

- {name: $.payload.foo.dest, src: $.payload.bar.src}
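The supported case amounts to reading one key and writing another inside the same parent object. The hypothetical Python helper below (`copy_json_path` is an illustrative name, not the plugin's code) handles only dotted paths, performs that same-parent copy, and rejects a src whose parent differs from that of name, mirroring the restriction described above.

```python
import copy
from typing import Any


def parent_and_key(path: str) -> tuple[str, str]:
    """Split a dotted path like "$.payload.foo.src" into ("$.payload.foo", "src")."""
    parent, _, key = path.rpartition(".")
    return parent, key


def copy_json_path(record: dict[str, Any], name: str, src: str) -> dict[str, Any]:
    """Hypothetical sketch: copy the value at `src` to `name`.

    Both paths must share the same parent object, as the plugin requires;
    only dotted paths are handled. The input record is not modified.
    """
    dest_parent, dest_key = parent_and_key(name)
    src_parent, src_key = parent_and_key(src)
    if dest_parent != src_parent:
        raise ValueError(f"parent paths differ: {src_parent} vs {dest_parent}")
    result = copy.deepcopy(record)
    node = result
    for step in dest_parent.split(".")[1:]:  # skip the leading "$"
        node = node[step]
    node[dest_key] = node[src_key]
    return result


record = {"payload": {"foo": {"src": 1}, "bar": {"src": 2}}}
print(copy_json_path(record, "$.payload.foo.dest", "$.payload.foo.src"))
```

Calling it with `src="$.payload.bar.src"` and `name="$.payload.foo.dest"` raises `ValueError`, which corresponds to the unsupported example.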

Development

Run example:

$ ./gradlew classpath
$ embulk preview -I lib example/example.yml

Run test:

$ ./gradlew test

Run test with coverage reports:

$ ./gradlew test jacocoTestReport

$ open build/reports/jacoco/test/html/index.html

Run checkstyle and findbugs:

$ ./gradlew check

Run only checkstyle:

$ ./gradlew checkstyleMain
$ ./gradlew checkstyleTest

Run only findbugs:

$ ./gradlew findbugsMain
$ ./gradlew findbugsTest

Release gem:

$ ./gradlew gemPush