Typecast filter plugin for Embulk

An Embulk filter plugin that casts column types.

Configuration

  • columns: columns to retain (array of hashes)
    • name: name of the column (required)
    • type: Embulk type to cast the column to
    • format: specify the format of the timestamp (string, default is default_timestamp_format)
    • timezone: specify the timezone of the timestamp (string, default is default_timezone)
  • default_timestamp_format: default timestamp format (string, default is %Y-%m-%d %H:%M:%S.%N %z)
  • default_timezone: default timezone (string, default is UTC)
  • stop_on_invalid_record: stop the bulk load transaction if an invalid record is found (boolean, default is false)
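
The options above might be combined as in the following sketch of a filters section. The plugin type name typecast is assumed from the plugin's title, and the column names (id, price, created_at) are hypothetical, not taken from example.yml.

```yaml
filters:
  - type: typecast              # assumed plugin type name
    columns:
      # Hypothetical column names, for illustration only.
      - {name: id, type: long}
      - {name: price, type: double}
      # format and timezone override the defaults for this column only.
      - {name: created_at, type: timestamp, format: "%Y-%m-%d %H:%M:%S", timezone: "Asia/Tokyo"}
    # Used by timestamp columns that do not set format or timezone themselves.
    default_timestamp_format: "%Y-%m-%d %H:%M:%S.%N %z"
    default_timezone: UTC
    # Abort the transaction on the first record that cannot be cast.
    stop_on_invalid_record: true
```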

Example

See example.csv and example.yml.

JSONPath

For a column of type: json, you can specify a JSONPath as the column's name:

name: $.payload.key1
name: "$.payload.array[0]"
name: "$.payload.array[*]"
name: $['payload']['key1.key2']

The following JSONPath operators are not supported:

  • Multiple properties such as ['name','name']
  • Multiple array indexes such as [1,2]
  • Array slice such as [1:2]
  • Filter expression such as [?(<expression>)]
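
As a rough sketch, JSONPath names using only the supported operators could appear in a filter config like this. It assumes the input has a json column named payload, and the target types are chosen purely for illustration. The names are quoted because brackets and commas are special characters inside a YAML flow mapping.

```yaml
filters:
  - type: typecast              # assumed plugin type name
    columns:
      # Cast one key of the (assumed) json column "payload".
      - {name: "$.payload.key1", type: long}
      # Cast only the first element of an array.
      - {name: "$.payload.array[0]", type: string}
      # Cast every element of an array.
      - {name: "$.payload.array[*]", type: string}
      # Bracket notation for a key that itself contains a dot.
      - {name: "$['payload']['key1.key2']", type: double}
```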

ToDo

  • Write tests

Development

Run the example:

$ ./gradlew classpath
$ embulk preview -I lib example/example.yml

Run the tests:

$ ./gradlew test

Run checkstyle:

$ ./gradlew check

Release the gem:

$ ./gradlew gemPush