Orc output plugin for Embulk
Overview
- Plugin type: output
- Load all or nothing: no
- Resume supported: no
- Cleanup supported: yes
Configuration
- path_prefix: A prefix of output path. (string, required)
  - support: file, s3n and s3a.
- file_ext: An extension of output file. (string, default: .orc)
- sequence_format: (string, default: .%03d)
- buffer_size: Set the ORC buffer size (integer, default: 262144)
- strip_size: Set the ORC stripe size (integer, default: 67108864)
- block_size: Set the ORC block size (integer, default: 268435456)
- compression_kind: Compression codec for the ORC output (string, default: ZLIB)
  - NONE, ZLIB, SNAPPY
- overwrite: (LocalFileSystem only) Overwrite if output files already exist. (boolean, default: false)
- default_from_timezone: Time zone of timestamp columns. This can be overwritten for each column using column_options (DateTimeZone, default: UTC)
- auth_method: name of mechanism to authenticate requests (basic, env, instance, profile, properties, anonymous, or session. default: basic)
  - see: https://github.com/embulk/embulk-input-s3#configuration
  - env, basic, profile, default, session, anonymous, properties
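As an illustration, the options listed above can be written out explicitly with their documented defaults. This is only a sketch: path_prefix is the one required option, and the path used here is a placeholder.

```yaml
out:
  type: orc
  path_prefix: "/tmp/output"     # required; placeholder path
  file_ext: ".orc"               # default
  sequence_format: ".%03d"       # default
  buffer_size: 262144            # default
  strip_size: 67108864           # default
  block_size: 268435456          # default
  compression_kind: ZLIB         # NONE, ZLIB or SNAPPY
  overwrite: false               # LocalFileSystem only
  default_from_timezone: UTC     # default
```

Omitting any of the optional keys should give the same behavior, since each value above is the listed default.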
Example
out:
  type: orc
  path_prefix: "/tmp/output"
  compression_kind: ZLIB
  overwrite: true
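For S3 output, a configuration might look like the following sketch. The bucket name my-bucket and the key prefix are hypothetical, and auth_method: env is assumed to pick up credentials from environment variables as described in the embulk-input-s3 documentation referenced in the Configuration section. The overwrite option is left out because it is documented as LocalFileSystem only.

```yaml
out:
  type: orc
  # s3a scheme; s3n is also listed as supported. Bucket and prefix are hypothetical.
  path_prefix: "s3a://my-bucket/embulk/output"
  compression_kind: SNAPPY
  # Assumption: credentials come from the environment rather than the config file.
  auth_method: env
```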
ChangeLog
ver 0.3.0
- Change default value: (block_size, buffer_size, strip_size)
  - default value is Hive's default value.
    (see: https://orc.apache.org/docs/hive-config.html)
ver 0.2.0
- support: output to s3
  - s3n, s3a protocol
ver 0.1.0
- initial release
Build
$ ./gradlew gem # add -t to watch for file changes and rebuild continuously