TD output plugin for Embulk
Treasure Data Service output plugin for Embulk
NOTICE:
- embulk-output-td v0.5.0+ requires Java 1.8 or higher. For Java 7, use embulk-output-td v0.4.x.
- embulk-output-td v0.4.0+ only supports Embulk v0.8.22+.
Overview
- Plugin type: output
- Load all or nothing: yes
- Resume supported: no
Configuration
- apikey: apikey (string, required)
- endpoint: hostname (string, default='api.treasuredata.com')
- http_proxy: http proxy configuration (tuple of host, port, useSsl, user, and password. default is null)
- use_ssl: whether to connect over SSL (boolean, default=true)
- auto_create_table: the flag for creating the database and/or the table if they don't exist (boolean, default=true)
- mode: one of 'append', 'replace', or 'truncate' (string, default='append')
- database: database name (string, required)
- table: table name (string, required)
- session: bulk_import session name (string, optional)
- pool_name: bulk_import session pool name (string, optional)
- time_column: user-defined time column (string, optional)
- unix_timestamp_unit: if the type of "time" or time_column is long, the value is treated as a UNIX timestamp. This option specifies its unit: sec, milli, micro, or nano (enum, default: sec)
- tmpdir: temporary directory (string, optional). If set to null, the plugin uses the directory obtained from System.property
- upload_concurrency: upload concurrency (int, default=2). max concurrency is 8.
- file_split_size: split size (long, default=16384 (16MB)).
- stop_on_invalid_record: stop the bulk load transaction if a file includes an invalid record, such as an invalid timestamp (boolean, default=false).
- displayed_error_records_count_limit: limit the count of the shown error records skipped by the perform job (int, default=10).
- default_timestamp_type_convert_to: configure the output type of timestamp columns. Available options are "sec" (convert timestamp to UNIX timestamp in seconds) and "string" (convert timestamp to string). (string, default: "string")
- default_timezone: default timezone (string, default='UTC')
- default_timestamp_format: default timestamp format (string, default=%Y-%m-%d %H:%M:%S.%6N)
- column_options: advanced: key-value pairs where the key is a column name and the value is the options for that column.
  - timezone: if the input column type (Embulk type) is timestamp, this plugin needs to format the timestamp value into a SQL string. In this case, this timezone option controls the timezone. (string, value of default_timezone option is used by default)
  - format: if the input column type (Embulk type) is timestamp, this plugin needs to format the timestamp value into a string. This format option controls the format of the timestamp. (string, value of default_timestamp_format option is used by default)
- retry_limit: maximum number of retries allowed (int, default: 20)
- retry_initial_interval_millis: the initial retry interval in milliseconds (int, default: 1000)
- retry_max_interval_millis: the maximum retry interval in milliseconds. The interval doubles on every retry until retry_max_interval_millis is reached. (int, default: 90000)
- additional_http_headers: add additional headers to the requests (a key & value map, default: null)
- port: port for HTTP requests. By default, the plugin connects to port 443, or to port 80 if use_ssl: false (int, optional)
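As an illustration of how the timestamp and retry options above fit together, here is a minimal sketch of an out: section. The option names and default values come from the list above; the column names created_at and updated_at, the 'Asia/Tokyo' timezone, and the per-column format are hypothetical and only serve the example.

```yaml
out:
  type: td
  apikey: <your apikey>
  database: my_db
  table: my_table
  time_column: created_at
  # Assumption: created_at holds a long value in milliseconds
  unix_timestamp_unit: milli
  # Defaults for timestamp columns that have no entry in column_options
  default_timezone: UTC
  default_timestamp_format: '%Y-%m-%d %H:%M:%S.%6N'
  column_options:
    # updated_at is a hypothetical timestamp column with its own settings
    updated_at: {timezone: 'Asia/Tokyo', format: '%Y-%m-%d %H:%M:%S'}
  # With these (default) values the retry interval starts at 1000 ms
  # and doubles on each retry until it reaches 90000 ms
  retry_limit: 20
  retry_initial_interval_millis: 1000
  retry_max_interval_millis: 90000
```

Columns without an entry in column_options fall back to default_timezone and default_timestamp_format, so per-column settings are only needed where a column differs from the defaults.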
Modes
- append:
  - Uploads data directly to the existing table.
- replace:
  - Creates a new temp table and uploads data to the temp table first.
  - After the upload finishes, the table specified by the 'table' option is replaced with the temp table.
  - The schema of the existing table is not migrated to the replaced table.
- truncate:
  - Creates a new temp table and uploads data to the temp table first.
  - After the upload finishes, the table specified by the 'table' option is replaced with the temp table.
  - The schema of the existing table is added to the replaced table.
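To make the difference between the modes concrete, the following sketch selects replace mode. It reuses the placeholder names my_db and my_table from the example configuration; switching the value to truncate would keep the same upload flow but carry the existing schema over, as described above.

```yaml
out:
  type: td
  apikey: <your apikey>
  database: my_db
  table: my_table
  # Data is uploaded to a temp table first, then the temp table
  # replaces my_table; the existing schema is not migrated
  mode: replace
```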
Example
Here is a sample configuration for the TD output plugin.
out:
  type: td
  apikey: <your apikey>
  endpoint: api.treasuredata.com
  database: my_db
  table: my_table
  time_column: created_at
  auto_create_table: true
  mode: append
Install
$ embulk gem install embulk-output-td
Http Proxy Configuration
To route requests through an HTTP proxy, use the http_proxy parameter:
out:
  type: td
  apikey: <your apikey>
  endpoint: api.treasuredata.com
  http_proxy: {host: localhost, port: 8080, use_ssl: false, user: "proxyuser", password: "PASSWORD"}
  database: my_db
  table: my_table
  time_column: created_at
  auto_create_table: true
  mode: append
Additional Http headers
out:
  type: td
  apikey: <your apikey>
  endpoint: api.treasuredata.com
  database: my_db
  table: my_table
  time_column: created_at
  auto_create_table: true
  mode: append
  additional_http_headers:
    Content_Type: 'application/json'
    foo: bar
Build
Build by Gradle
$ git clone https://github.com/treasure-data/embulk-output-td.git
$ cd embulk-output-td
$ ./gradlew gem classpath
Run on Embulk
$ bin/embulk run -I embulk-output-td/lib/ config.yml
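The run command above expects a config.yml. A minimal sketch of such a file might look like the following, assuming Embulk's built-in file input with the CSV parser; the path prefix and the column definitions are hypothetical and should be adapted to the actual input data.

```yaml
in:
  type: file
  # Hypothetical prefix; matches files such as /path/to/sample_01.csv
  path_prefix: /path/to/sample_
  parser:
    type: csv
    skip_header_lines: 1
    columns:
    # Hypothetical columns for illustration
    - {name: id, type: long}
    - {name: created_at, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
    - {name: comment, type: string}
out:
  type: td
  apikey: <your apikey>
  endpoint: api.treasuredata.com
  database: my_db
  table: my_table
  time_column: created_at
  auto_create_table: true
  mode: append
```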
Release
Upload gem to Rubygems.org
$ ./gradlew gem # create .gem file under pkg/ directory
$ ./gradlew gemPush # create and publish .gem file
Repo URL: https://rubygems.org/gems/embulk-output-td
Upload jars to Bintray.com
$ ./gradlew bintrayUpload
Repo URL: https://bintray.com/embulk-output-td/maven/embulk-output-td