Redshift output plugins for Embulk
Redshift output plugins for Embulk loads records to Redshift.
Overview
- Plugin type: output
- Load all or nothing: depnds on the mode. see bellow.
- Resume supported: depnds on the mode. see bellow.
Configuration
- host: database host name (string, required)
- port: database port number (integer, default: 5439)
- user: database login user name (string, required)
- password: database login password (string, default: "")
- database: destination database name (string, required)
- schema: destination schema name (string, default: "public")
- table: destination table name (string, required)
- access_key_id: access key id for AWS
- **secret_access_key: secret access key for AWS
- **iam_user_name: IAM user name for uploading temporary files to S3. The user should have permissions of
s3:GetObject
,s3:PutObject
,s3:ListBucket
andsts:GetFederationToken
. - **s3_bucket: S3 bucket name for temporary files
- **s3_key_prefix: S3 key prefix for temporary files (string, default:"")
- options: extra connection properties (hash, default: {})
- mode: "replace" or "insert" (string, required)
- batch_size: size of a single batch insert (integer, default: 16777216)
- default_timezone: If input column type (embulk type) is timestamp, this plugin needs to format the timestamp into a SQL string. This default_timezone option is used to control the timezone. You can overwrite timezone for each columns using column_options option. (string, default:
UTC
) - column_options: advanced: a key-value pairs where key is a column name and value is options for the column.
- type: type of a column when this plugin creates new tables (e.g.
VARCHAR(255)
,INTEGER NOT NULL UNIQUE
). This used when this plugin creates intermediate tables (insert, truncate_insert and merge modes), when it creates the target table (insert_direct and replace modes), and when it creates nonexistent target table automatically. (string, default: depends on input column type.BIGINT
if input column type is long,BOOLEAN
if boolean,DOUBLE PRECISION
if double,CLOB
if string,TIMESTAMP
if timestamp) - value_type: This plugin converts input column type (embulk type) into a database type to build a INSERT statement. This value_type option controls the type of the value in a INSERT statement. (string, default: depends on input column type. Available values options are:
byte
,short
,int
,long
,double
,float
,boolean
,string
,nstring
,date
,time
,timestamp
,decimal
,null
,pass
) - timestamp_format: If input column type (embulk type) is timestamp and value_type is
string
ornstring
, this plugin needs to format the timestamp value into a string. This timestamp_format option is used to control the format of the timestamp. (string, default:%Y-%m-%d %H:%M:%S.%6N
) - timezone: If input column type (embulk type) is timestamp, this plugin needs to format the timestamp value into a SQL string. In this cases, this timezone option is used to control the timezone. (string, value of default_timezone option is used by default)
- type: type of a column when this plugin creates new tables (e.g.
Modes
- insert:
- Behavior: This mode writes rows to some intermediate tables first. If all those tasks run correctly, runs
INSERT INTO <target_table> SELECT * FROM <intermediate_table_1> UNION ALL SELECT * FROM <intermediate_table_2> UNION ALL ...
query. - Transactional: Yes. This mode successfully writes all rows, or fails with writing zero rows.
- Resumable: Yes.
- Behavior: This mode writes rows to some intermediate tables first. If all those tasks run correctly, runs
- insert_direct:
- Behavior: This mode inserts rows to the target table directly.
- Transactional: No. If fails, the target table could have some rows inserted.
- Resumable: No.
- truncate_insert:
- Behavior: Same with
insert
mode excepting that it truncates the target table right before the lastINSERT ...
query. - Transactional: Yes.
- Resumable: Yes.
- Behavior: Same with
- merge:
- Behavior: This mode writes rows to some intermediate tables first. If all those tasks run correctly, runs
with updated AS (UPDATE .... RETURNING ...) INSERT INTO ....
query. - Transactional: Yes.
- Resumable: Yes.
- Behavior: This mode writes rows to some intermediate tables first. If all those tasks run correctly, runs
- replace:
- Behavior: Same with
insert
mode excepting that it truncates the target table right before the lastINSERT ...
query. - Transactional: Yes.
- Resumable: No.
- Behavior: Same with
Example
out:
type: redshift
host: myinstance.us-west-2.redshift.amazonaws.com
user: pg
password: ""
database: my_database
table: my_table
access_key_id: ABCXYZ123ABCXYZ123
secret_access_key: AbCxYz123aBcXyZ123
iam_user_name: my-s3-read-only
s3_bucket: my-redshift-transfer-bucket
s3_key_prefix: temp/redshift
mode: insert
Advanced configuration:
out:
type: redshift
host: myinstance.us-west-2.redshift.amazonaws.com
user: pg
password: ""
database: my_database
table: my_table
access_key_id: ABCXYZ123ABCXYZ123
secret_access_key: AbCxYz123aBcXyZ123
iam_user_name: my-s3-read-only
s3_bucket: my-redshift-transfer-bucket
s3_key_prefix: temp/redshift
options: {loglevel: 2}
mode: insert_direct
column_options:
my_col_1: {type: 'VARCHAR(255)'}
my_col_3: {type: 'INT NOT NULL'}
my_col_4: {value_type: string, timestamp_format: `%Y-%m-%d %H:%M:%S %z`, timezone: '-0700'}
my_col_5: {type: 'DECIMAL(18,9)', value_type: pass}
Build
$ ./gradlew gem