Google Cloud Storage file input plugin for Embulk

Overview

  • Plugin type: file input
  • Resume supported: yes
  • Cleanup supported: yes

Usage

Install plugin

embulk gem install embulk-input-gcs

Google Service Account Settings

  1. Create a project in the Google Developers Console.

  2. Create a "Service Account" for the project.

    A service account can be granted one of two scopes: read-only or read-write.

    embulk-input-gcs only needs the "read-only" scope.

  3. Generate a private key in P12 (PKCS12) format and upload it to the machine that runs Embulk.

  4. Write the service account's "EMAIL_ADDRESS" and the full path of the PKCS12 private key in the YAML config file.
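Step 4 can be scripted. The following is a minimal sketch, not part of the plugin: the function name write_gcs_config is hypothetical, the bucket and path_prefix values are the placeholders from the Example section, and the key names are the ones listed under Configuration. It writes only the in: section and refuses to continue if the key file is not readable.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with embulk-input-gcs).
# Usage: write_gcs_config EMAIL_ADDRESS /full/path/to/key.p12 output.yml
write_gcs_config() {
  email="$1"
  keyfile="$2"
  out="$3"

  # Stop early if the P12 key is missing or unreadable on this machine.
  if [ ! -r "$keyfile" ]; then
    echo "key file not readable: $keyfile" >&2
    return 1
  fi

  # Bucket and prefix are placeholders; replace them with your own values.
  cat > "$out" <<EOF
in:
  type: gcs
  bucket: my-gcs-bucket
  path_prefix: logs/csv-
  service_account_email: $email
  p12_keyfile_fullpath: $keyfile
EOF
}
```

The generated file still needs parser and out sections before it can be run; they can be written by hand as in the second example under Example.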

Run

embulk run /path/to/config.yml

Configuration

  • bucket Google Cloud Storage bucket name (string, required)
  • path_prefix prefix of the target object keys (string, required)
  • service_account_email email address of the Google Cloud Storage service account (string, required)
  • p12_keyfile_fullpath full path of the P12 (PKCS12) private key file (string, required)
  • application_name application name; can be anything you like (string, optional)
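To illustrate what path_prefix selects, here is a small sketch that assumes the plugin matches object keys by plain string prefix, the way Cloud Storage object listing filters by prefix. The function filter_by_prefix is hypothetical and runs locally on a list of key names; it does not contact Cloud Storage.

```shell
#!/bin/sh
# Hypothetical illustration: reads object keys from stdin (one per line)
# and prints those that start with the given prefix.
# The comparison is a literal string match: no globbing, no directory logic.
filter_by_prefix() {
  prefix="$1"
  while IFS= read -r key; do
    case "$key" in
      "$prefix"*) printf '%s\n' "$key" ;;
    esac
  done
}

# Example: only the keys beginning with "sample_" are printed.
printf '%s\n' sample_01.csv.gz sample_02.csv.gz logs/csv-01 other.txt \
  | filter_by_prefix sample_
```

Under that assumption, a prefix such as logs/csv- would pick up logs/csv-01 but not other.txt, and a prefix containing "/" is treated as ordinary text rather than as a folder.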

Example

in:
  type: gcs
  bucket: my-gcs-bucket
  path_prefix: logs/csv-
  service_account_email: ABCXYZ123ABCXYZ123.gserviceaccount.com
  p12_keyfile_fullpath: /path/to/p12_keyfile.p12
  application_name: Anything you like

Example for "sample_01.csv.gz", generated by embulk example

in:
  type: gcs
  bucket: my-gcs-bucket
  path_prefix: sample_
  service_account_email: ABCXYZ123ABCXYZ123.gserviceaccount.com
  p12_keyfile_fullpath: /path/to/p12_keyfile.p12
  application_name: Anything you like
  decoders:
  - {type: gzip}
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    header_line: true
    columns:
    - {name: id, type: long}
    - {name: account, type: long}
    - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
    - {name: purchase, type: timestamp, format: '%Y%m%d'}
    - {name: comment, type: string}
out: {type: stdout}

Build

./gradlew gem