Amazon S3 output plugin for Embulk


embulk-output-s3v2 is an output plugin for Embulk, based on aws-sdk-java-v2. It stores files on Amazon S3.

Overview

  • Plugin type: output
  • Load all or nothing: no
  • Resume supported: no
  • Cleanup supported: yes (still under development)

Configuration

  • region: AWS region name. (string, required)
  • enable_profile: If true, an AWS credentials profile is used to authenticate with AWS. If false, an IAM role is used. (boolean, default: false)
    • Supported in v0.2.0 or later
  • profile: AWS credentials profile name. If enable_profile is false, this parameter will be ignored. (string, default: default)
    • Supported in v0.2.0 or later
  • bucket: S3 bucket name. (string, required)
  • object_key_prefix: Prefix of the S3 object key. (string, required)
  • enable_multi_part_upload: If true, multipart upload is enabled. (boolean, default: false)
    • If enable_temp_file_output is false, this parameter must be false or left unspecified.
  • max_concurrent_requests: Maximum number of concurrent requests used to upload the parts of an object split for multipart upload. If enable_multi_part_upload is false, this parameter will be ignored. (int, default: 10)
  • multipart_chunksize: Once the plugin has decided to use a multipart upload, the file is divided into chunks of the size specified by this parameter. If enable_multi_part_upload is false, this parameter will be ignored. (string, default: 8MB)
    • Minimum size: 5MB
    • Maximum size: 2GB
    • Supported unit suffixes: same as those of multipart_threshold
  • multipart_threshold: The size threshold the plugin uses to decide on a multipart transfer for each divided piece of bulk data. If enable_multi_part_upload is false, this parameter will be ignored. (string, default: 8MB)
    • Supported unit suffixes: KB, MB, GB, TB
  • extension: File extension. (string, required)
  • enable_temp_file_output: If true, a temp file is created in the temp_path directory. If false, bulk data is handled in a buffer only, without a temp file. (boolean, default: true)
  • temp_path: Directory for temp file output. (string, default: /tmp)
  • temp_file_prefix: Prefix of temp file name. (string, default: embulk-output-s3v2)

### Example

#### Basic sample with IAM role authentication

```yaml
out:
  type: s3v2
  region: ap-northeast-1
  bucket: s3-bucket-name
  object_key_prefix: embulk/embulk-output-s3v2
  temp_path: /tmp
  extension: .csv
  formatter:
    type: csv
    delimiter: ","
```

#### Basic sample with credentials profile authentication

```yaml
out:
  type: s3v2
  region: ap-northeast-1
  bucket: s3-bucket-name
  object_key_prefix: embulk/embulk-output-s3v2
  temp_path: /tmp
  enable_profile: true
  profile: default
  extension: .csv
  formatter:
    type: csv
    delimiter: ","
```

#### Multipart upload sample with gzip encoding

```yaml
out:
  type: s3v2
  region: ap-northeast-1
  bucket: s3-bucket-name
  object_key_prefix: embulk/embulk-output-s3v2
  temp_path: /tmp
  enable_multi_part_upload: true
  multipart_chunksize: 10MB
  max_concurrent_requests: 20
  extension: csv.gz
  formatter:
    type: csv
    delimiter: ","
  encoders:
  - type: gzip
    level: 1
```
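
The configuration parameters above also allow a buffer-only mode, which none of the samples covers. A minimal sketch of such a configuration might look like the following. It reuses the placeholder region, bucket, and key prefix from the other samples, sets enable_temp_file_output to false, and leaves enable_multi_part_upload unspecified, because that parameter must be false or unspecified when temp file output is disabled. This is an illustrative configuration only and has not been taken from the plugin's own documentation or tests.

```yaml
out:
  type: s3v2
  region: ap-northeast-1                         # placeholder region
  bucket: s3-bucket-name                         # placeholder bucket name
  object_key_prefix: embulk/embulk-output-s3v2   # placeholder key prefix
  enable_temp_file_output: false                 # handle bulk data in a buffer; no temp file is written
  # enable_multi_part_upload is left unspecified (default: false),
  # as required when enable_temp_file_output is false.
  extension: .csv
  formatter:
    type: csv
    delimiter: ","
```

With temp file output disabled, temp_path and temp_file_prefix are presumably not needed, so they are omitted here.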

Usage

Build

$ ./gradlew gem