# Google Cloud BigQuery extract file input plugin for Embulk

Development version.
## Overview
- Plugin type: file input
- Resume supported: no
- Cleanup supported: yes
## Detail
Reads files stored in Google Cloud Storage that were exported from a Google Cloud BigQuery table or query result. This can be a practical way to move very large BigQuery datasets.

If you set the table config without the query config, the plugin simply extracts that table to Google Cloud Storage.

If you set the query config, the query result is first saved to a temp table, and that temp table is then extracted to the Google Cloud Storage URI. See: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.extract
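For example, a minimal table-only extraction (no query, so no temp table) might look like the sketch below. The project, key file, dataset, table, bucket, and column names are all placeholders:

```yaml
in:
  type: bigquery_extract_files
  project: my-gcp-project                # placeholder GCP project id
  json_keyfile: /path/to/keyfile.json    # placeholder service account key
  dataset: my_dataset                    # placeholder dataset
  table: my_table                        # table mode: extract this table as-is
  gcs_uri: gs://my-bucket/export         # files are exported here, then downloaded
  temp_local_path: /tmp/embulk           # local staging directory
  decoders:
  - {type: gzip}                         # matches the default compression: GZIP
  parser:
    type: csv                            # matches the default file_format: CSV
    columns:                             # hypothetical schema
    - {name: id, type: long}
    - {name: name, type: string}
out:
  type: stdout
```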
## Usage

### Install plugin

```sh
embulk gem install embulk-input-bigquery_extract_files
```

- RubyGems: https://rubygems.org/profiles/jo8937
## Configuration
- project: Google Cloud Platform (GCP) project id (string, required)
- json_keyfile: GCP service account's private key as a JSON file (string, required)
- gcs_uri: GCS URI where the BigQuery result is saved. The bucket and path names are parsed from this URI. (string, required)
- temp_local_path: local directory into which the extracted files are downloaded (string, required)
- dataset: target datasource dataset (string, default: null)
- table: target datasource table. Either query or table is required. (string, default: null)
- query: target datasource query. Either query or table is required. (string, default: null)
- temp_dataset: if you use the query param, the query result is saved in this dataset (string, default: null)
- temp_table: if you use the query param, the query result is saved in this table. If not set, the plugin generates a temp name. (string, default: null)
- use_legacy_sql: if you use the query param, see: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.useLegacySql (string, default: false)
- cache: if you use the query param, see: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.useQueryCache (string, default: true)
- create_disposition: if you use the query param, see: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.createDisposition (string, default: CREATE_IF_NEEDED)
- write_disposition: if you use the query param, see: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.writeDisposition (string, default: WRITE_APPEND)
- file_format: table extract file format. See: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.extract.destinationFormat (string, default: CSV)
- compression: table extract file compression setting. See: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.extract.compression (string, default: GZIP)
- temp_schema_file_path: BigQuery result schema file for the parser (optional) (string, default: null)
- temp_schema_file_type: schema file type; the default is Embulk's Schema object (optional) (string, default: null)
- decoders: standard option of Embulk file input plugins. See: http://www.embulk.org/docs/built-in.html#gzip-decoder-plugin
- parser: standard option of Embulk file input plugins. See: http://www.embulk.org/docs/built-in.html#csv-parser-plugin
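When query is set, the plugin runs the query into a temp table before extracting it. The fragment below sketches just the query-related keys (all values are placeholders):

```yaml
query: 'select a, b from my_table'   # hypothetical query
temp_dataset: my_temp_dataset        # the temp table is created in this dataset
temp_table: my_temp_table            # optional; an auto-generated name is used if omitted
cache: true                          # allow BigQuery's query cache
write_disposition: WRITE_APPEND      # how query results are written to the temp table
```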
## Example

```yaml
in:
  type: bigquery_extract_files
  project: googlecloudplatformproject
  json_keyfile: gcp-service-account-private-key.json
  dataset: target_dataset
  #table: target_table
  query: 'select a,b,c from target_table'
  gcs_uri: gs://bucket/subdir
  temp_dataset: temp_dataset
  temp_local_path: C:\Temp
  file_format: 'NEWLINE_DELIMITED_JSON'
  compression: 'GZIP'
  decoders:
  - {type: gzip}
  parser:
    type: json
out:
  type: stdout
```
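Assuming the configuration above is saved as config.yml (the filename is arbitrary), you can check and run it with the standard Embulk commands:

```sh
embulk preview config.yml   # dry-run the input and parser
embulk run config.yml       # execute the full pipeline
```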
## Build

```sh
$ ./gradlew gem # -t to watch change of files and rebuild continuously
```
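To try a locally built gem, you can install it from the build output; the pkg/ path below assumes the usual Gradle Embulk plugin layout, and the version number is a placeholder:

```sh
$ embulk gem install --local pkg/embulk-input-bigquery_extract_files-0.1.0.gem
```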