BigqueryMigration

BigqueryMigration is a command-line tool and Ruby library for migrating (altering) BigQuery table schemas.

Requirements

  • Ruby >= 2.3.0

Installation

Add this line to your application's Gemfile:

gem 'bigquery_migration'

And then execute:

$ bundle

Or install it yourself as:

$ gem install bigquery_migration

Usage

Define your desired schema. The tool automatically detects the differences from the target table and takes care of adding columns, dropping columns (in practice, a select & copy is issued), and changing column types.
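To make the idea of detecting differences concrete, here is a minimal sketch in plain Ruby. It is not the gem's implementation: `diff_columns` is a hypothetical helper, it only looks at top-level column names and types, and it ignores nested RECORD fields.

```ruby
# Hypothetical helper for illustration only; not part of the bigquery_migration API.
# Compares two flat column lists and reports what a migration would have to do.
def diff_columns(current, desired)
  current_by_name = current.map { |c| [c[:name], c] }.to_h
  desired_by_name = desired.map { |c| [c[:name], c] }.to_h

  {
    # Columns that exist only in the desired schema
    add: desired_by_name.keys - current_by_name.keys,
    # Columns that exist only in the current table
    drop: current_by_name.keys - desired_by_name.keys,
    # Columns present in both, but with a different type
    change_type: desired_by_name.keys.select do |name|
      current_by_name.key?(name) &&
        current_by_name[name][:type] != desired_by_name[name][:type]
    end
  }
end

current = [
  { name: 'id', type: 'INTEGER' },
  { name: 'legacy', type: 'STRING' },
  { name: 'created_at', type: 'STRING' }
]
desired = [
  { name: 'id', type: 'INTEGER' },
  { name: 'created_at', type: 'TIMESTAMP' },
  { name: 'note', type: 'STRING' }
]

plan = diff_columns(current, desired)
puts plan.inspect
```

In this example the plan lists `note` to add, `legacy` to drop, and `created_at` as a type change.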

CLI

config.yml

bigquery: &bigquery
  json_keyfile: your-project-000.json
  dataset: your_dataset_name
  table: your_table_name
  # If your data is in a location other than the US or EU multi-region, you must specify the location
  # location: asia-northeast1

actions:
- action: create_dataset
  <<: *bigquery
- action: migrate_table
  <<: *bigquery
  columns:
    - { name: 'timestamp', type: 'TIMESTAMP' }
    - name: 'record'
      type: 'RECORD'
      fields:
        - { name: 'string', type: 'STRING' }
        - { name: 'integer', type: 'INTEGER' }
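The `&bigquery` anchor and `<<: *bigquery` merge keys above are standard YAML features that copy the shared connection settings into each action. The sketch below shows how Ruby's standard YAML parser expands them on a shortened copy of the config. It assumes Ruby 2.6 or later for the `aliases:` keyword, and it is not necessarily how bq_migrate itself loads the file.

```ruby
require 'yaml'

# Shortened copy of the config above (json_keyfile omitted for brevity).
yaml = <<~YAML
  bigquery: &bigquery
    dataset: your_dataset_name
    table: your_table_name

  actions:
  - action: create_dataset
    <<: *bigquery
  - action: migrate_table
    <<: *bigquery
    columns:
      - { name: 'timestamp', type: 'TIMESTAMP' }
YAML

# aliases: true lets the parser resolve *bigquery references.
config = YAML.safe_load(yaml, aliases: true)

# Each action now carries its own copy of dataset and table.
config['actions'].each do |action|
  puts "#{action['action']}: dataset=#{action['dataset']} table=#{action['table']}"
end
```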

Run

$ bundle exec bq_migrate run config.yml # dry-run
$ bundle exec bq_migrate run config.yml --exec

Library

require 'bigquery_migration'

config = {
  json_keyfile: '/path/to/your-project-000.json',
  dataset: 'your_dataset_name',
  table: 'your_table_name',

  # If your data is in a location other than the US or EU multi-region, you must specify the location
  # location: 'asia-northeast1',
}
columns = [
  { name: 'string', type: 'STRING' },
  { name: 'record', type: 'RECORD', fields: [
    { name: 'integer', type: 'INTEGER' },
    { name: 'timestamp', type: 'TIMESTAMP' },
  ] }
]

migrator = BigqueryMigration.new(config)
migrator.migrate_table(columns: columns)
# migrator.migrate_table(schema_file: '/path/to/schema.json')

LIMITATIONS

There are several limitations, which stem from limitations of the BigQuery API:

  • Cannot handle mode: REPEATED columns
  • Can only add mode: NULLABLE columns
  • Columns become mode: NULLABLE after a type change
  • You will be charged, because a query is issued (adding columns alone is not charged, because it uses the patch_table API)

The advantage of this tool is that it is faster than reloading the data entirely.
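The mode-related limitations above can be checked before running a migration. The following is a minimal sketch, not something the gem provides: `limitation_warnings` is a hypothetical helper, and the `:mode` key on each column hash is an assumption that mirrors the `mode` field of a BigQuery schema (treated as NULLABLE when absent).

```ruby
# Hypothetical pre-flight check for illustration only; not part of the bigquery_migration API.
# existing_names: names of the columns already in the target table
# desired:        desired column definitions; :mode is an assumed optional key
def limitation_warnings(existing_names, desired)
  desired.flat_map do |col|
    mode = col.fetch(:mode, 'NULLABLE')
    if mode == 'REPEATED'
      ["#{col[:name]}: REPEATED columns are not handled"]
    elsif !existing_names.include?(col[:name]) && mode != 'NULLABLE'
      ["#{col[:name]}: new columns must be NULLABLE"]
    else
      []
    end
  end
end

existing_names = ['id']
desired = [
  { name: 'id', type: 'INTEGER', mode: 'REQUIRED' },   # already exists, no warning
  { name: 'tags', type: 'STRING', mode: 'REPEATED' },  # REPEATED is not handled
  { name: 'email', type: 'STRING', mode: 'REQUIRED' }, # new column that is not NULLABLE
  { name: 'note', type: 'STRING' }                     # new NULLABLE column, fine
]

warnings = limitation_warnings(existing_names, desired)
warnings.each { |w| puts w }
```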

Further Details

Development

Run example:

Service Account

Prepare your service account json at example/your-project-000.json, then

$ bundle exec bq_migrate run example/example.yml # dry-run
$ bundle exec bq_migrate run example/example.yml --exec

OAuth

Install gcloud into your development environment:

curl https://sdk.cloud.google.com | bash
gcloud init
gcloud auth login
gcloud auth application-default login
gcloud config set project <GCP_PROJECT_NAME>

Make sure gcloud works:

gcloud compute instances list

Run as:

$ bundle exec bq_migrate run example/application_default.yml # dry-run
$ bundle exec bq_migrate run example/application_default.yml --exec

Run test:

$ bundle exec rake test

To run the tests that connect directly to BigQuery, prepare example/your-project-000.json, then

$ bundle exec rake test

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/sonots/bigquery_migration. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.