Class: Google::Cloud::Bigquery::Project

Inherits:
Object
Defined in:
lib/google/cloud/bigquery/project.rb,
lib/google/cloud/bigquery/project/list.rb

Overview

Project

Projects are top-level containers in Google Cloud Platform. They store information about billing and authorized users, and they contain BigQuery data. Each project has a friendly name and a unique ID.

Google::Cloud::Bigquery::Project is the main object for interacting with Google BigQuery. Dataset objects are created, accessed, and deleted by Google::Cloud::Bigquery::Project.

See Google::Cloud#bigquery.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

Defined Under Namespace

Classes: List

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(service) ⇒ Project

Creates a new Project instance backed by the given Service.

See Google::Cloud.bigquery



# File 'lib/google/cloud/bigquery/project.rb', line 66

def initialize service
  @service = service
end

Instance Attribute Details

#name ⇒ String? (readonly)

The descriptive name of the project. Can only be present if the project was retrieved with #projects.



# File 'lib/google/cloud/bigquery/project.rb', line 54

def name
  @name
end

#numeric_id ⇒ Integer? (readonly)

The numeric ID of the project. Can only be present if the project was retrieved with #projects.



# File 'lib/google/cloud/bigquery/project.rb', line 54

def numeric_id
  @numeric_id
end

Instance Method Details

#copy(source_table, destination_table, create: nil, write: nil, reservation: nil) {|job| ... } ⇒ Boolean

Copies the data from the source table to the destination table using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See #copy_job for the asynchronous version. Use this method instead of Table#copy to copy from source tables in other projects.

The geographic location for the job ("US", "EU", etc.) can be set via CopyJob::Updater#location= in a block passed to this method.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
destination_table = dataset.table "my_destination_table"

bigquery.copy "bigquery-public-data.samples.shakespeare",
              destination_table
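As noted above, the job's geographic location can be set in a block passed to this method; a minimal sketch, assuming the same dataset and table names as the example above:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
destination_table = dataset.table "my_destination_table"

# Pin the job to a region via the yielded updater.
bigquery.copy "bigquery-public-data.samples.shakespeare",
              destination_table do |job|
  job.location = "US"
end
```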

Yields:

  • (job)

    a job configuration object

Yield Parameters:

  • job (Google::Cloud::Bigquery::CopyJob::Updater)


# File 'lib/google/cloud/bigquery/project.rb', line 293

def copy source_table, destination_table, create: nil, write: nil, reservation: nil, &block
  job = copy_job source_table, destination_table, create: create, write: write, reservation: reservation, &block
  job.wait_until_done!
  ensure_job_succeeded! job
  true
end

#copy_job(source_table, destination_table, create: nil, write: nil, job_id: nil, prefix: nil, labels: nil, reservation: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::CopyJob

Copies the data from the source table to the destination table using an asynchronous method. In this method, a CopyJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See #copy for the synchronous version. Use this method instead of Table#copy_job to copy from source tables in other projects.

The geographic location for the job ("US", "EU", etc.) can be set via CopyJob::Updater#location= in a block passed to this method.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
source_table_id = "bigquery-public-data.samples.shakespeare"
destination_table = dataset.table "my_destination_table"

copy_job = bigquery.copy_job source_table_id, destination_table

copy_job.wait_until_done!
copy_job.done? #=> true

Yields:

  • (job)

    a job configuration object

Yield Parameters:

  • job (Google::Cloud::Bigquery::CopyJob::Updater)


# File 'lib/google/cloud/bigquery/project.rb', line 208

def copy_job source_table, destination_table, create: nil, write: nil, job_id: nil, prefix: nil, labels: nil,
             reservation: nil
  ensure_service!
  options = { create: create, write: write, labels: labels, job_id: job_id, prefix: prefix,
    reservation: reservation }

  updater = CopyJob::Updater.from_options(
    service,
    Service.get_table_ref(source_table, default_ref: project_ref),
    Service.get_table_ref(destination_table, default_ref: project_ref),
    options
  )

  yield updater if block_given?

  job_gapi = updater.to_gapi
  gapi = service.copy_table job_gapi
  Job.from_gapi gapi, service
end

#create_dataset(dataset_id, name: nil, description: nil, expiration: nil, location: nil, access_policy_version: nil, dataset_view: nil) {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset

Creates a new dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset"

A name and description can be provided:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"

Or, configure access with a block: (See Dataset::Access)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset" do |dataset|
  dataset.access.add_writer_user "[email protected]"
end
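Because a dataset's location can only be set at creation time, pass it here if a non-default region is needed; a sketch using "EU" as an illustrative region:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

# The location cannot be changed after the dataset exists.
dataset = bigquery.create_dataset "my_dataset", location: "EU"
```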

Yields:

  • (access)

    a block for setting rules

Yield Parameters:

  • access (Google::Cloud::Bigquery::Dataset::Access)


# File 'lib/google/cloud/bigquery/project.rb', line 1651

def create_dataset dataset_id, name: nil, description: nil,
                   expiration: nil, location: nil, access_policy_version: nil,
                   dataset_view: nil
  ensure_service!

  new_ds = Google::Apis::BigqueryV2::Dataset.new(
    dataset_reference: Google::Apis::BigqueryV2::DatasetReference.new(
      project_id: project, dataset_id: dataset_id
    )
  )

  # Can set location only on creation, no Dataset#location method
  new_ds.update! location: location unless location.nil?

  updater = Dataset::Updater.new(new_ds).tap do |b|
    b.name = name unless name.nil?
    b.description = description unless description.nil?
    b.default_expiration = expiration unless expiration.nil?
  end

  if block_given?
    yield updater
    updater.check_for_mutated_access!
  end

  gapi = service.insert_dataset new_ds, access_policy_version: access_policy_version
  Dataset.from_gapi gapi, service, access_policy_version: access_policy_version, dataset_view: dataset_view
end

#dataset(dataset_id, skip_lookup: nil, project_id: nil, access_policy_version: nil, dataset_view: nil) ⇒ Google::Cloud::Bigquery::Dataset?

Retrieves an existing dataset by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
puts dataset.name

Retrieve a dataset in another project with project_id:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", project_id: "another_project"
puts dataset.name

Avoid retrieving the dataset resource with skip_lookup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true


# File 'lib/google/cloud/bigquery/project.rb', line 1573

def dataset dataset_id, skip_lookup: nil, project_id: nil, access_policy_version: nil, dataset_view: nil
  ensure_service!
  project_id ||= project
  return Dataset.new_reference project_id, dataset_id, service if skip_lookup
  gapi = service.get_project_dataset project_id, dataset_id, access_policy_version: access_policy_version,
dataset_view: dataset_view
  Dataset.from_gapi gapi, service, access_policy_version: access_policy_version, dataset_view: dataset_view
rescue Google::Cloud::NotFoundError
  nil
end

#datasets(all: nil, filter: nil, token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Dataset>

Retrieves the list of datasets belonging to the project.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

datasets = bigquery.datasets
datasets.each do |dataset|
  puts dataset.name
end

Retrieve hidden datasets with the all optional arg:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

all_datasets = bigquery.datasets all: true

Retrieve all datasets: (See Dataset::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

datasets = bigquery.datasets
datasets.all do |dataset|
  puts dataset.name
end
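The filter argument is not shown above; the underlying datasets.list API accepts label filters of the form labels.<name>[:<value>]. A sketch using a hypothetical department label:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

# Filter syntax follows the datasets.list API: labels.<name>[:<value>].
# The "department" label here is hypothetical.
datasets = bigquery.datasets filter: "labels.department:receiving"
datasets.each do |dataset|
  puts dataset.dataset_id
end
```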


# File 'lib/google/cloud/bigquery/project.rb', line 1724

def datasets all: nil, filter: nil, token: nil, max: nil
  ensure_service!
  gapi = service.list_datasets all: all, filter: filter, token: token, max: max
  Dataset::List.from_gapi gapi, service, all, filter, max
end

#encryption(kms_key: nil) ⇒ Google::Cloud::Bigquery::EncryptionConfiguration

Creates a new Bigquery::EncryptionConfiguration instance.

This method does not execute an API call. Use the encryption configuration to encrypt a table when creating one via Bigquery::Dataset#create_table, Bigquery::Dataset#load, Bigquery::Table#copy, or Bigquery::Project#query.

Examples:

Encrypt a new table

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name

table = dataset.create_table "my_table" do |updater|
  updater.encryption = encrypt_config
end

Encrypt a load destination table

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name
job = dataset.load_job "my_table", "gs://abc/file" do |job|
  job.encryption = encrypt_config
end

Encrypt a copy destination table

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name
job = table.copy_job "my_dataset.new_table" do |job|
  job.encryption = encrypt_config
end

Encrypt a query destination table

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name
job = bigquery.query_job "SELECT 1;" do |query|
  query.table = dataset.table "my_table", skip_lookup: true
  query.encryption = encrypt_config
end


# File 'lib/google/cloud/bigquery/project.rb', line 2085

def encryption kms_key: nil
  encrypt_config = Bigquery::EncryptionConfiguration.new
  encrypt_config.kms_key = kms_key unless kms_key.nil?
  encrypt_config
end

#external(url, format: nil) {|ext| ... } ⇒ External::DataSource

Creates a new External::DataSource (or subclass) object that represents the external data source that can be queried from directly, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

data = bigquery.query "SELECT * FROM my_ext_table",
                      external: { my_ext_table: csv_table }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Yields:

  • (ext)

    an external data source configuration object




# File 'lib/google/cloud/bigquery/project.rb', line 1510

def external url, format: nil
  ext = External.from_urls url, format
  yield ext if block_given?
  ext
end

#extract(source, extract_url, format: nil, compression: nil, delimiter: nil, header: nil, reservation: nil) {|job| ... } ⇒ Boolean

Extracts the data from a table or exports a model to Google Cloud Storage using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See #extract_job for the asynchronous version.

Use this method instead of Table#extract or Model#extract to extract data from source tables or models in other projects.

The geographic location for the job ("US", "EU", etc.) can be set via ExtractJob::Updater#location= in a block passed to this method.

Examples:

Export table data

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

bigquery.extract "bigquery-public-data.samples.shakespeare",
                 "gs://my-bucket/shakespeare.csv"

Export a model

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

bigquery.extract model, "gs://my-bucket/#{model.model_id}"
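As noted above, the job's geographic location can be set via a block passed to this method; a minimal sketch:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

# Set the job's region via the yielded updater.
bigquery.extract "bigquery-public-data.samples.shakespeare",
                 "gs://my-bucket/shakespeare.csv" do |job|
  job.location = "US"
end
```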

Yields:

  • (job)

    a job configuration object

Yield Parameters:

  • job (Google::Cloud::Bigquery::ExtractJob::Updater)



# File 'lib/google/cloud/bigquery/project.rb', line 2310

def extract source, extract_url, format: nil, compression: nil, delimiter: nil, header: nil, reservation: nil,
            &block
  job = extract_job source, extract_url,
                    format:      format,
                    compression: compression,
                    delimiter:   delimiter,
                    header:      header,
                    reservation: reservation,
                    &block
  job.wait_until_done!
  ensure_job_succeeded! job
  true
end

#extract_job(source, extract_url, format: nil, compression: nil, delimiter: nil, header: nil, job_id: nil, prefix: nil, labels: nil, reservation: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::ExtractJob

Extracts the data from a table or exports a model to Google Cloud Storage asynchronously, immediately returning an ExtractJob that can be used to track the progress of the export job. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See #extract for the synchronous version.

Use this method instead of Table#extract_job or Model#extract_job to extract data from source tables or models in other projects.

The geographic location for the job ("US", "EU", etc.) can be set via ExtractJob::Updater#location= in a block passed to this method.

Examples:

Export table data

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

table_id = "bigquery-public-data.samples.shakespeare"
extract_job = bigquery.extract_job table_id, "gs://my-bucket/shakespeare.csv"
extract_job.wait_until_done!
extract_job.done? #=> true

Export a model

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

extract_job = bigquery.extract_job model, "gs://my-bucket/#{model.model_id}"

Yields:

  • (job)

    a job configuration object

Yield Parameters:

  • job (Google::Cloud::Bigquery::ExtractJob::Updater)



# File 'lib/google/cloud/bigquery/project.rb', line 2208

def extract_job source, extract_url, format: nil, compression: nil, delimiter: nil, header: nil, job_id: nil,
                prefix: nil, labels: nil, reservation: nil
  ensure_service!
  options = { format: format, compression: compression, delimiter: delimiter, header: header, job_id: job_id,
              prefix: prefix, labels: labels, reservation: reservation }
  source_ref = if source.respond_to? :model_ref
                 source.model_ref
               else
                 Service.get_table_ref source, default_ref: project_ref
               end

  updater = ExtractJob::Updater.from_options service, source_ref, extract_url, options

  yield updater if block_given?

  job_gapi = updater.to_gapi
  gapi = service.extract_table job_gapi
  Job.from_gapi gapi, service
end

#job(job_id, location: nil) ⇒ Google::Cloud::Bigquery::Job?

Retrieves an existing job by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.job "my_job"
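Since nil is returned when the job is not found, and a finished job may still have failed, a typical follow-up check looks like this (a sketch; "my_job" is assumed to exist):

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.job "my_job"
if job.nil?
  puts "Job not found"
elsif job.done?
  # Job#error returns a Hash describing the failure, or nil on success.
  puts job.failed? ? "Failed: #{job.error}" : "Succeeded"
end
```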


# File 'lib/google/cloud/bigquery/project.rb', line 1747

def job job_id, location: nil
  ensure_service!
  gapi = service.get_job job_id, location: location
  Job.from_gapi gapi, service
rescue Google::Cloud::NotFoundError
  nil
end

#jobs(all: nil, token: nil, max: nil, filter: nil, min_created_at: nil, max_created_at: nil, parent_job: nil) ⇒ Array<Google::Cloud::Bigquery::Job>

Retrieves the list of jobs belonging to the project.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

jobs = bigquery.jobs
jobs.each do |job|
  # process job
end

Retrieve only running jobs using the filter optional arg:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

running_jobs = bigquery.jobs filter: "running"
running_jobs.each do |job|
  # process job
end

Retrieve only jobs created within provided times:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

two_days_ago = Time.now - 60*60*24*2
three_days_ago = Time.now - 60*60*24*3

jobs = bigquery.jobs min_created_at: three_days_ago,
                     max_created_at: two_days_ago
jobs.each do |job|
  # process job
end

Retrieve all jobs: (See Job::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

jobs = bigquery.jobs
jobs.all do |job|
  # process job
end

Retrieve child jobs by setting parent_job:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

multi_statement_sql = <<~SQL
  -- Declare a variable to hold names as an array.
  DECLARE top_names ARRAY<STRING>;
  -- Build an array of the top 100 names from the year 2017.
  SET top_names = (
    SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE year = 2017
  );
  -- Which names appear as words in Shakespeare's plays?
  SELECT
    name AS shakespeare_name
  FROM UNNEST(top_names) AS name
  WHERE name IN (
    SELECT word
    FROM `bigquery-public-data.samples.shakespeare`
  );
SQL

job = bigquery.query_job multi_statement_sql

job.wait_until_done!

child_jobs = bigquery.jobs parent_job: job

child_jobs.each do |child_job|
  script_statistics = child_job.script_statistics
  puts script_statistics.evaluation_kind
  script_statistics.stack_frames.each do |stack_frame|
    puts stack_frame.text
  end
end


# File 'lib/google/cloud/bigquery/project.rb', line 1866

def jobs all: nil,
         token: nil,
         max: nil,
         filter: nil,
         min_created_at: nil,
         max_created_at: nil,
         parent_job: nil
  ensure_service!
  parent_job = parent_job.job_id if parent_job.is_a? Job
  options = {
    parent_job_id: parent_job,
    all: all,
    token: token,
    max: max, filter: filter,
    min_created_at: min_created_at,
    max_created_at: max_created_at
  }
  gapi = service.list_jobs(**options)
  Job::List.from_gapi gapi, service, **options
end

#load(table_id, files, dataset_id: "_SESSION", format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, autodetect: nil, null_marker: nil, session_id: nil, date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil, null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil, preserve_ascii_control_characters: nil, reservation: nil) {|updater| ... } ⇒ Boolean

Loads data into the provided destination table using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See also #load_job.

For the source of the data, you can pass a google-cloud storage file path or a google-cloud-storage File instance. Or, you can upload a file directly. See Loading Data with a POST Request.

The geographic location for the job ("US", "EU", etc.) can be set via LoadJob::Updater#location= in a block passed to this method.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

gs_url = "gs://my-bucket/file-name.csv"
bigquery.load "my_new_table", gs_url, dataset_id: "my_dataset" do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end
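As noted above, a local file can be uploaded directly instead of passing a Cloud Storage URL; a minimal sketch assuming a local my_data.csv:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

# A local file is uploaded directly with the load request.
file = File.open "my_data.csv"
bigquery.load "my_new_table", file, dataset_id: "my_dataset"
```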

Yields:

  • (updater)

    A block for setting the schema of the destination table and other options for the load job. The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup.

Yield Parameters:

  • updater (Google::Cloud::Bigquery::LoadJob::Updater)


# File 'lib/google/cloud/bigquery/project.rb', line 1437

def load table_id, files, dataset_id: "_SESSION", format: nil, create: nil, write: nil,
         projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil,
         delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil,
         skip_leading: nil, schema: nil, autodetect: nil, null_marker: nil, session_id: nil,
         date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil,
         null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil,
         preserve_ascii_control_characters: nil, reservation: nil, &block
  job = load_job table_id, files, dataset_id: dataset_id,
                format: format, create: create, write: write, projection_fields: projection_fields,
                jagged_rows: jagged_rows, quoted_newlines: quoted_newlines, encoding: encoding,
                delimiter: delimiter, ignore_unknown: ignore_unknown, max_bad_records: max_bad_records,
                quote: quote, skip_leading: skip_leading, schema: schema, autodetect: autodetect,
                null_marker: null_marker, session_id: session_id, date_format: date_format,
                datetime_format: datetime_format, time_format: time_format,
                timestamp_format: timestamp_format, null_markers: null_markers,
                source_column_match: source_column_match, time_zone: time_zone,
                reference_file_schema_uri: reference_file_schema_uri,
                preserve_ascii_control_characters: preserve_ascii_control_characters, reservation: reservation,
                 &block

  job.wait_until_done!
  ensure_job_succeeded! job
  true
end

#load_job(table_id, files, dataset_id: nil, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil, null_marker: nil, dryrun: nil, create_session: nil, session_id: nil, project_id: nil, date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil, null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil, preserve_ascii_control_characters: nil, reservation: nil) {|updater| ... } ⇒ Google::Cloud::Bigquery::LoadJob

Loads data into the provided destination table using an asynchronous method. In this method, a LoadJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See also #load.

For the source of the data, you can pass a google-cloud storage file path or a google-cloud-storage File instance. Or, you can upload a file directly. See Loading Data with a POST Request.

The geographic location for the job ("US", "EU", etc.) can be set via LoadJob::Updater#location= in a block passed to this method.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

gs_url = "gs://my-bucket/file-name.csv"
load_job = bigquery.load_job "temp_table", gs_url, autodetect: true, create_session: true
load_job.wait_until_done!
session_id = load_job.statistics["sessionInfo"]["sessionId"]

Yields:

  • (updater)

    A block for setting the schema and other options for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup.

Yield Parameters:

  • updater (Google::Cloud::Bigquery::LoadJob::Updater)


# File 'lib/google/cloud/bigquery/project.rb', line 1218

def load_job table_id, files, dataset_id: nil, format: nil, create: nil, write: nil,
             projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil,
             delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil,
             skip_leading: nil, schema: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil,
             null_marker: nil, dryrun: nil, create_session: nil, session_id: nil, project_id: nil,
             date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil,
             null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil,
             preserve_ascii_control_characters: nil, reservation: nil, &block
  ensure_service!
  dataset_id ||= "_SESSION" unless create_session.nil? && session_id.nil?
  session_dataset = dataset dataset_id, skip_lookup: true, project_id: project_id
  table = session_dataset.table table_id, skip_lookup: true
  table.load_job  files,
                  format: format, create: create, write: write, projection_fields: projection_fields,
                  jagged_rows: jagged_rows, quoted_newlines: quoted_newlines, encoding: encoding,
                  delimiter: delimiter, ignore_unknown: ignore_unknown,
                  max_bad_records: max_bad_records, quote: quote, skip_leading: skip_leading,
                  dryrun: dryrun, schema: schema, job_id: job_id, prefix: prefix, labels: labels,
                  autodetect: autodetect, null_marker: null_marker, create_session: create_session,
                  session_id: session_id, date_format: date_format, datetime_format: datetime_format,
                  time_format: time_format, timestamp_format: timestamp_format,
                  null_markers: null_markers, source_column_match: source_column_match,
                  time_zone: time_zone, reference_file_schema_uri: reference_file_schema_uri,
                  preserve_ascii_control_characters: preserve_ascii_control_characters,
                  reservation: reservation, &block
end

#project_id ⇒ Object (also known as: #project)

The BigQuery project connected to.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new(
  project_id: "my-project",
  credentials: "/path/to/keyfile.json"
)

bigquery.project_id #=> "my-project"


# File 'lib/google/cloud/bigquery/project.rb', line 92

def project_id
  service.project
end

#projects(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Project>

Retrieves the list of all projects for which the currently authorized account has been granted any project role. The returned project instances share the same credentials as the project used to retrieve them, but lazily create a new API connection for interactions with the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

projects = bigquery.projects
projects.each do |project|
  puts project.name
  project.datasets.all.each do |dataset|
    puts dataset.name
  end
end

Retrieve all projects: (See Google::Cloud::Bigquery::Project::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

projects = bigquery.projects

projects.all do |project|
  puts project.name
  project.datasets.all.each do |dataset|
    puts dataset.name
  end
end


# File 'lib/google/cloud/bigquery/project.rb', line 1928

def projects token: nil, max: nil
  ensure_service!
  gapi = service.list_projects token: token, max: max
  Project::List.from_gapi gapi, service, max
end

#query(query, params: nil, types: nil, external: nil, max: nil, cache: true, dataset: nil, project: nil, standard_sql: nil, legacy_sql: nil, session_id: nil, format_options_use_int64_timestamp: true, reservation: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::Data

Queries data and waits for the results. In this method, a QueryJob is created and its results are saved to a temporary table, then read from the table. Timeouts and transient errors are generally handled as needed to complete the query. When used for executing DDL/DML statements, this method does not return row data.

The geographic location for the job ("US", "EU", etc.) can be set via QueryJob::Updater#location= in a block passed to this method.

Examples:

Query using standard SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT name FROM `my_project.my_dataset.my_table`"
data = bigquery.query sql

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Query using legacy SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT name FROM [my_project:my_dataset.my_table]"
data = bigquery.query sql, legacy_sql: true

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Retrieve all rows: (See Data#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "SELECT name FROM `my_dataset.my_table`"

data.all do |row|
  puts row[:name]
end

Query using positional query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "SELECT name FROM `my_dataset.my_table` WHERE id = ?",
                      params: [1]

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Query using named query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "SELECT name FROM `my_dataset.my_table` WHERE id = @id",
                      params: { id: 1 }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Query using named query parameters with types:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "SELECT name FROM `my_dataset.my_table` WHERE id IN UNNEST(@ids)",
                      params: { ids: [] },
                      types: { ids: [:INT64] }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Execute a DDL statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "CREATE TABLE `my_dataset.my_table` (x INT64)"

table_ref = data.ddl_target_table # Or ddl_target_routine for CREATE/DROP FUNCTION/PROCEDURE

Execute a DML statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "UPDATE `my_dataset.my_table` SET x = x + 1 WHERE x IS NOT NULL"

puts data.num_dml_affected_rows

Query using external data source, set destination:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

data = bigquery.query "SELECT * FROM my_ext_table" do |query|
  query.external = { my_ext_table: csv_table }
  dataset = bigquery.dataset "my_dataset", skip_lookup: true
  query.table = dataset.table "my_table", skip_lookup: true
end

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Yields:

  • (job)

    a job configuration object

Yield Parameters:

  • job (QueryJob::Updater)

    a job configuration object


# File 'lib/google/cloud/bigquery/project.rb', line 951

def query query,
          params: nil,
          types: nil,
          external: nil,
          max: nil,
          cache: true,
          dataset: nil,
          project: nil,
          standard_sql: nil,
          legacy_sql: nil,
          session_id: nil,
          format_options_use_int64_timestamp: true,
          reservation: nil,
          &block
  job = query_job query,
                  params: params,
                  types: types,
                  external: external,
                  cache: cache,
                  dataset: dataset,
                  project: project,
                  standard_sql: standard_sql,
                  legacy_sql: legacy_sql,
                  session_id: session_id,
                  reservation: reservation,
                  &block
  job.wait_until_done!

  if job.failed?
    begin
      # raise to activate ruby exception cause handling
      raise job.gapi_error
    rescue StandardError => e
      # wrap Google::Apis::Error with Google::Cloud::Error
      raise Google::Cloud::Error.from_error(e)
    end
  end

  job.data max: max, format_options_use_int64_timestamp: format_options_use_int64_timestamp
end
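The `begin`/`raise`/`rescue` dance in the failure branch above exists to populate Ruby's exception `cause` chain before wrapping. A standalone sketch of the same pattern, using hypothetical error classes rather than the gem's:

```ruby
# Sketch of the raise-then-wrap pattern: re-raising inside a rescue block
# automatically records the in-flight exception in the new exception's `cause`.
class UpstreamError < StandardError; end
class WrappedError  < StandardError; end

def wrap_failure(error)
  raise error
rescue StandardError => e
  # Raising here sets the new exception's `cause` to `e`.
  raise WrappedError, e.message
end

begin
  wrap_failure(UpstreamError.new("job failed"))
rescue WrappedError => e
  puts e.cause.class # UpstreamError
  puts e.message     # job failed
end
```

Because `cause` is set implicitly, the wrapped error surfaced to the caller still carries the original `Google::Apis::Error` for inspection.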

#query_job(query, params: nil, types: nil, external: nil, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, dryrun: nil, dataset: nil, project: nil, standard_sql: nil, legacy_sql: nil, large_results: nil, flatten: nil, maximum_billing_tier: nil, maximum_bytes_billed: nil, job_id: nil, prefix: nil, labels: nil, udfs: nil, create_session: nil, session_id: nil, reservation: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::QueryJob

Queries data by creating a query job.

The geographic location for the job ("US", "EU", etc.) can be set via QueryJob::Updater#location= in a block passed to this method.

Examples:

Query using standard SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM `my_project.my_dataset.my_table`"

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using legacy SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM [my_project:my_dataset.my_table]",
                         legacy_sql: true

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using positional query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM `my_dataset.my_table` WHERE id = ?",
                         params: [1]

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using named query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM `my_dataset.my_table` WHERE id = @id",
                         params: { id: 1 }

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using named query parameters with types:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM `my_dataset.my_table` WHERE id IN UNNEST(@ids)",
                         params: { ids: [] },
                         types: { ids: [:INT64] }

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Execute a DDL statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "CREATE TABLE `my_dataset.my_table` (x INT64)"

job.wait_until_done!
if !job.failed?
  table_ref = job.ddl_target_table # Or ddl_target_routine for CREATE/DROP FUNCTION/PROCEDURE
end

Execute a DML statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "UPDATE `my_dataset.my_table` SET x = x + 1 WHERE x IS NOT NULL"

job.wait_until_done!
if !job.failed?
  puts job.num_dml_affected_rows
end

Query using external data source, set destination:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

job = bigquery.query_job "SELECT * FROM my_ext_table" do |query|
  query.external = { my_ext_table: csv_table }
  dataset = bigquery.dataset "my_dataset", skip_lookup: true
  query.table = dataset.table "my_table", skip_lookup: true
end

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Yields:

  • (job)

    a job configuration object

Yield Parameters:

  • job (QueryJob::Updater)

    a job configuration object



# File 'lib/google/cloud/bigquery/project.rb', line 631

def query_job query,
              params: nil,
              types: nil,
              external: nil,
              priority: "INTERACTIVE",
              cache: true,
              table: nil,
              create: nil,
              write: nil,
              dryrun: nil,
              dataset: nil,
              project: nil,
              standard_sql: nil,
              legacy_sql: nil,
              large_results: nil,
              flatten: nil,
              maximum_billing_tier: nil,
              maximum_bytes_billed: nil,
              job_id: nil,
              prefix: nil,
              labels: nil,
              udfs: nil,
              create_session: nil,
              session_id: nil,
              reservation: nil
  ensure_service!
  project ||= self.project
  options = {
    params: params,
    types: types,
    external: external,
    priority: priority,
    cache: cache,
    table: table,
    create: create,
    write: write,
    dryrun: dryrun,
    dataset: dataset,
    project: project,
    standard_sql: standard_sql,
    legacy_sql: legacy_sql,
    large_results: large_results,
    flatten: flatten,
    maximum_billing_tier: maximum_billing_tier,
    maximum_bytes_billed: maximum_bytes_billed,
    job_id: job_id,
    prefix: prefix,
    labels: labels,
    udfs: udfs,
    create_session: create_session,
    session_id: session_id,
    reservation: reservation
  }

  updater = QueryJob::Updater.from_options service, query, options

  yield updater if block_given?

  gapi = service.query_job updater.to_gapi
  Job.from_gapi gapi, service
end

#schema {|schema| ... } ⇒ Google::Cloud::Bigquery::Schema

Creates a new schema instance. An optional block may be given to configure the schema, otherwise the schema is returned empty and may be configured directly.

The returned schema can be passed to Dataset#load using the schema option. However, for most use cases, the block yielded by Dataset#load is a more convenient way to configure the schema for the destination table.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

schema = bigquery.schema do |s|
  s.string "first_name", mode: :required
  s.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

dataset = bigquery.dataset "my_dataset"

gs_url = "gs://my-bucket/file-name.csv"
load_job = dataset.load_job "my_new_table", gs_url, schema: schema

Yields:

  • (schema)

    a block for setting the schema

Yield Parameters:

  • schema (Schema)

    the object accepting the schema



# File 'lib/google/cloud/bigquery/project.rb', line 2014

def schema
  s = Schema.from_gapi
  yield s if block_given?
  s
end
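The optional-block pattern in `#schema` (yield the new object if a block is given, return it either way) can be sketched standalone with hypothetical classes:

```ruby
# Standalone sketch of the optional-block builder pattern used by #schema:
# yield the new object when a block is given, and return it either way so
# it can also be configured directly after the fact.
class SchemaSketch
  attr_reader :fields

  def initialize
    @fields = []
  end

  def string(name)
    @fields << [name, :string]
  end
end

def build_schema
  s = SchemaSketch.new
  yield s if block_given?
  s
end

configured = build_schema { |s| s.string "first_name" }
empty      = build_schema # returned empty; configure later

puts configured.fields.inspect # [["first_name", :string]]
puts empty.fields.inspect      # []
```

Both call styles return the same object, which is why the real method can hand the result straight to Dataset#load either way.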

#service_account_email ⇒ String

The email address of the service account for the project used to connect to BigQuery. (See also #project_id.)



# File 'lib/google/cloud/bigquery/project.rb', line 103

def service_account_email
  @service_account_email ||= service.project_service_account.email
end

#time(hour, minute, second) ⇒ Bigquery::Time

Creates a Bigquery::Time object to represent a time, independent of a specific date.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

fourpm = bigquery.time 16, 0, 0
data = bigquery.query "SELECT name FROM `my_dataset.my_table` WHERE time_of_date = @time",
                      params: { time: fourpm }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Create Time with fractional seconds:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

precise_time = bigquery.time 16, 35, 15.376541
data = bigquery.query "SELECT name FROM `my_dataset.my_table` WHERE time_of_date >= @time",
                      params: { time: precise_time }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?


# File 'lib/google/cloud/bigquery/project.rb', line 1977

def time hour, minute, second
  Bigquery::Time.new "#{hour}:#{minute}:#{second}"
end
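Note that the constructor above interpolates the components directly. The canonical zero-padded HH:MM:SS[.ffffff] shape of a BigQuery TIME literal can be sketched in plain Ruby with a hypothetical helper (not part of the gem):

```ruby
# Hypothetical helper (not part of the gem) showing the canonical
# zero-padded HH:MM:SS[.ffffff] shape of a BigQuery TIME literal.
def time_literal(hour, minute, second)
  whole = second.to_i
  frac  = second - whole
  base  = format("%02d:%02d:%02d", hour, minute, whole)
  # Append up to six fractional digits when the seconds value has a fraction.
  frac.zero? ? base : base + format("%.6f", frac)[1..]
end

puts time_literal(16, 0, 0)          # 16:00:00
puts time_literal(16, 35, 15.376541) # 16:35:15.376541
```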

#universe_domain ⇒ String

The universe domain the client is connected to.



# File 'lib/google/cloud/bigquery/project.rb', line 75

def universe_domain
  service.universe_domain
end