Class: Google::Cloud::Bigquery::Dataset

Inherits:
Object
Defined in:
lib/google/cloud/bigquery/dataset.rb,
lib/google/cloud/bigquery/dataset/tag.rb,
lib/google/cloud/bigquery/dataset/list.rb,
lib/google/cloud/bigquery/dataset/access.rb

Overview

Dataset

Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest level unit of access control; you cannot control access at the table level. A dataset is contained within a specific project.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"

Direct Known Subclasses

Updater

Defined Under Namespace

Classes: Access, List, Tag, Updater

Attributes

Lifecycle

Table

Model

Routine

Data

Instance Method Details

#access {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset::Access

Retrieves the access rules for a Dataset. The rules can be updated by passing a block; see Access for all the methods available.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

access = dataset.access
access.writer_user? "[email protected]" #=> false

Manage the access rules by passing a block:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.access do |access|
  access.add_owner_group "[email protected]"
  access.add_writer_user "[email protected]"
  access.remove_writer_user "[email protected]"
  access.add_reader_special :all
  access.add_reader_view other_dataset_view_object
end

Yields:

  • (access)

    a block for setting rules

Yield Parameters:

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 591

def access
  ensure_full_data!
  reload! unless resource_full?
  access_builder = Access.from_gapi @gapi
  if block_given?
    yield access_builder
    if access_builder.changed?
      @gapi.update! access: access_builder.to_gapi
      patch_gapi! :access
    end
  end
  access_builder.freeze
end

#api_url ⇒ String?

A URL that can be used to access the dataset using the REST API.
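
Examples:

A minimal usage sketch (assuming a dataset named "my_dataset" exists in the current project):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

puts dataset.api_url #=> "https://bigquery.googleapis.com/bigquery/v2/projects/..."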



# File 'lib/google/cloud/bigquery/dataset.rb', line 213

def api_url
  return nil if reference?
  ensure_full_data!
  @gapi.self_link
end

#build_access_entry(target_types: nil) ⇒ Google::Apis::BigqueryV2::DatasetAccessEntry

Builds an object of type Google::Apis::BigqueryV2::DatasetAccessEntry from this dataset's reference.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
dataset_access_entry = dataset.build_access_entry target_types: ["VIEWS"]


# File 'lib/google/cloud/bigquery/dataset.rb', line 3056

def build_access_entry target_types: nil
  params = {
    dataset: dataset_ref,
    target_types: target_types
  }.compact
  Google::Apis::BigqueryV2::DatasetAccessEntry.new(**params)
end

#create_materialized_view(table_id, query, name: nil, description: nil, enable_refresh: nil, refresh_interval_ms: nil) ⇒ Google::Cloud::Bigquery::Table

Creates a new materialized view.

Materialized views are precomputed views that periodically cache results of a query for increased performance and efficiency. BigQuery leverages precomputed results from materialized views and whenever possible reads only delta changes from the base table to compute up-to-date results.

Queries that use materialized views are generally faster and consume fewer resources than queries that retrieve the same data only from the base table. Materialized views can significantly boost the performance of workloads characterized by common, repeated queries.

For logical views, see #create_view.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

materialized_view = dataset.create_materialized_view "my_materialized_view",
                                                     "SELECT name, age FROM proj.dataset.users"

Automatic refresh can be disabled:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

materialized_view = dataset.create_materialized_view "my_materialized_view",
                                                     "SELECT name, age FROM proj.dataset.users",
                                                     enable_refresh: false

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 913

def create_materialized_view table_id,
                             query,
                             name: nil,
                             description: nil,
                             enable_refresh: nil,
                             refresh_interval_ms: nil
  new_view_opts = {
    table_reference:   Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id,
      dataset_id: dataset_id,
      table_id:   table_id
    ),
    friendly_name:     name,
    description:       description,
    materialized_view: Google::Apis::BigqueryV2::MaterializedViewDefinition.new(
      enable_refresh:      enable_refresh,
      query:               query,
      refresh_interval_ms: refresh_interval_ms
    )
  }.compact
  new_view = Google::Apis::BigqueryV2::Table.new(**new_view_opts)

  gapi = service.insert_table dataset_id, new_view
  Table.from_gapi gapi, service
end

#create_routine(routine_id) {|routine| ... } ⇒ Google::Cloud::Bigquery::Routine

Creates a new routine.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

routine = dataset.create_routine "my_routine" do |r|
  r.routine_type = "SCALAR_FUNCTION"
  r.language = "SQL"
  r.arguments = [
    Google::Cloud::Bigquery::Argument.new(name: "x", data_type: "INT64")
  ]
  r.body = "x * 3"
  r.description = "My routine description"
end

puts routine.routine_id

Extended example:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
routine = dataset.create_routine "my_routine" do |r|
  r.routine_type = "SCALAR_FUNCTION"
  r.language = :SQL
  r.body = "(SELECT SUM(IF(elem.name = \"foo\",elem.val,null)) FROM UNNEST(arr) AS elem)"
  r.arguments = [
    Google::Cloud::Bigquery::Argument.new(
      name: "arr",
      argument_kind: "FIXED_TYPE",
      data_type: Google::Cloud::Bigquery::StandardSql::DataType.new(
        type_kind: "ARRAY",
        array_element_type: Google::Cloud::Bigquery::StandardSql::DataType.new(
          type_kind: "STRUCT",
          struct_type: Google::Cloud::Bigquery::StandardSql::StructType.new(
            fields: [
              Google::Cloud::Bigquery::StandardSql::Field.new(
                name: "name",
                type: Google::Cloud::Bigquery::StandardSql::DataType.new(type_kind: "STRING")
              ),
              Google::Cloud::Bigquery::StandardSql::Field.new(
                name: "val",
                type: Google::Cloud::Bigquery::StandardSql::DataType.new(type_kind: "INT64")
              )
            ]
          )
        )
      )
    )
  ]
end

Yields:

  • (routine)

    A block for setting properties on the routine.

Yield Parameters:



# File 'lib/google/cloud/bigquery/dataset.rb', line 1185

def create_routine routine_id
  ensure_service!
  new_tb = Google::Apis::BigqueryV2::Routine.new(
    routine_reference: Google::Apis::BigqueryV2::RoutineReference.new(
      project_id: project_id, dataset_id: dataset_id, routine_id: routine_id
    )
  )
  updater = Routine::Updater.new new_tb

  yield updater if block_given?

  gapi = service.insert_routine dataset_id, updater.to_gapi
  Routine.from_gapi gapi, service
end

#create_table(table_id, name: nil, description: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::Table

Creates a new table. If you are adapting existing code that was written for the REST API, you can pass the table's schema as a hash (see example).

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table"

You can also pass name and description options.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table",
                             name: "My Table",
                             description: "A description of table."

Or the table's schema can be configured with the block.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.schema.string "first_name", mode: :required
  t.schema.record "cities_lived", mode: :required do |s|
    s.string "place", mode: :required
    s.integer "number_of_years", mode: :required
  end
end

You can define the schema using a nested block.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.name = "My Table"
  t.description = "A description of my table."
  t.schema do |s|
    s.string "first_name", mode: :required
    s.record "cities_lived", mode: :repeated do |r|
      r.string "place", mode: :required
      r.integer "number_of_years", mode: :required
    end
  end
end

With time partitioning and clustering.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.schema do |schema|
    schema.timestamp "dob", mode: :required
    schema.string "first_name", mode: :required
    schema.string "last_name", mode: :required
  end
  t.time_partitioning_type  = "DAY"
  t.time_partitioning_field = "dob"
  t.clustering_fields = ["last_name", "first_name"]
end

With range partitioning.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.schema do |schema|
    schema.integer "my_table_id", mode: :required
    schema.string "my_table_data", mode: :required
  end
  t.range_partitioning_field = "my_table_id"
  t.range_partitioning_start = 0
  t.range_partitioning_interval = 10
  t.range_partitioning_end = 100
end

Yields:

  • (table)

    a block for setting the table

Yield Parameters:



# File 'lib/google/cloud/bigquery/dataset.rb', line 752

def create_table table_id, name: nil, description: nil
  ensure_service!
  new_tb = Google::Apis::BigqueryV2::Table.new(
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id,
      table_id: table_id
    )
  )
  updater = Table::Updater.new(new_tb).tap do |tb|
    tb.name = name unless name.nil?
    tb.description = description unless description.nil?
  end

  yield updater if block_given?

  gapi = service.insert_table dataset_id, updater.to_gapi
  Table.from_gapi gapi, service
end

#create_view(table_id, query, name: nil, description: nil, standard_sql: nil, legacy_sql: nil, udfs: nil) ⇒ Google::Cloud::Bigquery::Table

Creates a new view, which is a virtual table defined by the given SQL query.

With BigQuery's logical views, the query that defines the view is re-executed every time the view is queried. Queries are billed according to the total amount of data in all table fields referenced directly or indirectly by the top-level query. (See Table#view? and Table#query.)

For materialized views, see #create_materialized_view.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

view = dataset.create_view "my_view",
                           "SELECT name, age FROM proj.dataset.users"

A name and description can be provided:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

view = dataset.create_view "my_view",
                           "SELECT name, age FROM proj.dataset.users",
                           name: "My View", description: "This is my view"

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 837

def create_view table_id,
                query,
                name: nil,
                description: nil,
                standard_sql: nil,
                legacy_sql: nil,
                udfs: nil
  use_legacy_sql = Convert.resolve_legacy_sql standard_sql, legacy_sql
  new_view_opts = {
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id,
      dataset_id: dataset_id,
      table_id:   table_id
    ),
    friendly_name:   name,
    description:     description,
    view:            Google::Apis::BigqueryV2::ViewDefinition.new(
      query:                           query,
      use_legacy_sql:                  use_legacy_sql,
      user_defined_function_resources: udfs_gapi(udfs)
    )
  }.compact
  new_view = Google::Apis::BigqueryV2::Table.new(**new_view_opts)

  gapi = service.insert_table dataset_id, new_view
  Table.from_gapi gapi, service
end

#created_at ⇒ Time?

The time when this dataset was created.
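
Examples:

A minimal sketch (assuming "my_dataset" exists; the value is nil for a reference-only dataset):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

created = dataset.created_at # a Time
puts created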



# File 'lib/google/cloud/bigquery/dataset.rb', line 327

def created_at
  return nil if reference?
  ensure_full_data!
  Convert.millis_to_time @gapi.creation_time
end

#dataset_id ⇒ String

A unique ID for this dataset, without the project name.
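
Examples:

A minimal sketch:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.dataset_id #=> "my_dataset"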



# File 'lib/google/cloud/bigquery/dataset.rb', line 133

def dataset_id
  return reference.dataset_id if reference?
  @gapi.dataset_reference.dataset_id
end

#default_collation ⇒ String?

The default collation of the dataset.



# File 'lib/google/cloud/bigquery/dataset.rb', line 258

def default_collation
  return nil if reference?
  ensure_full_data!
  @gapi.default_collation
end

#default_collation=(new_default_collation) ⇒ Object

Updates the default collation of the dataset.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
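
Examples:

A minimal sketch; the "und:ci" (case-insensitive) value shown here is an assumed collation specification, check the BigQuery collation documentation for supported values:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.default_collation = "und:ci" # new tables default to case-insensitive string comparison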



# File 'lib/google/cloud/bigquery/dataset.rb', line 275

def default_collation= new_default_collation
  reload! unless resource_full?
  @gapi.update! default_collation: new_default_collation
  patch_gapi! :default_collation
end

#default_encryption ⇒ EncryptionConfiguration?

The EncryptionConfiguration object that represents the default encryption method for all tables and models in the dataset. Once this property is set, all newly-created partitioned tables and models in the dataset will have their encryption set to this value, unless the table creation request (or query) overrides it.

Present only if this dataset is using custom default encryption.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

encrypt_config = dataset.default_encryption

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 459

def default_encryption
  return nil if reference?
  ensure_full_data!
  return nil if @gapi.default_encryption_configuration.nil?
  EncryptionConfiguration.from_gapi(@gapi.default_encryption_configuration).freeze
end

#default_encryption=(value) ⇒ Object

Sets the EncryptionConfiguration object that represents the default encryption method for all tables and models in the dataset. Once this property is set, all newly-created partitioned tables and models in the dataset will have their encryption set to this value, unless the table creation request (or query) overrides it.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name

dataset.default_encryption = encrypt_config

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 495

def default_encryption= value
  ensure_full_data!
  @gapi.default_encryption_configuration = value.to_gapi
  patch_gapi! :default_encryption_configuration
end

#default_expiration ⇒ Integer?

The default lifetime of all tables in the dataset, in milliseconds.



# File 'lib/google/cloud/bigquery/dataset.rb', line 290

def default_expiration
  return nil if reference?
  ensure_full_data!
  begin
    Integer @gapi.default_table_expiration_ms
  rescue StandardError
    nil
  end
end

#default_expiration=(new_default_expiration) ⇒ Object

Updates the default lifetime of all tables in the dataset, in milliseconds.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
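
Examples:

A minimal sketch setting a one-hour default table lifetime (3,600,000 milliseconds):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.default_expiration = 3_600_000 # one hour, in milliseconds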



# File 'lib/google/cloud/bigquery/dataset.rb', line 313

def default_expiration= new_default_expiration
  reload! unless resource_full?
  @gapi.update! default_table_expiration_ms: new_default_expiration
  patch_gapi! :default_table_expiration_ms
end

#delete(force: nil) ⇒ Boolean

Permanently deletes the dataset. The dataset must be empty before it can be deleted unless the force option is set to true.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.delete
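
Delete a dataset that still contains tables by setting force (a minimal sketch):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.delete force: true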


# File 'lib/google/cloud/bigquery/dataset.rb', line 640

def delete force: nil
  ensure_service!
  service.delete_dataset dataset_id, force
  # Set flag for #exists?
  @exists = false
  true
end

#description ⇒ String?

A user-friendly description of the dataset.



# File 'lib/google/cloud/bigquery/dataset.rb', line 227

def description
  return nil if reference?
  ensure_full_data!
  @gapi.description
end

#description=(new_description) ⇒ Object

Updates the user-friendly description of the dataset.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
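
Examples:

A minimal sketch:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.description = "Data for the shipping department"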



# File 'lib/google/cloud/bigquery/dataset.rb', line 244

def description= new_description
  reload! unless resource_full?
  @gapi.update! description: new_description
  patch_gapi! :description
end

#etag ⇒ String?

The ETag hash of the dataset.



# File 'lib/google/cloud/bigquery/dataset.rb', line 199

def etag
  return nil if reference?
  ensure_full_data!
  @gapi.etag
end

#exists?(force: false) ⇒ Boolean

Determines whether the dataset exists in the BigQuery service. The result is cached locally. To refresh state, set force to true.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true
dataset.exists? # true


# File 'lib/google/cloud/bigquery/dataset.rb', line 2691

def exists? force: false
  return gapi_exists? if force
  # If we have a memoized value, return it
  return @exists unless @exists.nil?
  # Always true if we have a gapi object
  return true if resource?
  gapi_exists?
end

#external(url, format: nil) {|ext| ... } ⇒ External::DataSource

Creates a new External::DataSource (or subclass) object that represents an external data source that can be queried directly, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"

csv_url = "gs://bucket/path/to/data.csv"
csv_table = dataset.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

data = dataset.query "SELECT * FROM my_ext_table",
                      external: { my_ext_table: csv_table }

data.each do |row|
  puts row[:name]
end

Yields:

  • (ext)

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 2035

def external url, format: nil
  ext = External.from_urls url, format
  yield ext if block_given?
  ext
end

#insert(table_id, rows, insert_ids: nil, skip_invalid: nil, ignore_unknown: nil, autocreate: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::InsertResponse

Inserts data into the given table for near-immediate querying, without the need to complete a load operation before the data can appear in query results.

Simple Ruby types are generally accepted per JSON rules, along with the following support for BigQuery's more complex types:

BigQuery   | Ruby                            | Notes
-----------|---------------------------------|---------------------------------------------------
NUMERIC    | BigDecimal                      | BigDecimal values will be rounded to scale 9.
BIGNUMERIC | String                          | Pass as String to avoid rounding to scale 9.
DATETIME   | DateTime                        | DATETIME does not support time zone.
DATE       | Date                            |
GEOGRAPHY  | String                          |
JSON       | String (Stringified JSON)       | String, as JSON does not have a schema to verify.
TIMESTAMP  | Time                            |
TIME       | Google::Cloud::BigQuery::Time   |
BYTES      | File, IO, StringIO, or similar  |
ARRAY      | Array                           | Nested arrays, nil values are not supported.
STRUCT     | Hash                            | Hash keys may be strings or symbols.

Because BigQuery's streaming API is designed for high insertion rates, modifications to the underlying table metadata are eventually consistent when interacting with the streaming system. In most cases metadata changes are propagated within minutes, but during this period API responses may reflect the inconsistent state of the table.

For the insert_ids option, the value :skip can be provided to skip the generation of IDs for all rows, or to skip the generation of an ID for a specific row in the array (see the final example below).

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
dataset.insert "my_table", rows

Avoid retrieving the dataset with skip_lookup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
dataset.insert "my_table", rows

Using autocreate to create a new table if none exists.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
dataset.insert "my_table", rows, autocreate: true do |t|
  t.schema.string "first_name", mode: :required
  t.schema.integer "age", mode: :required
end

Pass a BIGNUMERIC value as a string to avoid rounding to scale 9 in the conversion from BigDecimal:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

row = {
  "my_numeric" => BigDecimal("123456798.987654321"),
  "my_bignumeric" => "123456798.98765432100001" # BigDecimal would be rounded, use String instead!
}
dataset.insert "my_table", row

Yields:

  • (table)

    a block for setting the table

Yield Parameters:

  • table (Google::Cloud::Bigquery::Table::Updater)

    An updater to set additional properties on the table in the API request to create it. Only used when autocreate is set and the table does not already exist.

Raises:

  • (ArgumentError)

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 2934

def insert table_id, rows, insert_ids: nil, skip_invalid: nil, ignore_unknown: nil, autocreate: nil, &block
  rows = [rows] if rows.is_a? Hash
  raise ArgumentError, "No rows provided" if rows.empty?

  insert_ids = Array.new(rows.count) { :skip } if insert_ids == :skip
  insert_ids = Array insert_ids
  if insert_ids.count.positive? && insert_ids.count != rows.count
    raise ArgumentError, "insert_ids must be the same size as rows"
  end

  if autocreate
    insert_data_with_autocreate table_id, rows, skip_invalid: skip_invalid, ignore_unknown: ignore_unknown,
                                                insert_ids: insert_ids, &block
  else
    insert_data table_id, rows, skip_invalid: skip_invalid, ignore_unknown: ignore_unknown,
                                insert_ids: insert_ids
  end
end

#insert_async(table_id, skip_invalid: nil, ignore_unknown: nil, max_bytes: 10_000_000, max_rows: 500, interval: 10, threads: 4, view: nil) {|response| ... } ⇒ Table::AsyncInserter

Creates an asynchronous inserter object used to insert rows in batches.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
inserter = dataset.insert_async "my_table" do |result|
  if result.error?
    log_error result.error
  else
    log_insert "inserted #{result.insert_count} rows " \
      "with #{result.error_count} errors"
  end
end

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
inserter.insert rows

inserter.stop.wait!

Avoid retrieving transient stats of the table while inserting:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
inserter = dataset.insert_async("my_table", view: "basic") do |result|
  if result.error?
    log_error result.error
  else
    log_insert "inserted #{result.insert_count} rows " \
      "with #{result.error_count} errors"
  end
end

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
inserter.insert rows

inserter.stop.wait!

Yields:

  • (response)

    the callback for when a batch of rows is inserted

Yield Parameters:



# File 'lib/google/cloud/bigquery/dataset.rb', line 3027

def insert_async table_id, skip_invalid: nil, ignore_unknown: nil, max_bytes: 10_000_000, max_rows: 500,
                 interval: 10, threads: 4, view: nil, &block
  ensure_service!

  # Get table, don't use Dataset#table which handles NotFoundError
  gapi = service.get_project_table project_id, dataset_id, table_id, metadata_view: view
  table = Table.from_gapi gapi, service, metadata_view: view
  # Get the AsyncInserter from the table
  table.insert_async skip_invalid: skip_invalid,
                     ignore_unknown: ignore_unknown,
                     max_bytes: max_bytes, max_rows: max_rows,
                     interval: interval, threads: threads, &block
end

#labels ⇒ Hash<String, String>?

A hash of user-provided labels associated with this dataset. Labels are used to organize and group datasets. See Using Labels.

The returned hash is frozen and changes are not allowed. Use #labels= to replace the entire hash.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

labels = dataset.labels
labels["department"] #=> "shipping"


# File 'lib/google/cloud/bigquery/dataset.rb', line 383

def labels
  return nil if reference?
  m = @gapi.labels
  m = m.to_h if m.respond_to? :to_h
  m.dup.freeze
end

#labels=(labels) ⇒ Object

Updates the hash of user-provided labels associated with this dataset. Labels are used to organize and group datasets. See Using Labels.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.labels = { "department" => "shipping" }


# File 'lib/google/cloud/bigquery/dataset.rb', line 426

def labels= labels
  reload! unless resource_full?
  @gapi.labels = labels
  patch_gapi! :labels
end

#load(table_id, files, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, autodetect: nil, null_marker: nil, session_id: nil, date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil, null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil, preserve_ascii_control_characters: nil, reservation: nil) {|updater| ... } ⇒ Boolean

Loads data into the provided destination table using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See also #load_job.

For the source of the data, you can pass a Google Cloud Storage file path or a google-cloud-storage File instance. Or, you can upload a file directly. See Loading Data with a POST Request.

The geographic location for the job ("US", "EU", etc.) can be set via LoadJob::Updater#location= in a block passed to this method. If the dataset is a full resource representation (see #resource_full?), the location of the job will be automatically set to the location of the dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

gs_url = "gs://my-bucket/file-name.csv"
dataset.load "my_new_table", gs_url do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Pass a google-cloud-storage File instance:

require "google/cloud/bigquery"
require "google/cloud/storage"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-bucket"
file = bucket.file "file-name.csv"
dataset.load "my_new_table", file do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Pass a list of google-cloud-storage files:

require "google/cloud/bigquery"
require "google/cloud/storage"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-bucket"
file = bucket.file "file-name.csv"
list = [file, "gs://my-bucket/file-name2.csv"]
dataset.load "my_new_table", list do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Upload a file directly:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

file = File.open "my_data.csv"
dataset.load "my_new_table", file do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Schema is not required with a Cloud Datastore backup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.load "my_new_table",
             "gs://my-bucket/xxxx.kind_name.backup_info" do |j|
  j.format = "datastore_backup"
end

Yields:

  • (updater)

    A block for setting the schema of the destination table and other options for the load job. The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup.

Yield Parameters:



# File 'lib/google/cloud/bigquery/dataset.rb', line 2623

def load table_id, files, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil,
         quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil,
         quote: nil, skip_leading: nil, schema: nil, autodetect: nil, null_marker: nil, session_id: nil,
         date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil,
         null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil,
         preserve_ascii_control_characters: nil,
         reservation: nil, &block
  job = load_job table_id, files,
                 format: format, create: create, write: write, projection_fields: projection_fields,
                 jagged_rows: jagged_rows, quoted_newlines: quoted_newlines, encoding: encoding,
                 delimiter: delimiter, ignore_unknown: ignore_unknown, max_bad_records: max_bad_records,
                 quote: quote, skip_leading: skip_leading, schema: schema, autodetect: autodetect,
                 null_marker: null_marker, session_id: session_id, date_format: date_format,
                 datetime_format: datetime_format, time_format: time_format, timestamp_format: timestamp_format,
                 null_markers: null_markers, source_column_match: source_column_match, time_zone: time_zone,
                 reference_file_schema_uri: reference_file_schema_uri,
                 preserve_ascii_control_characters: preserve_ascii_control_characters,
                 reservation: reservation, &block

  job.wait_until_done!
  ensure_job_succeeded! job
  true
end

#load_job(table_id, files, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil, null_marker: nil, dryrun: nil, create_session: nil, session_id: nil, date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil, null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil, preserve_ascii_control_characters: nil, reservation: nil) {|updater| ... } ⇒ Google::Cloud::Bigquery::LoadJob

Loads data into the provided destination table using an asynchronous method. In this method, a LoadJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See also #load.

For the source of the data, you can pass a Google Cloud Storage file path or a google-cloud-storage File instance. Or, you can upload a file directly. See Loading Data with a POST Request.

The geographic location for the job ("US", "EU", etc.) can be set via LoadJob::Updater#location= in a block passed to this method. If the dataset is a full resource representation (see #resource_full?), the location of the job will be automatically set to the location of the dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

gs_url = "gs://my-bucket/file-name.csv"
load_job = dataset.load_job "my_new_table", gs_url do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Pass a google-cloud-storage File instance:

require "google/cloud/bigquery"
require "google/cloud/storage"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-bucket"
file = bucket.file "file-name.csv"
load_job = dataset.load_job "my_new_table", file do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Pass a list of google-cloud-storage files:

require "google/cloud/bigquery"
require "google/cloud/storage"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-bucket"
file = bucket.file "file-name.csv"
list = [file, "gs://my-bucket/file-name2.csv"]
load_job = dataset.load_job "my_new_table", list do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Upload a file directly:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

file = File.open "my_data.csv"
load_job = dataset.load_job "my_new_table", file do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Schema is not required with a Cloud Datastore backup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

load_job = dataset.load_job(
             "my_new_table",
             "gs://my-bucket/xxxx.kind_name.backup_info") do |j|
  j.format = "datastore_backup"
end

Yields:

  • (updater)

    A block for setting the schema and other options for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup.

Yield Parameters:



# File 'lib/google/cloud/bigquery/dataset.rb', line 2337

def load_job table_id, files, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil,
             quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil,
             quote: nil, skip_leading: nil, schema: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil,
             null_marker: nil, dryrun: nil, create_session: nil, session_id: nil, date_format: nil,
             datetime_format: nil, time_format: nil, timestamp_format: nil, null_markers: nil,
             source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil,
             preserve_ascii_control_characters: nil, reservation: nil
  ensure_service!

  updater = load_job_updater table_id,
                             format: format, create: create, write: write, projection_fields: projection_fields,
                             jagged_rows: jagged_rows, quoted_newlines: quoted_newlines, encoding: encoding,
                             delimiter: delimiter, ignore_unknown: ignore_unknown,
                             max_bad_records: max_bad_records, quote: quote, skip_leading: skip_leading,
                             dryrun: dryrun, schema: schema, job_id: job_id, prefix: prefix, labels: labels,
                             autodetect: autodetect, null_marker: null_marker, create_session: create_session,
                             session_id: session_id, date_format: date_format, datetime_format: datetime_format,
                             time_format: time_format, timestamp_format: timestamp_format,
                             null_markers: null_markers, source_column_match: source_column_match,
                             time_zone: time_zone, reference_file_schema_uri: reference_file_schema_uri,
                             preserve_ascii_control_characters: preserve_ascii_control_characters,
                             reservation: reservation


  yield updater if block_given?

  load_local_or_uri files, updater
end

#location ⇒ String?

The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.
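
Examples:

A minimal sketch:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.location #=> "US"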



# File 'lib/google/cloud/bigquery/dataset.rb', line 356

def location
  return nil if reference?
  @gapi.location
end

#model(model_id, skip_lookup: nil) ⇒ Google::Cloud::Bigquery::Model?

Retrieves an existing model by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

model = dataset.model "my_model"
puts model.model_id

Avoid retrieving the model resource with skip_lookup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"

model = dataset.model "my_model", skip_lookup: true


# File 'lib/google/cloud/bigquery/dataset.rb', line 1066

def model model_id, skip_lookup: nil
  ensure_service!
  return Model.new_reference project_id, dataset_id, model_id, service if skip_lookup
  gapi = service.get_model dataset_id, model_id
  Model.from_gapi_json gapi, service
rescue Google::Cloud::NotFoundError
  nil
end

#models(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Model>

Retrieves the list of models belonging to the dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

models = dataset.models
models.each do |model|
  puts model.model_id
end

Retrieve all models: (See Model::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

models = dataset.models
models.all do |model|
  puts model.model_id
end


# File 'lib/google/cloud/bigquery/dataset.rb', line 1109

def models token: nil, max: nil
  ensure_service!
  gapi = service.list_models dataset_id, token: token, max: max
  Model::List.from_gapi gapi, service, dataset_id, max
end

#modified_at ⇒ Time?

The time when this dataset or any of its tables was last modified.



# File 'lib/google/cloud/bigquery/dataset.rb', line 341

def modified_at
  return nil if reference?
  ensure_full_data!
  Convert.millis_to_time @gapi.last_modified_time
end

#name ⇒ String?

A descriptive name for the dataset.



# File 'lib/google/cloud/bigquery/dataset.rb', line 168

def name
  return nil if reference?
  @gapi.friendly_name
end

#name=(new_name) ⇒ Object

Updates the descriptive name for the dataset.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
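
Examples:

A minimal sketch:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.name = "My Dataset"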



# File 'lib/google/cloud/bigquery/dataset.rb', line 185

def name= new_name
  reload! unless resource_full?
  @gapi.update! friendly_name: new_name
  patch_gapi! :friendly_name
end

#project_id ⇒ String

The ID of the project containing this dataset.



# File 'lib/google/cloud/bigquery/dataset.rb', line 145

def project_id
  return reference.project_id if reference?
  @gapi.dataset_reference.project_id
end

#query(query, params: nil, types: nil, external: nil, max: nil, cache: true, standard_sql: nil, legacy_sql: nil, session_id: nil, format_options_use_int64_timestamp: true, reservation: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::Data

Queries data and waits for the results. In this method, a QueryJob is created and its results are saved to a temporary table, then read from the table. Timeouts and transient errors are generally handled as needed to complete the query. When used for executing DDL/DML statements, this method does not return row data.

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

The geographic location for the job ("US", "EU", etc.) can be set via QueryJob::Updater#location= in a block passed to this method. If the dataset is a full resource representation (see #resource_full?), the location of the job will be automatically set to the location of the dataset.

Examples:

Query using standard SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table"

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Query using legacy SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table",
                     legacy_sql: true

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Query using positional query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table WHERE id = ?",
                     params: [1]

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Query using named query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table WHERE id = @id",
                     params: { id: 1 }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Query using named query parameters with types:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table WHERE id IN UNNEST(@ids)",
                     params: { ids: [] },
                     types: { ids: [:INT64] }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Execute a DDL statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "CREATE TABLE my_table (x INT64)"

table_ref = data.ddl_target_table # Or ddl_target_routine for CREATE/DROP FUNCTION/PROCEDURE

Execute a DML statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "UPDATE my_table SET x = x + 1 WHERE x IS NOT NULL"

puts data.num_dml_affected_rows

Run query in a session:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "CREATE TEMPORARY TABLE temptable AS SELECT 17 as foo", create_session: true

job.wait_until_done!

session_id = job.session_id
data = dataset.query "SELECT * FROM temptable", session_id: session_id

Query using external data source, set destination:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

csv_url = "gs://bucket/path/to/data.csv"
csv_table = dataset.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

data = dataset.query "SELECT * FROM my_ext_table" do |query|
  query.external = { my_ext_table: csv_table }
  query.table = dataset.table "my_table", skip_lookup: true
end

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Yields:

  • (job)

    a job configuration object

Yield Parameters:

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 1960

def query query,
          params: nil,
          types: nil,
          external: nil,
          max: nil,
          cache: true,
          standard_sql: nil,
          legacy_sql: nil,
          session_id: nil,
          format_options_use_int64_timestamp: true,
          reservation: nil,
          &block
  job = query_job query,
                  params: params,
                  types: types,
                  external: external,
                  cache: cache,
                  standard_sql: standard_sql,
                  legacy_sql: legacy_sql,
                  session_id: session_id,
                  reservation: reservation,
                  &block
  job.wait_until_done!
  ensure_job_succeeded! job

  job.data max: max, format_options_use_int64_timestamp: format_options_use_int64_timestamp
end

#query_job(query, params: nil, types: nil, external: nil, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, dryrun: nil, standard_sql: nil, legacy_sql: nil, large_results: nil, flatten: nil, maximum_billing_tier: nil, maximum_bytes_billed: nil, job_id: nil, prefix: nil, labels: nil, udfs: nil, create_session: nil, session_id: nil, reservation: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::QueryJob

Queries data by creating a query job.

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

The geographic location for the job ("US", "EU", etc.) can be set via QueryJob::Updater#location= in a block passed to this method. If the dataset is a full resource representation (see #resource_full?), the location of the job will be automatically set to the location of the dataset.

Examples:

Query using standard SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table"

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using legacy SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table",
                        legacy_sql: true

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using positional query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table WHERE id = ?",
                        params: [1]

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using named query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table WHERE id = @id",
                        params: { id: 1 }

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using named query parameters with types:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table WHERE id IN UNNEST(@ids)",
                        params: { ids: [] },
                        types: { ids: [:INT64] }

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Execute a DDL statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "CREATE TABLE my_table (x INT64)"

job.wait_until_done!
if !job.failed?
  table_ref = job.ddl_target_table # Or ddl_target_routine for CREATE/DROP FUNCTION/PROCEDURE
end

Execute a DML statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "UPDATE my_table SET x = x + 1 WHERE x IS NOT NULL"

job.wait_until_done!
if !job.failed?
  puts job.num_dml_affected_rows
end

Run query in a session:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "CREATE TEMPORARY TABLE temptable AS SELECT 17 as foo", create_session: true

job.wait_until_done!

session_id = job.session_id
data = dataset.query "SELECT * FROM temptable", session_id: session_id

Query using external data source, set destination:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

csv_url = "gs://bucket/path/to/data.csv"
csv_table = dataset.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

job = dataset.query_job "SELECT * FROM my_ext_table" do |query|
  query.external = { my_ext_table: csv_table }
  query.table = dataset.table "my_table", skip_lookup: true
end

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Yields:

  • (job)

    a job configuration object

Yield Parameters:



# File 'lib/google/cloud/bigquery/dataset.rb', line 1634

def query_job query,
              params: nil,
              types: nil,
              external: nil,
              priority: "INTERACTIVE",
              cache: true,
              table: nil,
              create: nil,
              write: nil,
              dryrun: nil,
              standard_sql: nil,
              legacy_sql: nil,
              large_results: nil,
              flatten: nil,
              maximum_billing_tier: nil,
              maximum_bytes_billed: nil,
              job_id: nil,
              prefix: nil,
              labels: nil,
              udfs: nil,
              create_session: nil,
              session_id: nil,
              reservation: nil
  ensure_service!
  options = {
    params: params,
    types: types,
    external: external,
    priority: priority,
    cache: cache,
    table: table,
    create: create,
    write: write,
    dryrun: dryrun,
    standard_sql: standard_sql,
    legacy_sql: legacy_sql,
    large_results: large_results,
    flatten: flatten,
    maximum_billing_tier: maximum_billing_tier,
    maximum_bytes_billed: maximum_bytes_billed,
    job_id: job_id,
    prefix: prefix,
    labels: labels,
    udfs: udfs,
    create_session: create_session,
    session_id: session_id,
    reservation: reservation
  }

  updater = QueryJob::Updater.from_options service, query, options
  updater.dataset = self
  updater.location = location if location # may be dataset reference

  yield updater if block_given?

  gapi = service.query_job updater.to_gapi
  Job.from_gapi gapi, service
end

#reference? ⇒ Boolean

Whether the dataset was created without retrieving the resource representation from the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true

dataset.reference? # true
dataset.reload!
dataset.reference? # false


# File 'lib/google/cloud/bigquery/dataset.rb', line 2718

def reference?
  @gapi.nil?
end

#reload! ⇒ Google::Cloud::Bigquery::Dataset Also known as: refresh!

Reloads the dataset with current data from the BigQuery service.

Examples:

Skip retrieving the dataset from the service, then load it:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true
dataset.reload!


# File 'lib/google/cloud/bigquery/dataset.rb', line 2661

def reload!
  ensure_service!
  @gapi = service.get_project_dataset project_id, dataset_id, access_policy_version: @access_policy_version,
    dataset_view: @dataset_view
  @reference = nil
  @exists = nil
  self
end

#resource? ⇒ Boolean

Whether the dataset was created with a resource representation from the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true

dataset.resource? # false
dataset.reload!
dataset.resource? # true


# File 'lib/google/cloud/bigquery/dataset.rb', line 2740

def resource?
  !@gapi.nil?
end

#resource_full? ⇒ Boolean

Whether the dataset was created with a full resource representation from the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"

dataset.resource_full? # true


# File 'lib/google/cloud/bigquery/dataset.rb', line 2787

def resource_full?
  @gapi.is_a? Google::Apis::BigqueryV2::Dataset
end

#resource_partial? ⇒ Boolean

Whether the dataset was created with a partial resource representation from the BigQuery service by retrieval through Project#datasets. See Datasets: list response for the contents of the partial representation. Accessing any attribute outside of the partial representation will result in loading the full representation.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.datasets.first

dataset.resource_partial? # true
dataset.description # Loads the full resource.
dataset.resource_partial? # false


# File 'lib/google/cloud/bigquery/dataset.rb', line 2767

def resource_partial?
  @gapi.is_a? Google::Apis::BigqueryV2::DatasetList::Dataset
end

#routine(routine_id, skip_lookup: nil) ⇒ Google::Cloud::Bigquery::Routine?

Retrieves an existing routine by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

routine = dataset.routine "my_routine"
puts routine.routine_id

Avoid retrieving the routine resource with skip_lookup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"

routine = dataset.routine "my_routine", skip_lookup: true


# File 'lib/google/cloud/bigquery/dataset.rb', line 1232

def routine routine_id, skip_lookup: nil
  ensure_service!
  return Routine.new_reference project_id, dataset_id, routine_id, service if skip_lookup
  gapi = service.get_routine dataset_id, routine_id
  Routine.from_gapi gapi, service
rescue Google::Cloud::NotFoundError
  nil
end

#routines(token: nil, max: nil, filter: nil) ⇒ Array<Google::Cloud::Bigquery::Routine>

Retrieves the list of routines belonging to the dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

routines = dataset.routines
routines.each do |routine|
  puts routine.routine_id
end

Retrieve all routines: (See Routine::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

routines = dataset.routines
routines.all do |routine|
  puts routine.routine_id
end


# File 'lib/google/cloud/bigquery/dataset.rb', line 1277

def routines token: nil, max: nil, filter: nil
  ensure_service!
  gapi = service.list_routines dataset_id, token: token, max: max, filter: filter
  Routine::List.from_gapi gapi, service, dataset_id, max, filter: filter
end

#storage_billing_model ⇒ String?

Gets the Storage Billing Model for the dataset.

Examples:


require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

storage_billing_model = dataset.storage_billing_model

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 522

def storage_billing_model
  return nil if reference?
  ensure_full_data!
  @gapi.storage_billing_model
end

#storage_billing_model=(value) ⇒ Object

Sets the Storage Billing Model for the dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
dataset.storage_billing_model = "LOGICAL"

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 545

def storage_billing_model= value
  ensure_full_data!
  @gapi.storage_billing_model = value
  patch_gapi! :storage_billing_model
end

#table(table_id, skip_lookup: nil, view: nil) ⇒ Google::Cloud::Bigquery::Table?

Retrieves an existing table by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.table "my_table"
puts table.name

Avoid retrieving the table resource with skip_lookup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"

table = dataset.table "my_table", skip_lookup: true

Avoid retrieving transient stats of the table with view:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"

table = dataset.table "my_table", view: "basic"


# File 'lib/google/cloud/bigquery/dataset.rb', line 985

def table table_id, skip_lookup: nil, view: nil
  ensure_service!
  return Table.new_reference project_id, dataset_id, table_id, service if skip_lookup
  gapi = service.get_project_table project_id, dataset_id, table_id, metadata_view: view
  Table.from_gapi gapi, service, metadata_view: view
rescue Google::Cloud::NotFoundError
  nil
end

#tables(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Table>

Retrieves the list of tables belonging to the dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

tables = dataset.tables
tables.each do |table|
  puts table.name
end

Retrieve all tables: (See Table::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

tables = dataset.tables
tables.all do |table|
  puts table.name
end


# File 'lib/google/cloud/bigquery/dataset.rb', line 1028

def tables token: nil, max: nil
  ensure_service!
  gapi = service.list_tables dataset_id, token: token, max: max
  Table::List.from_gapi gapi, service, dataset_id, max
end

#tags ⇒ Google::Cloud::Bigquery::Dataset::Tag

Retrieves the tags associated with this dataset. Tag keys are globally unique and are managed via the Resource Manager API; see the Resource Manager documentation for more information.
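
Examples:

A minimal sketch; the tag_key and tag_value readers are assumed from the Tag class:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

tags = dataset.tags
Array(tags).each do |tag|
  puts "#{tag.tag_key}: #{tag.tag_value}"
end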



# File 'lib/google/cloud/bigquery/dataset.rb', line 614

def tags
  ensure_full_data!
  return nil if @gapi.tags.nil?
  @gapi.tags.map { |gapi| Tag.from_gapi gapi }
end