Class: Google::Cloud::Bigquery::Dataset

Inherits:
Object
Defined in:
lib/google/cloud/bigquery/dataset.rb,
lib/google/cloud/bigquery/dataset/tag.rb,
lib/google/cloud/bigquery/dataset/list.rb,
lib/google/cloud/bigquery/dataset/access.rb

Overview

Dataset

Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest level unit of access control; you cannot control access at the table level. A dataset is contained within a specific project.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"

Direct Known Subclasses

Updater

Defined Under Namespace

Classes: Access, List, Tag, Updater

Attributes

Lifecycle

Table

Model

Routine

Data

Instance Method Details

#access {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset::Access

Retrieves the access rules for a Dataset. The rules can be updated by passing a block; see Access for all the methods available.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

access = dataset.access
access.writer_user? "[email protected]" #=> false

Manage the access rules by passing a block:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.access do |access|
  access.add_owner_group "[email protected]"
  access.add_writer_user "[email protected]"
  access.remove_writer_user "[email protected]"
  access.add_reader_special :all
  access.add_reader_view other_dataset_view_object
end

Yields:

  • (access)

    a block for setting rules

Yield Parameters:

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 591

def access
  ensure_full_data!
  reload! unless resource_full?
  access_builder = Access.from_gapi @gapi
  if block_given?
    yield access_builder
    if access_builder.changed?
      @gapi.update! access: access_builder.to_gapi
      patch_gapi! :access
    end
  end
  access_builder.freeze
end

#api_url ⇒ String?

A URL that can be used to access the dataset using the REST API.
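
Examples:

A minimal usage sketch (assuming a dataset named "my_dataset" exists in the current project):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

puts dataset.api_url #=> "https://bigquery.googleapis.com/bigquery/v2/projects/..."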



# File 'lib/google/cloud/bigquery/dataset.rb', line 213

def api_url
  return nil if reference?
  ensure_full_data!
  @gapi.self_link
end

#build_access_entry(target_types: nil) ⇒ Google::Apis::BigqueryV2::DatasetAccessEntry

Builds an object of type Google::Apis::BigqueryV2::DatasetAccessEntry from this dataset's reference.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
dataset_access_entry = dataset.build_access_entry target_types: ["VIEWS"]


# File 'lib/google/cloud/bigquery/dataset.rb', line 3056

def build_access_entry target_types: nil
  params = {
    dataset: dataset_ref,
    target_types: target_types
  }.compact
  Google::Apis::BigqueryV2::DatasetAccessEntry.new(**params)
end

#create_materialized_view(table_id, query, name: nil, description: nil, enable_refresh: nil, refresh_interval_ms: nil) ⇒ Google::Cloud::Bigquery::Table

Creates a new materialized view.

Materialized views are precomputed views that periodically cache results of a query for increased performance and efficiency. BigQuery leverages precomputed results from materialized views and whenever possible reads only delta changes from the base table to compute up-to-date results.

Queries that use materialized views are generally faster and consume fewer resources than queries that retrieve the same data only from the base table. Materialized views can significantly boost the performance of workloads characterized by common, repeated queries.

For logical views, see #create_view.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

materialized_view = dataset.create_materialized_view "my_materialized_view",
                                                     "SELECT name, age FROM proj.dataset.users"

Automatic refresh can be disabled:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

materialized_view = dataset.create_materialized_view "my_materialized_view",
                                                     "SELECT name, age FROM proj.dataset.users",
                                                     enable_refresh: false

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 913

def create_materialized_view table_id,
                             query,
                             name: nil,
                             description: nil,
                             enable_refresh: nil,
                             refresh_interval_ms: nil
  new_view_opts = {
    table_reference:   Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id,
      dataset_id: dataset_id,
      table_id:   table_id
    ),
    friendly_name:     name,
    description:       description,
    materialized_view: Google::Apis::BigqueryV2::MaterializedViewDefinition.new(
      enable_refresh:      enable_refresh,
      query:               query,
      refresh_interval_ms: refresh_interval_ms
    )
  }.compact
  new_view = Google::Apis::BigqueryV2::Table.new(**new_view_opts)

  gapi = service.insert_table dataset_id, new_view
  Table.from_gapi gapi, service
end

#create_routine(routine_id) {|routine| ... } ⇒ Google::Cloud::Bigquery::Routine

Creates a new routine.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

routine = dataset.create_routine "my_routine" do |r|
  r.routine_type = "SCALAR_FUNCTION"
  r.language = "SQL"
  r.arguments = [
    Google::Cloud::Bigquery::Argument.new(name: "x", data_type: "INT64")
  ]
  r.body = "x * 3"
  r.description = "My routine description"
end

puts routine.routine_id

Extended example:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
routine = dataset.create_routine "my_routine" do |r|
  r.routine_type = "SCALAR_FUNCTION"
  r.language = :SQL
  r.body = "(SELECT SUM(IF(elem.name = \"foo\",elem.val,null)) FROM UNNEST(arr) AS elem)"
  r.arguments = [
    Google::Cloud::Bigquery::Argument.new(
      name: "arr",
      argument_kind: "FIXED_TYPE",
      data_type: Google::Cloud::Bigquery::StandardSql::DataType.new(
        type_kind: "ARRAY",
        array_element_type: Google::Cloud::Bigquery::StandardSql::DataType.new(
          type_kind: "STRUCT",
          struct_type: Google::Cloud::Bigquery::StandardSql::StructType.new(
            fields: [
              Google::Cloud::Bigquery::StandardSql::Field.new(
                name: "name",
                type: Google::Cloud::Bigquery::StandardSql::DataType.new(type_kind: "STRING")
              ),
              Google::Cloud::Bigquery::StandardSql::Field.new(
                name: "val",
                type: Google::Cloud::Bigquery::StandardSql::DataType.new(type_kind: "INT64")
              )
            ]
          )
        )
      )
    )
  ]
end

Yields:

  • (routine)

    A block for setting properties on the routine.

Yield Parameters:



# File 'lib/google/cloud/bigquery/dataset.rb', line 1185

def create_routine routine_id
  ensure_service!
  new_tb = Google::Apis::BigqueryV2::Routine.new(
    routine_reference: Google::Apis::BigqueryV2::RoutineReference.new(
      project_id: project_id, dataset_id: dataset_id, routine_id: routine_id
    )
  )
  updater = Routine::Updater.new new_tb

  yield updater if block_given?

  gapi = service.insert_routine dataset_id, updater.to_gapi
  Routine.from_gapi gapi, service
end

#create_table(table_id, name: nil, description: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::Table

Creates a new table. If you are adapting existing code that was written for the REST API, you can pass the table's schema as a hash (see example).

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table"

You can also pass name and description options.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table",
                             name: "My Table",
                             description: "A description of table."

Or the table's schema can be configured with the block.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.schema.string "first_name", mode: :required
  t.schema.record "cities_lived", mode: :required do |s|
    s.string "place", mode: :required
    s.integer "number_of_years", mode: :required
  end
end

You can define the schema using a nested block.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.name = "My Table"
  t.description = "A description of my table."
  t.schema do |s|
    s.string "first_name", mode: :required
    s.record "cities_lived", mode: :repeated do |r|
      r.string "place", mode: :required
      r.integer "number_of_years", mode: :required
    end
  end
end

With time partitioning and clustering.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.schema do |schema|
    schema.timestamp "dob", mode: :required
    schema.string "first_name", mode: :required
    schema.string "last_name", mode: :required
  end
  t.time_partitioning_type  = "DAY"
  t.time_partitioning_field = "dob"
  t.clustering_fields = ["last_name", "first_name"]
end

With range partitioning.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.schema do |schema|
    schema.integer "my_table_id", mode: :required
    schema.string "my_table_data", mode: :required
  end
  t.range_partitioning_field = "my_table_id"
  t.range_partitioning_start = 0
  t.range_partitioning_interval = 10
  t.range_partitioning_end = 100
end

Yields:

  • (table)

    a block for setting the table

Yield Parameters:



# File 'lib/google/cloud/bigquery/dataset.rb', line 752

def create_table table_id, name: nil, description: nil
  ensure_service!
  new_tb = Google::Apis::BigqueryV2::Table.new(
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id,
      table_id: table_id
    )
  )
  updater = Table::Updater.new(new_tb).tap do |tb|
    tb.name = name unless name.nil?
    tb.description = description unless description.nil?
  end

  yield updater if block_given?

  gapi = service.insert_table dataset_id, updater.to_gapi
  Table.from_gapi gapi, service
end

#create_view(table_id, query, name: nil, description: nil, standard_sql: nil, legacy_sql: nil, udfs: nil) ⇒ Google::Cloud::Bigquery::Table

Creates a new view, which is a virtual table defined by the given SQL query.

With BigQuery's logical views, the query that defines the view is re-executed every time the view is queried. Queries are billed according to the total amount of data in all table fields referenced directly or indirectly by the top-level query. (See Table#view? and Table#query.)

For materialized views, see #create_materialized_view.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

view = dataset.create_view "my_view",
                           "SELECT name, age FROM proj.dataset.users"

A name and description can be provided:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

view = dataset.create_view "my_view",
                           "SELECT name, age FROM proj.dataset.users",
                           name: "My View", description: "This is my view"

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 837

def create_view table_id,
                query,
                name: nil,
                description: nil,
                standard_sql: nil,
                legacy_sql: nil,
                udfs: nil
  use_legacy_sql = Convert.resolve_legacy_sql standard_sql, legacy_sql
  new_view_opts = {
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id,
      dataset_id: dataset_id,
      table_id:   table_id
    ),
    friendly_name:   name,
    description:     description,
    view:            Google::Apis::BigqueryV2::ViewDefinition.new(
      query:                           query,
      use_legacy_sql:                  use_legacy_sql,
      user_defined_function_resources: udfs_gapi(udfs)
    )
  }.compact
  new_view = Google::Apis::BigqueryV2::Table.new(**new_view_opts)

  gapi = service.insert_table dataset_id, new_view
  Table.from_gapi gapi, service
end

#created_at ⇒ Time?

The time when this dataset was created.
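
Examples:

A minimal sketch (assuming "my_dataset" exists; the value is nil for a reference-only dataset):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

created = dataset.created_at # a Time
puts created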



# File 'lib/google/cloud/bigquery/dataset.rb', line 327

def created_at
  return nil if reference?
  ensure_full_data!
  Convert.millis_to_time @gapi.creation_time
end

#dataset_id ⇒ String

A unique ID for this dataset, without the project name.
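
Examples:

A minimal sketch:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.dataset_id #=> "my_dataset"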



# File 'lib/google/cloud/bigquery/dataset.rb', line 133

def dataset_id
  return reference.dataset_id if reference?
  @gapi.dataset_reference.dataset_id
end

#default_collation ⇒ String?

The default collation of the dataset.



# File 'lib/google/cloud/bigquery/dataset.rb', line 258

def default_collation
  return nil if reference?
  ensure_full_data!
  @gapi.default_collation
end

#default_collation=(new_default_collation) ⇒ Object

Updates the default collation of the dataset.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
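
Examples:

A minimal sketch; the "und:ci" (case-insensitive) value shown here is an assumed collation specification, check the BigQuery collation documentation for supported values:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.default_collation = "und:ci" # new tables default to case-insensitive string comparison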



# File 'lib/google/cloud/bigquery/dataset.rb', line 275

def default_collation= new_default_collation
  reload! unless resource_full?
  @gapi.update! default_collation: new_default_collation
  patch_gapi! :default_collation
end

#default_encryption ⇒ EncryptionConfiguration?

The EncryptionConfiguration object that represents the default encryption method for all tables and models in the dataset. Once this property is set, all newly-created partitioned tables and models in the dataset will have their encryption set to this value, unless the table creation request (or query) overrides it.

Present only if this dataset is using custom default encryption.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

encrypt_config = dataset.default_encryption

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 459

def default_encryption
  return nil if reference?
  ensure_full_data!
  return nil if @gapi.default_encryption_configuration.nil?
  EncryptionConfiguration.from_gapi(@gapi.default_encryption_configuration).freeze
end

#default_encryption=(value) ⇒ Object

Sets the EncryptionConfiguration object that represents the default encryption method for all tables and models in the dataset. Once this property is set, all newly-created partitioned tables and models in the dataset will have their encryption set to this value, unless the table creation request (or query) overrides it.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name

dataset.default_encryption = encrypt_config

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 495

def default_encryption= value
  ensure_full_data!
  @gapi.default_encryption_configuration = value.to_gapi
  patch_gapi! :default_encryption_configuration
end

#default_expiration ⇒ Integer?

The default lifetime of all tables in the dataset, in milliseconds.



# File 'lib/google/cloud/bigquery/dataset.rb', line 290

def default_expiration
  return nil if reference?
  ensure_full_data!
  begin
    Integer @gapi.default_table_expiration_ms
  rescue StandardError
    nil
  end
end

#default_expiration=(new_default_expiration) ⇒ Object

Updates the default lifetime of all tables in the dataset, in milliseconds.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
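
Examples:

A minimal sketch setting a one-hour default table lifetime (3,600,000 milliseconds):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.default_expiration = 3_600_000 # one hour, in milliseconds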



# File 'lib/google/cloud/bigquery/dataset.rb', line 313

def default_expiration= new_default_expiration
  reload! unless resource_full?
  @gapi.update! default_table_expiration_ms: new_default_expiration
  patch_gapi! :default_table_expiration_ms
end

#delete(force: nil) ⇒ Boolean

Permanently deletes the dataset. The dataset must be empty before it can be deleted unless the force option is set to true.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.delete
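
Delete a dataset that still contains tables by setting force (a minimal sketch):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.delete force: true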


# File 'lib/google/cloud/bigquery/dataset.rb', line 640

def delete force: nil
  ensure_service!
  service.delete_dataset dataset_id, force
  # Set flag for #exists?
  @exists = false
  true
end

#description ⇒ String?

A user-friendly description of the dataset.



# File 'lib/google/cloud/bigquery/dataset.rb', line 227

def description
  return nil if reference?
  ensure_full_data!
  @gapi.description
end

#description=(new_description) ⇒ Object

Updates the user-friendly description of the dataset.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
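
Examples:

A minimal sketch:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.description = "Data for the shipping department"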



# File 'lib/google/cloud/bigquery/dataset.rb', line 244

def description= new_description
  reload! unless resource_full?
  @gapi.update! description: new_description
  patch_gapi! :description
end

#etag ⇒ String?

The ETag hash of the dataset.



# File 'lib/google/cloud/bigquery/dataset.rb', line 199

def etag
  return nil if reference?
  ensure_full_data!
  @gapi.etag
end

#exists?(force: false) ⇒ Boolean

Determines whether the dataset exists in the BigQuery service. The result is cached locally. To refresh state, set force to true.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true
dataset.exists? # true


# File 'lib/google/cloud/bigquery/dataset.rb', line 2691

def exists? force: false
  return gapi_exists? if force
  # If we have a memoized value, return it
  return @exists unless @exists.nil?
  # Always true if we have a gapi object
  return true if resource?
  gapi_exists?
end

#external(url, format: nil) {|ext| ... } ⇒ External::DataSource

Creates a new External::DataSource (or subclass) object that represents an external data source that can be queried directly, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"

csv_url = "gs://bucket/path/to/data.csv"
csv_table = dataset.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

data = dataset.query "SELECT * FROM my_ext_table",
                      external: { my_ext_table: csv_table }

data.each do |row|
  puts row[:name]
end

Yields:

  • (ext)

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 2035

def external url, format: nil
  ext = External.from_urls url, format
  yield ext if block_given?
  ext
end

#insert(table_id, rows, insert_ids: nil, skip_invalid: nil, ignore_unknown: nil, autocreate: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::InsertResponse

Inserts data into the given table for near-immediate querying, without the need to complete a load operation before the data can appear in query results.

Simple Ruby types are generally accepted per JSON rules, along with the following support for BigQuery's more complex types:

BigQuery   | Ruby                            | Notes
-----------|---------------------------------|---------------------------------------------------
NUMERIC    | BigDecimal                      | BigDecimal values will be rounded to scale 9.
BIGNUMERIC | String                          | Pass as String to avoid rounding to scale 9.
DATETIME   | DateTime                        | DATETIME does not support time zone.
DATE       | Date                            |
GEOGRAPHY  | String                          |
JSON       | String (Stringified JSON)       | String, as JSON does not have a schema to verify.
TIMESTAMP  | Time                            |
TIME       | Google::Cloud::BigQuery::Time   |
BYTES      | File, IO, StringIO, or similar  |
ARRAY      | Array                           | Nested arrays, nil values are not supported.
STRUCT     | Hash                            | Hash keys may be strings or symbols.

Because BigQuery's streaming API is designed for high insertion rates, modifications to the underlying table metadata are eventually consistent when interacting with the streaming system. In most cases metadata changes are propagated within minutes, but during this period API responses may reflect the inconsistent state of the table.

For the insert_ids option, the value :skip can be provided to skip the generation of IDs for all rows, or to skip the generation of an ID for a specific row in the array (see the final example below).

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
dataset.insert "my_table", rows

Avoid retrieving the dataset with skip_lookup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
dataset.insert "my_table", rows

Using autocreate to create a new table if none exists.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
dataset.insert "my_table", rows, autocreate: true do |t|
  t.schema.string "first_name", mode: :required
  t.schema.integer "age", mode: :required
end

Pass a BIGNUMERIC value as a string to avoid rounding to scale 9 in the conversion from BigDecimal:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

row = {
  "my_numeric" => BigDecimal("123456798.987654321"),
  "my_bignumeric" => "123456798.98765432100001" # BigDecimal would be rounded, use String instead!
}
dataset.insert "my_table", row

Yields:

  • (table)

    a block for setting the table

Yield Parameters:

  • table (Google::Cloud::Bigquery::Table::Updater)

    An updater to set additional properties on the table in the API request to create it. Only used when autocreate is set and the table does not already exist.

Raises:

  • (ArgumentError)

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 2934

def insert table_id, rows, insert_ids: nil, skip_invalid: nil, ignore_unknown: nil, autocreate: nil, &block
  rows = [rows] if rows.is_a? Hash
  raise ArgumentError, "No rows provided" if rows.empty?

  insert_ids = Array.new(rows.count) { :skip } if insert_ids == :skip
  insert_ids = Array insert_ids
  if insert_ids.count.positive? && insert_ids.count != rows.count
    raise ArgumentError, "insert_ids must be the same size as rows"
  end

  if autocreate
    insert_data_with_autocreate table_id, rows, skip_invalid: skip_invalid, ignore_unknown: ignore_unknown,
                                                insert_ids: insert_ids, &block
  else
    insert_data table_id, rows, skip_invalid: skip_invalid, ignore_unknown: ignore_unknown,
                                insert_ids: insert_ids
  end
end

#insert_async(table_id, skip_invalid: nil, ignore_unknown: nil, max_bytes: 10_000_000, max_rows: 500, interval: 10, threads: 4, view: nil) {|response| ... } ⇒ Table::AsyncInserter

Creates an asynchronous inserter object used to insert rows in batches.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
inserter = dataset.insert_async "my_table" do |result|
  if result.error?
    log_error result.error
  else
    log_insert "inserted #{result.insert_count} rows " \
      "with #{result.error_count} errors"
  end
end

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
inserter.insert rows

inserter.stop.wait!

Avoid retrieving transient stats of the table while inserting:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
inserter = dataset.insert_async("my_table", view: "basic") do |result|
  if result.error?
    log_error result.error
  else
    log_insert "inserted #{result.insert_count} rows " \
      "with #{result.error_count} errors"
  end
end

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
inserter.insert rows

inserter.stop.wait!

Yields:

  • (response)

    the callback for when a batch of rows is inserted

Yield Parameters:



# File 'lib/google/cloud/bigquery/dataset.rb', line 3027

def insert_async table_id, skip_invalid: nil, ignore_unknown: nil, max_bytes: 10_000_000, max_rows: 500,
                 interval: 10, threads: 4, view: nil, &block
  ensure_service!

  # Get table, don't use Dataset#table which handles NotFoundError
  gapi = service.get_project_table project_id, dataset_id, table_id, metadata_view: view
  table = Table.from_gapi gapi, service, metadata_view: view
  # Get the AsyncInserter from the table
  table.insert_async skip_invalid: skip_invalid,
                     ignore_unknown: ignore_unknown,
                     max_bytes: max_bytes, max_rows: max_rows,
                     interval: interval, threads: threads, &block
end

#labels ⇒ Hash<String, String>?

A hash of user-provided labels associated with this dataset. Labels are used to organize and group datasets. See Using Labels.

The returned hash is frozen and changes are not allowed. Use #labels= to replace the entire hash.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

labels = dataset.labels
labels["department"] #=> "shipping"


# File 'lib/google/cloud/bigquery/dataset.rb', line 383

def labels
  return nil if reference?
  m = @gapi.labels
  m = m.to_h if m.respond_to? :to_h
  m.dup.freeze
end

#labels=(labels) ⇒ Object

Updates the hash of user-provided labels associated with this dataset. Labels are used to organize and group datasets. See Using Labels.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.labels = { "department" => "shipping" }


# File 'lib/google/cloud/bigquery/dataset.rb', line 426

def labels= labels
  reload! unless resource_full?
  @gapi.labels = labels
  patch_gapi! :labels
end

#load(table_id, files, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, autodetect: nil, null_marker: nil, session_id: nil, date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil, null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil, preserve_ascii_control_characters: nil, reservation: nil) {|updater| ... } ⇒ Boolean

Loads data into the provided destination table using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See also #load_job.

For the source of the data, you can pass a Google Cloud Storage file path or a google-cloud-storage File instance. Or, you can upload a file directly. See Loading Data with a POST Request.

The geographic location for the job ("US", "EU", etc.) can be set via LoadJob::Updater#location= in a block passed to this method. If the dataset is a full resource representation (see #resource_full?), the location of the job will be automatically set to the location of the dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

gs_url = "gs://my-bucket/file-name.csv"
dataset.load "my_new_table", gs_url do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Pass a google-cloud-storage File instance:

require "google/cloud/bigquery"
require "google/cloud/storage"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-bucket"
file = bucket.file "file-name.csv"
dataset.load "my_new_table", file do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Pass a list of google-cloud-storage files:

require "google/cloud/bigquery"
require "google/cloud/storage"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-bucket"
file = bucket.file "file-name.csv"
list = [file, "gs://my-bucket/file-name2.csv"]
dataset.load "my_new_table", list do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Upload a file directly:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

file = File.open "my_data.csv"
dataset.load "my_new_table", file do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Schema is not required with a Cloud Datastore backup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.load "my_new_table",
             "gs://my-bucket/xxxx.kind_name.backup_info" do |j|
  j.format = "datastore_backup"
end

Yields:

  • (updater)

    A block for setting the schema of the destination table and other options for the load job. The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup.

Yield Parameters:



# File 'lib/google/cloud/bigquery/dataset.rb', line 2623

def load table_id, files, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil,
         quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil,
         quote: nil, skip_leading: nil, schema: nil, autodetect: nil, null_marker: nil, session_id: nil,
         date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil,
         null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil,
         preserve_ascii_control_characters: nil,
         reservation: nil, &block
  job = load_job table_id, files,
                 format: format, create: create, write: write, projection_fields: projection_fields,
                 jagged_rows: jagged_rows, quoted_newlines: quoted_newlines, encoding: encoding,
                 delimiter: delimiter, ignore_unknown: ignore_unknown, max_bad_records: max_bad_records,
                 quote: quote, skip_leading: skip_leading, schema: schema, autodetect: autodetect,
                 null_marker: null_marker, session_id: session_id, date_format: date_format,
                 datetime_format: datetime_format, time_format: time_format, timestamp_format: timestamp_format,
                 null_markers: null_markers, source_column_match: source_column_match, time_zone: time_zone,
                 reference_file_schema_uri: reference_file_schema_uri,
                 preserve_ascii_control_characters: preserve_ascii_control_characters,
                 reservation: reservation, &block

  job.wait_until_done!
  ensure_job_succeeded! job
  true
end

#load_job(table_id, files, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil, null_marker: nil, dryrun: nil, create_session: nil, session_id: nil, date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil, null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil, preserve_ascii_control_characters: nil, reservation: nil) {|updater| ... } ⇒ Google::Cloud::Bigquery::LoadJob

Loads data into the provided destination table using an asynchronous method. In this method, a LoadJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See also #load.

For the source of the data, you can pass a Google Cloud Storage file path or a google-cloud-storage File instance. Or, you can upload a file directly. See Loading Data with a POST Request.

The geographic location for the job ("US", "EU", etc.) can be set via LoadJob::Updater#location= in a block passed to this method. If the dataset is a full resource representation (see #resource_full?), the location of the job will be automatically set to the location of the dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

gs_url = "gs://my-bucket/file-name.csv"
load_job = dataset.load_job "my_new_table", gs_url do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Pass a google-cloud-storage File instance:

require "google/cloud/bigquery"
require "google/cloud/storage"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-bucket"
file = bucket.file "file-name.csv"
load_job = dataset.load_job "my_new_table", file do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Pass a list of google-cloud-storage files:

require "google/cloud/bigquery"
require "google/cloud/storage"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-bucket"
file = bucket.file "file-name.csv"
list = [file, "gs://my-bucket/file-name2.csv"]
load_job = dataset.load_job "my_new_table", list do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Upload a file directly:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

file = File.open "my_data.csv"
load_job = dataset.load_job "my_new_table", file do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Schema is not required with a Cloud Datastore backup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

load_job = dataset.load_job(
             "my_new_table",
             "gs://my-bucket/xxxx.kind_name.backup_info") do |j|
  j.format = "datastore_backup"
end

Yields:

  • (updater)

    A block for setting the schema and other options for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup.

Yield Parameters:



# File 'lib/google/cloud/bigquery/dataset.rb', line 2337

def load_job table_id, files, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil,
             quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil,
             quote: nil, skip_leading: nil, schema: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil,
             null_marker: nil, dryrun: nil, create_session: nil, session_id: nil, date_format: nil,
             datetime_format: nil, time_format: nil, timestamp_format: nil, null_markers: nil,
             source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil,
             preserve_ascii_control_characters: nil, reservation: nil
  ensure_service!

  updater = load_job_updater table_id,
                             format: format, create: create, write: write, projection_fields: projection_fields,
                             jagged_rows: jagged_rows, quoted_newlines: quoted_newlines, encoding: encoding,
                             delimiter: delimiter, ignore_unknown: ignore_unknown,
                             max_bad_records: max_bad_records, quote: quote, skip_leading: skip_leading,
                             dryrun: dryrun, schema: schema, job_id: job_id, prefix: prefix, labels: labels,
                             autodetect: autodetect, null_marker: null_marker, create_session: create_session,
                             session_id: session_id, date_format: date_format, datetime_format: datetime_format,
                             time_format: time_format, timestamp_format: timestamp_format,
                             null_markers: null_markers, source_column_match: source_column_match,
                             time_zone: time_zone, reference_file_schema_uri: reference_file_schema_uri,
                             preserve_ascii_control_characters: preserve_ascii_control_characters,
                             reservation: reservation


  yield updater if block_given?

  load_local_or_uri files, updater
end

#location ⇒ String?

The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.
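
Examples:

A minimal sketch:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.location #=> "US"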



# File 'lib/google/cloud/bigquery/dataset.rb', line 356

def location
  return nil if reference?
  @gapi.location
end

#model(model_id, skip_lookup: nil) ⇒ Google::Cloud::Bigquery::Model?

Retrieves an existing model by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

model = dataset.model "my_model"
puts model.model_id

Avoid retrieving the model resource with skip_lookup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"

model = dataset.model "my_model", skip_lookup: true


# File 'lib/google/cloud/bigquery/dataset.rb', line 1066

def model model_id, skip_lookup: nil
  ensure_service!
  return Model.new_reference project_id, dataset_id, model_id, service if skip_lookup
  gapi = service.get_model dataset_id, model_id
  Model.from_gapi_json gapi, service
rescue Google::Cloud::NotFoundError
  nil
end

#models(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Model>

Retrieves the list of models belonging to the dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

models = dataset.models
models.each do |model|
  puts model.model_id
end

Retrieve all models: (See Model::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

models = dataset.models
models.all do |model|
  puts model.model_id
end


# File 'lib/google/cloud/bigquery/dataset.rb', line 1109

def models token: nil, max: nil
  ensure_service!
  gapi = service.list_models dataset_id, token: token, max: max
  Model::List.from_gapi gapi, service, dataset_id, max
end

#modified_at ⇒ Time?

The time when this dataset or any of its tables was last modified.



# File 'lib/google/cloud/bigquery/dataset.rb', line 341

def modified_at
  return nil if reference?
  ensure_full_data!
  Convert.millis_to_time @gapi.last_modified_time
end

#name ⇒ String?

A descriptive name for the dataset.



# File 'lib/google/cloud/bigquery/dataset.rb', line 168

def name
  return nil if reference?
  @gapi.friendly_name
end

#name=(new_name) ⇒ Object

Updates the descriptive name for the dataset.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
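
Examples:

A minimal sketch:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.name = "My Dataset"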



# File 'lib/google/cloud/bigquery/dataset.rb', line 185

def name= new_name
  reload! unless resource_full?
  @gapi.update! friendly_name: new_name
  patch_gapi! :friendly_name
end

#project_id ⇒ String

The ID of the project containing this dataset.



# File 'lib/google/cloud/bigquery/dataset.rb', line 145

def project_id
  return reference.project_id if reference?
  @gapi.dataset_reference.project_id
end

#query(query, params: nil, types: nil, external: nil, max: nil, cache: true, standard_sql: nil, legacy_sql: nil, session_id: nil, format_options_use_int64_timestamp: true, reservation: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::Data

Queries data and waits for the results. In this method, a QueryJob is created and its results are saved to a temporary table, then read from the table. Timeouts and transient errors are generally handled as needed to complete the query. When used for executing DDL/DML statements, this method does not return row data.

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

The geographic location for the job ("US", "EU", etc.) can be set via QueryJob::Updater#location= in a block passed to this method. If the dataset is a full resource representation (see #resource_full?), the location of the job will be automatically set to the location of the dataset.

Examples:

Query using standard SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table"

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Query using legacy SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table",
                     legacy_sql: true

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Query using positional query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table WHERE id = ?",
                     params: [1]

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Query using named query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table WHERE id = @id",
                     params: { id: 1 }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Query using named query parameters with types:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table WHERE id IN UNNEST(@ids)",
                     params: { ids: [] },
                     types: { ids: [:INT64] }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Execute a DDL statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "CREATE TABLE my_table (x INT64)"

table_ref = data.ddl_target_table # Or ddl_target_routine for CREATE/DROP FUNCTION/PROCEDURE

Execute a DML statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "UPDATE my_table SET x = x + 1 WHERE x IS NOT NULL"

puts data.num_dml_affected_rows

Run query in a session:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "CREATE TEMPORARY TABLE temptable AS SELECT 17 as foo", create_session: true

job.wait_until_done!

session_id = job.session_id
data = dataset.query "SELECT * FROM temptable", session_id: session_id

Query using external data source, set destination:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

csv_url = "gs://bucket/path/to/data.csv"
csv_table = dataset.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

data = dataset.query "SELECT * FROM my_ext_table" do |query|
  query.external = { my_ext_table: csv_table }
  query.table = dataset.table "my_table", skip_lookup: true
end

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Yields:

  • (job)

    a job configuration object

Yield Parameters:

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 1960

def query query,
          params: nil,
          types: nil,
          external: nil,
          max: nil,
          cache: true,
          standard_sql: nil,
          legacy_sql: nil,
          session_id: nil,
          format_options_use_int64_timestamp: true,
          reservation: nil,
          &block
  job = query_job query,
                  params: params,
                  types: types,
                  external: external,
                  cache: cache,
                  standard_sql: standard_sql,
                  legacy_sql: legacy_sql,
                  session_id: session_id,
                  reservation: reservation,
                  &block
  job.wait_until_done!
  ensure_job_succeeded! job

  job.data max: max, format_options_use_int64_timestamp: format_options_use_int64_timestamp
end

#query_job(query, params: nil, types: nil, external: nil, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, dryrun: nil, standard_sql: nil, legacy_sql: nil, large_results: nil, flatten: nil, maximum_billing_tier: nil, maximum_bytes_billed: nil, job_id: nil, prefix: nil, labels: nil, udfs: nil, create_session: nil, session_id: nil, reservation: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::QueryJob

Queries data by creating a query job.

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

The geographic location for the job ("US", "EU", etc.) can be set via QueryJob::Updater#location= in a block passed to this method. If the dataset is a full resource representation (see #resource_full?), the location of the job will be automatically set to the location of the dataset.

Examples:

Query using standard SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table"

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using legacy SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table",
                        legacy_sql: true

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using positional query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table WHERE id = ?",
                        params: [1]

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using named query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table WHERE id = @id",
                        params: { id: 1 }

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using named query parameters with types:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table WHERE id IN UNNEST(@ids)",
                        params: { ids: [] },
                        types: { ids: [:INT64] }

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Execute a DDL statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "CREATE TABLE my_table (x INT64)"

job.wait_until_done!
if !job.failed?
  table_ref = job.ddl_target_table # Or ddl_target_routine for CREATE/DROP FUNCTION/PROCEDURE
end

Execute a DML statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "UPDATE my_table SET x = x + 1 WHERE x IS NOT NULL"

job.wait_until_done!
if !job.failed?
  puts job.num_dml_affected_rows
end

Run query in a session:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "CREATE TEMPORARY TABLE temptable AS SELECT 17 as foo", create_session: true

job.wait_until_done!

session_id = job.session_id
data = dataset.query "SELECT * FROM temptable", session_id: session_id

Query using external data source, set destination:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

csv_url = "gs://bucket/path/to/data.csv"
csv_table = dataset.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

job = dataset.query_job "SELECT * FROM my_ext_table" do |query|
  query.external = { my_ext_table: csv_table }
  query.table = dataset.table "my_table", skip_lookup: true
end

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Yields:

  • (job)

    a job configuration object

Yield Parameters:



# File 'lib/google/cloud/bigquery/dataset.rb', line 1634

def query_job query,
              params: nil,
              types: nil,
              external: nil,
              priority: "INTERACTIVE",
              cache: true,
              table: nil,
              create: nil,
              write: nil,
              dryrun: nil,
              standard_sql: nil,
              legacy_sql: nil,
              large_results: nil,
              flatten: nil,
              maximum_billing_tier: nil,
              maximum_bytes_billed: nil,
              job_id: nil,
              prefix: nil,
              labels: nil,
              udfs: nil,
              create_session: nil,
              session_id: nil,
              reservation: nil
  ensure_service!
  options = {
    params: params,
    types: types,
    external: external,
    priority: priority,
    cache: cache,
    table: table,
    create: create,
    write: write,
    dryrun: dryrun,
    standard_sql: standard_sql,
    legacy_sql: legacy_sql,
    large_results: large_results,
    flatten: flatten,
    maximum_billing_tier: maximum_billing_tier,
    maximum_bytes_billed: maximum_bytes_billed,
    job_id: job_id,
    prefix: prefix,
    labels: labels,
    udfs: udfs,
    create_session: create_session,
    session_id: session_id,
    reservation: reservation
  }

  updater = QueryJob::Updater.from_options service, query, options
  updater.dataset = self
  updater.location = location if location # may be dataset reference

  yield updater if block_given?

  gapi = service.query_job updater.to_gapi
  Job.from_gapi gapi, service
end

#reference? ⇒ Boolean

Whether the dataset was created without retrieving the resource representation from the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true

dataset.reference? # true
dataset.reload!
dataset.reference? # false


# File 'lib/google/cloud/bigquery/dataset.rb', line 2718

def reference?
  @gapi.nil?
end

#reload! ⇒ Google::Cloud::Bigquery::Dataset Also known as: refresh!

Reloads the dataset with current data from the BigQuery service.

Examples:

Skip retrieving the dataset from the service, then load it:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true
dataset.reload!


# File 'lib/google/cloud/bigquery/dataset.rb', line 2661

def reload!
  ensure_service!
  @gapi = service.get_project_dataset project_id, dataset_id, access_policy_version: @access_policy_version,
    dataset_view: @dataset_view
  @reference = nil
  @exists = nil
  self
end

#resource? ⇒ Boolean

Whether the dataset was created with a resource representation from the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true

dataset.resource? # false
dataset.reload!
dataset.resource? # true


# File 'lib/google/cloud/bigquery/dataset.rb', line 2740

def resource?
  !@gapi.nil?
end

#resource_full? ⇒ Boolean

Whether the dataset was created with a full resource representation from the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"

dataset.resource_full? # true


# File 'lib/google/cloud/bigquery/dataset.rb', line 2787

def resource_full?
  @gapi.is_a? Google::Apis::BigqueryV2::Dataset
end

#resource_partial? ⇒ Boolean

Whether the dataset was created with a partial resource representation from the BigQuery service by retrieval through Project#datasets. See Datasets: list response for the contents of the partial representation. Accessing any attribute outside of the partial representation will result in loading the full representation.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.datasets.first

dataset.resource_partial? # true
dataset.description # Loads the full resource.
dataset.resource_partial? # false


# File 'lib/google/cloud/bigquery/dataset.rb', line 2767

def resource_partial?
  @gapi.is_a? Google::Apis::BigqueryV2::DatasetList::Dataset
end

#routine(routine_id, skip_lookup: nil) ⇒ Google::Cloud::Bigquery::Routine?

Retrieves an existing routine by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

routine = dataset.routine "my_routine"
puts routine.routine_id

Avoid retrieving the routine resource with skip_lookup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"

routine = dataset.routine "my_routine", skip_lookup: true


# File 'lib/google/cloud/bigquery/dataset.rb', line 1232

def routine routine_id, skip_lookup: nil
  ensure_service!
  return Routine.new_reference project_id, dataset_id, routine_id, service if skip_lookup
  gapi = service.get_routine dataset_id, routine_id
  Routine.from_gapi gapi, service
rescue Google::Cloud::NotFoundError
  nil
end

#routines(token: nil, max: nil, filter: nil) ⇒ Array<Google::Cloud::Bigquery::Routine>

Retrieves the list of routines belonging to the dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

routines = dataset.routines
routines.each do |routine|
  puts routine.routine_id
end

Retrieve all routines: (See Routine::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

routines = dataset.routines
routines.all do |routine|
  puts routine.routine_id
end


# File 'lib/google/cloud/bigquery/dataset.rb', line 1277

def routines token: nil, max: nil, filter: nil
  ensure_service!
  gapi = service.list_routines dataset_id, token: token, max: max, filter: filter
  Routine::List.from_gapi gapi, service, dataset_id, max, filter: filter
end

#storage_billing_model ⇒ String?

Gets the Storage Billing Model for the dataset.

Examples:


require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

storage_billing_model = dataset.storage_billing_model

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 522

def storage_billing_model
  return nil if reference?
  ensure_full_data!
  @gapi.storage_billing_model
end

#storage_billing_model=(value) ⇒ Object

Sets the Storage Billing Model for the dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
dataset.storage_billing_model = "LOGICAL"

See Also:



# File 'lib/google/cloud/bigquery/dataset.rb', line 545

def storage_billing_model= value
  ensure_full_data!
  @gapi.storage_billing_model = value
  patch_gapi! :storage_billing_model
end

#table(table_id, skip_lookup: nil, view: nil) ⇒ Google::Cloud::Bigquery::Table?

Retrieves an existing table by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.table "my_table"
puts table.name

Avoid retrieving the table resource with skip_lookup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"

table = dataset.table "my_table", skip_lookup: true

Avoid retrieving transient stats of the table with view:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"

table = dataset.table "my_table", view: "basic"


# File 'lib/google/cloud/bigquery/dataset.rb', line 985

def table table_id, skip_lookup: nil, view: nil
  ensure_service!
  return Table.new_reference project_id, dataset_id, table_id, service if skip_lookup
  gapi = service.get_project_table project_id, dataset_id, table_id, metadata_view: view
  Table.from_gapi gapi, service, metadata_view: view
rescue Google::Cloud::NotFoundError
  nil
end

#tables(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Table>

Retrieves the list of tables belonging to the dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

tables = dataset.tables
tables.each do |table|
  puts table.name
end

Retrieve all tables: (See Table::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

tables = dataset.tables
tables.all do |table|
  puts table.name
end


# File 'lib/google/cloud/bigquery/dataset.rb', line 1028

def tables token: nil, max: nil
  ensure_service!
  gapi = service.list_tables dataset_id, token: token, max: max
  Table::List.from_gapi gapi, service, dataset_id, max
end

#tags ⇒ Google::Cloud::Bigquery::Dataset::Tag

Retrieves the tags associated with this dataset. Tag keys are globally unique and are managed via the Resource Manager API; see the Resource Manager documentation for more information.
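
Examples:

A minimal sketch; the tag_key and tag_value readers are assumed from the Tag class:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

tags = dataset.tags
Array(tags).each do |tag|
  puts "#{tag.tag_key}: #{tag.tag_value}"
end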



# File 'lib/google/cloud/bigquery/dataset.rb', line 614

def tags
  ensure_full_data!
  return nil if @gapi.tags.nil?
  @gapi.tags.map { |gapi| Tag.from_gapi gapi }
end