Class: Google::Cloud::Bigquery::QueryJob

Inherits:

Object
Job
Google::Cloud::Bigquery::QueryJob

Defined in: lib/google/cloud/bigquery/query_job.rb

Overview

QueryJob

A Job subclass representing a query operation that may be performed on a Table. A QueryJob instance is created when you call Project#query_job or Dataset#query_job.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT COUNT(word) as count FROM " \
                         "`bigquery-public-data.samples.shakespeare`"

job.wait_until_done!

if job.failed?
  puts job.error
else
  puts job.data.first
end

Direct Known Subclasses

Updater

Defined Under Namespace

Classes: Stage, Step, Updater

Attributes

#clustering? ⇒ Boolean?
Checks if the destination table will be clustered.
#clustering_fields ⇒ Array<String>?
One or more fields on which the destination table should be clustered.
#data(token: nil, max: nil, start: nil) ⇒ Google::Cloud::Bigquery::Data (also: #query_results)
Retrieves the query results for the job.
#encryption ⇒ Google::Cloud::Bigquery::EncryptionConfiguration
The encryption configuration of the destination table.
#time_partitioning? ⇒ Boolean?
Checks if the destination table will be time-partitioned.
#time_partitioning_expiration ⇒ Integer?
The expiration for the destination table partitions, if any, in seconds.
#time_partitioning_field ⇒ String?
The field on which the destination table will be partitioned, if any.
#time_partitioning_require_filter? ⇒ Boolean
If set to true, queries over the destination table must specify a partition filter that can be used for partition elimination.
#time_partitioning_type ⇒ String?
The period for which the destination table will be partitioned, if any.
#wait_until_done! ⇒ Object
Refreshes the job until the job is DONE.

Instance Method Summary

#batch? ⇒ Boolean
Checks if the priority for the query is BATCH.
#bytes_processed ⇒ Integer?
The number of bytes processed by the query.
#cache? ⇒ Boolean
Checks if the query job looks for an existing result in the query cache.
#cache_hit? ⇒ Boolean
Checks if the query results are from the query cache.
#ddl? ⇒ Boolean
Whether the query is a DDL statement.
#ddl_operation_performed ⇒ String?
The DDL operation performed, possibly dependent on the pre-existence of the DDL target.
#ddl_target_table ⇒ Google::Cloud::Bigquery::Table?
The DDL target table, in reference state.
#destination ⇒ Table
The table in which the query results are stored.
#dml? ⇒ Boolean
Whether the query is a DML statement.
#dryrun? ⇒ Boolean (also: #dryrun, #dry_run, #dry_run?)
If set, don't actually run this job.
#flatten? ⇒ Boolean
Checks if the query job flattens nested and repeated fields in the query results.
#interactive? ⇒ Boolean
Checks if the priority for the query is INTERACTIVE.
#large_results? ⇒ Boolean
Checks if the query job allows arbitrarily large results at a slight cost to performance.
#legacy_sql? ⇒ Boolean
Checks if the query job is using legacy SQL.
#maximum_billing_tier ⇒ Integer?
Limits the billing tier for this job.
#maximum_bytes_billed ⇒ Integer?
Limits the bytes billed for this job.
#num_dml_affected_rows ⇒ Integer?
The number of rows affected by a DML statement.
#query_plan ⇒ Array<Google::Cloud::Bigquery::QueryJob::Stage>?
Describes the execution plan for the query.
#standard_sql? ⇒ Boolean
Checks if the query job is using standard SQL.
#statement_type ⇒ String?
The type of query statement, if valid.
#udfs ⇒ Array<String>
The user-defined function resources used in the query.

Methods inherited from Job

#cancel, #configuration, #created_at, #done?, #ended_at, #error, #errors, #failed?, #job_id, #labels, #location, #pending?, #project_id, #reload!, #rerun!, #running?, #started_at, #state, #statistics, #status, #user_email

Instance Method Details

#batch? ⇒ `Boolean`

Checks if the priority for the query is BATCH.

Returns:

(Boolean) —
true when the priority is BATCH, false otherwise.

# File 'lib/google/cloud/bigquery/query_job.rb', line 58

def batch?
  val = @gapi.configuration.query.priority
  val == "BATCH"
end

#bytes_processed ⇒ `Integer`?

The number of bytes processed by the query.

Returns:

(Integer, nil) —
Total bytes processed for the job.

# File 'lib/google/cloud/bigquery/query_job.rb', line 176

def bytes_processed
  Integer @gapi.statistics.query.total_bytes_processed
rescue StandardError
  nil
end
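
The convert-or-nil pattern in the listing above can be tried in isolation. This is a minimal sketch, assuming the statistic may arrive as a numeric String or may be missing altogether; `to_integer_or_nil` is a hypothetical helper, not part of the library.

```ruby
# Hypothetical helper mirroring the Integer(...) / rescue pattern above.
def to_integer_or_nil value
  Integer value
rescue StandardError
  nil
end

to_integer_or_nil "1048576" # numeric string, converted to an Integer
to_integer_or_nil nil       # Integer(nil) raises TypeError, so nil is returned
to_integer_or_nil "n/a"     # Integer("n/a") raises ArgumentError, so nil is returned
```

Because every StandardError is rescued, a caller cannot tell a missing statistic from a malformed one; both come back as nil.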

#cache? ⇒ `Boolean`

Checks if the query job looks for an existing result in the query cache. For more information, see Query Caching.

Returns:

(Boolean) —
true when the query cache will be used, false otherwise.

# File 'lib/google/cloud/bigquery/query_job.rb', line 96

def cache?
  val = @gapi.configuration.query.use_query_cache
  return false if val.nil?
  val
end

#cache_hit? ⇒ `Boolean`

Checks if the query results are from the query cache.

Returns:

(Boolean) —
true when the job statistics indicate a cache hit, false otherwise.

# File 'lib/google/cloud/bigquery/query_job.rb', line 166

def cache_hit?
  return false unless @gapi.statistics.query
  @gapi.statistics.query.cache_hit
end

#clustering? ⇒ `Boolean`?

Checks if the destination table will be clustered.

Returns:

(Boolean, nil) —
true when the table will be clustered, or false otherwise.

See Also:

Introduction to Clustered Tables



# File 'lib/google/cloud/bigquery/query_job.rb', line 490

def clustering?
  !@gapi.configuration.query.clustering.nil?
end

#clustering_fields ⇒ `Array<String>`?

One or more fields on which the destination table should be clustered. Must be specified with time-based partitioning; data in the table will be first partitioned and subsequently clustered. The order of the returned fields determines the sort order of the data.

See Google::Cloud::Bigquery::QueryJob::Updater#clustering_fields=.

Returns:

(Array<String>, nil) —
The clustering fields, or nil if the destination table will not be clustered.

# File 'lib/google/cloud/bigquery/query_job.rb', line 514

def clustering_fields
  @gapi.configuration.query.clustering.fields if clustering?
end

#data(token: nil, max: nil, start: nil) ⇒ `Google::Cloud::Bigquery::Data` Also known as: query_results

Retrieves the query results for the job.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT word FROM `bigquery-public-data.samples.shakespeare`"
job = bigquery.query_job sql

job.wait_until_done!
data = job.data
data.each do |row|
  puts row[:word]
end
data = data.next if data.next?

Parameters:

token (String) (defaults to: nil) —
Page token, returned by a previous call, identifying the result set.
max (Integer) (defaults to: nil) —
Maximum number of results to return.
start (Integer) (defaults to: nil) —
Zero-based index of the starting row to read.

Returns:

(Google::Cloud::Bigquery::Data, nil) —
An object providing access to data read from the destination table for the job, or nil if the job is not yet done.

# File 'lib/google/cloud/bigquery/query_job.rb', line 574

def data token: nil, max: nil, start: nil
  return nil unless done?
  if dryrun?
    return Data.from_gapi_json({ rows: [] }, nil, @gapi, service)
  end
  if ddl? || dml?
    data_hash = { totalRows: nil, rows: [] }
    return Data.from_gapi_json data_hash, nil, @gapi, service
  end
  ensure_schema!

  options = { token: token, max: max, start: start }
  data_hash = service.list_tabledata \
    destination_table_dataset_id,
    destination_table_table_id,
    options
  Data.from_gapi_json data_hash, destination_table_gapi, @gapi, service
end
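
Walking every page of results can be sketched without a live job. In the example below, `FakePage` and `collect_column` are hypothetical stand-ins written for illustration; the sketch assumes only that a page responds to `each`, `next?` and `next`, as the Data object does in the example above.

```ruby
# Stand-in for one page of results. It models only the three methods the
# loop below relies on: each, next? and next.
FakePage = Struct.new :rows, :next_page do
  def each(&block)
    rows.each(&block)
  end

  def next?
    !next_page.nil?
  end

  def next
    next_page
  end
end

# Hypothetical helper: visits each page in turn and collects one column.
# A nil first page (for example, a job that is not done) yields an empty array.
def collect_column first_page, column
  values = []
  page = first_page
  while page
    page.each { |row| values << row[column] }
    page = page.next? ? page.next : nil
  end
  values
end

last_page  = FakePage.new [{ word: "brave" }], nil
first_page = FakePage.new [{ word: "hamlet" }, { word: "yorick" }], last_page

collect_column first_page, :word
```

With a real job, the result of `job.data` would take the place of `first_page`.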

#ddl? ⇒ `Boolean`

Whether the query is a DDL statement.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
query_job = bigquery.query_job "CREATE TABLE my_table (x INT64)"

query_job.statement_type #=> "CREATE_TABLE"
query_job.ddl? #=> true

Returns:

(Boolean)

See Also:

Using Data Definition Language Statements

# File 'lib/google/cloud/bigquery/query_job.rb', line 263

def ddl?
  %w[CREATE_MODEL CREATE_TABLE CREATE_TABLE_AS_SELECT CREATE_VIEW \
     DROP_MODEL DROP_TABLE DROP_VIEW].include? statement_type
end

#ddl_operation_performed ⇒ `String`?

The DDL operation performed, possibly dependent on the pre-existence of the DDL target. (See #ddl_target_table.) Possible values (new values might be added in the future):

"CREATE": The query created the DDL target.
"SKIP": No-op. Example cases: the query is CREATE TABLE IF NOT EXISTS while the table already exists, or the query is DROP TABLE IF EXISTS while the table does not exist.
"REPLACE": The query replaced the DDL target. Example case: the query is CREATE OR REPLACE TABLE, and the table already exists.
"DROP": The query deleted the DDL target.

Returns:

(String, nil) —
The DDL operation performed.

# File 'lib/google/cloud/bigquery/query_job.rb', line 306

def ddl_operation_performed
  return nil unless @gapi.statistics.query
  @gapi.statistics.query.ddl_operation_performed
end
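
Since new values might be added in the future, code that branches on this value should keep a fallback. A minimal sketch, in which `describe_ddl_operation` is a hypothetical helper and the message strings are illustrative only:

```ruby
# Hypothetical helper: turns a ddl_operation_performed value into a
# human-readable phrase, with a fallback for values not listed above.
def describe_ddl_operation operation
  case operation
  when "CREATE"  then "created the DDL target"
  when "SKIP"    then "no-op"
  when "REPLACE" then "replaced the DDL target"
  when "DROP"    then "deleted the DDL target"
  when nil       then "no DDL operation reported"
  else "unrecognized operation: #{operation}"
  end
end

describe_ddl_operation "SKIP"
```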

#ddl_target_table ⇒ `Google::Cloud::Bigquery::Table`?

The DDL target table, in reference state. (See Table#reference?.) Present only for CREATE/DROP TABLE/VIEW queries. (See #statement_type.)

Returns:

(Google::Cloud::Bigquery::Table, nil) —
The DDL target table, in reference state.

# File 'lib/google/cloud/bigquery/query_job.rb', line 319

def ddl_target_table
  return nil unless @gapi.statistics.query
  ensure_service!
  table = @gapi.statistics.query.ddl_target_table
  return nil unless table
  Google::Cloud::Bigquery::Table.new_reference_from_gapi table, service
end

#destination ⇒ `Table`

The table in which the query results are stored.

Returns:

(Table, nil) —
A table instance, or nil if the job configuration has no destination table.

# File 'lib/google/cloud/bigquery/query_job.rb', line 344

def destination
  table = @gapi.configuration.query.destination_table
  return nil unless table
  retrieve_table table.project_id,
                 table.dataset_id,
                 table.table_id
end

#dml? ⇒ `Boolean`

Whether the query is a DML statement.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
query_job = bigquery.query_job "UPDATE my_table " \
                               "SET x = x + 1 " \
                               "WHERE x IS NOT NULL"

query_job.statement_type #=> "UPDATE"
query_job.dml? #=> true

Returns:

(Boolean)

See Also:

Data Manipulation Language Syntax



# File 'lib/google/cloud/bigquery/query_job.rb', line 287

def dml?
  %w[INSERT UPDATE MERGE DELETE].include? statement_type
end

#dryrun? ⇒ `Boolean` Also known as: dryrun, dry_run, dry_run?

If set, don't actually run this job. A valid query will return a mostly empty response with some processing statistics, while an invalid query will return the same error it would if it wasn't a dry run.

Returns:

(Boolean) —
true when the dry run flag is set for the query job, false otherwise.



# File 'lib/google/cloud/bigquery/query_job.rb', line 111

def dryrun?
  @gapi.configuration.dry_run
end

#encryption ⇒ `Google::Cloud::Bigquery::EncryptionConfiguration`

The encryption configuration of the destination table.

Returns:

(Google::Cloud::Bigquery::EncryptionConfiguration) —
Custom encryption configuration (e.g., Cloud KMS keys).

# File 'lib/google/cloud/bigquery/query_job.rb', line 398

def encryption
  EncryptionConfiguration.from_gapi(
    @gapi.configuration.query.destination_encryption_configuration
  )
end

#flatten? ⇒ `Boolean`

Checks if the query job flattens nested and repeated fields in the query results. The default is true. If the value is false, large_results? should return `true`.

Returns:

(Boolean) —
true when the job flattens results, false otherwise.

# File 'lib/google/cloud/bigquery/query_job.rb', line 126

def flatten?
  val = @gapi.configuration.query.flatten_results
  return true if val.nil?
  val
end

#interactive? ⇒ `Boolean`

Checks if the priority for the query is INTERACTIVE.

Returns:

(Boolean) —
true when the priority is INTERACTIVE, false otherwise.

# File 'lib/google/cloud/bigquery/query_job.rb', line 69

def interactive?
  val = @gapi.configuration.query.priority
  return true if val.nil?
  val == "INTERACTIVE"
end

#large_results? ⇒ `Boolean`

Checks if the query job allows arbitrarily large results at a slight cost to performance.

Returns:

(Boolean) —
true when large results are allowed, false otherwise.

# File 'lib/google/cloud/bigquery/query_job.rb', line 82

def large_results?
  val = @gapi.configuration.query.allow_large_results
  return false if val.nil?
  val
end

#legacy_sql? ⇒ `Boolean`

Checks if the query job is using legacy SQL.

Returns:

(Boolean) —
true when legacy SQL is used, false otherwise.

# File 'lib/google/cloud/bigquery/query_job.rb', line 357

def legacy_sql?
  val = @gapi.configuration.query.use_legacy_sql
  return true if val.nil?
  val
end

#maximum_billing_tier ⇒ `Integer`?

Limits the billing tier for this job. Queries that have resource usage beyond this tier will raise (without incurring a charge). If unspecified, this will be set to your project default. For more information, see High-Compute queries.

Returns:

(Integer, nil) —
The tier number, or nil for the project default.



# File 'lib/google/cloud/bigquery/query_job.rb', line 142

def maximum_billing_tier
  @gapi.configuration.query.maximum_billing_tier
end

#maximum_bytes_billed ⇒ `Integer`?

Limits the bytes billed for this job. Queries that will have bytes billed beyond this limit will raise (without incurring a charge). If nil, this will be set to your project default.

Returns:

(Integer, nil) —
The number of bytes, or nil for the project default.

# File 'lib/google/cloud/bigquery/query_job.rb', line 154

def maximum_bytes_billed
  Integer @gapi.configuration.query.maximum_bytes_billed
rescue StandardError
  nil
end

#num_dml_affected_rows ⇒ `Integer`?

The number of rows affected by a DML statement. Present only for DML statements INSERT, UPDATE or DELETE. (See #statement_type.)

Returns:

(Integer, nil) —
The number of rows affected by a DML statement, or nil if the query is not a DML statement.

# File 'lib/google/cloud/bigquery/query_job.rb', line 334

def num_dml_affected_rows
  return nil unless @gapi.statistics.query
  @gapi.statistics.query.num_dml_affected_rows
end

#query_plan ⇒ `Array<Google::Cloud::Bigquery::QueryJob::Stage>`?

Describes the execution plan for the query.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT word FROM `bigquery-public-data.samples.shakespeare`"
job = bigquery.query_job sql

job.wait_until_done!

stages = job.query_plan
stages.each do |stage|
  puts stage.name
  stage.steps.each do |step|
    puts step.kind
    puts step.substeps.inspect
  end
end

Returns:

(Array<Google::Cloud::Bigquery::QueryJob::Stage>, nil) —
An array containing the stages of the execution plan.

# File 'lib/google/cloud/bigquery/query_job.rb', line 207

def query_plan
  return nil unless @gapi.statistics.query &&
                    @gapi.statistics.query.query_plan
  Array(@gapi.statistics.query.query_plan).map do |stage|
    Stage.from_gapi stage
  end
end

#standard_sql? ⇒ `Boolean`

Checks if the query job is using standard SQL.

Returns:

(Boolean) —
true when standard SQL is used, false otherwise.



# File 'lib/google/cloud/bigquery/query_job.rb', line 368

def standard_sql?
  !legacy_sql?
end
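
The two dialect predicates are complements, and an unset flag counts as legacy SQL in the listings above. A minimal sketch of that relationship, using a hypothetical `DialectConfig` struct in place of the job's query configuration:

```ruby
# Stand-in for the query configuration; only use_legacy_sql is modeled.
DialectConfig = Struct.new :use_legacy_sql

# Mirrors legacy_sql?: an unset (nil) flag is treated as legacy SQL.
def legacy_sql_for? config
  value = config.use_legacy_sql
  value.nil? ? true : value
end

# Mirrors standard_sql?: always the negation of the legacy check.
def standard_sql_for? config
  !legacy_sql_for?(config)
end

standard_sql_for? DialectConfig.new(false) # flag explicitly disabled
standard_sql_for? DialectConfig.new(nil)   # flag unset, treated as legacy
```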

#statement_type ⇒ `String`?

The type of query statement, if valid. Possible values (new values might be added in the future):

"CREATE_MODEL": DDL statement, see Using Data Definition Language Statements
"CREATE_TABLE": DDL statement, see Using Data Definition Language Statements
"CREATE_TABLE_AS_SELECT": DDL statement, see Using Data Definition Language Statements
"CREATE_VIEW": DDL statement, see Using Data Definition Language Statements
"DELETE": DML statement, see Data Manipulation Language Syntax
"DROP_MODEL": DDL statement, see Using Data Definition Language Statements
"DROP_TABLE": DDL statement, see Using Data Definition Language Statements
"DROP_VIEW": DDL statement, see Using Data Definition Language Statements
"INSERT": DML statement, see Data Manipulation Language Syntax
"MERGE": DML statement, see Data Manipulation Language Syntax
"SELECT": SQL query, see Standard SQL Query Syntax
"UPDATE": DML statement, see Data Manipulation Language Syntax

Returns:

(String, nil) —
The type of query statement.

# File 'lib/google/cloud/bigquery/query_job.rb', line 241

def statement_type
  return nil unless @gapi.statistics.query
  @gapi.statistics.query.statement_type
end
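
The statement types listed above fall into three groups, which is how #ddl? and #dml? use them. A minimal sketch of that grouping; `statement_category` and the two constants are hypothetical names introduced for illustration, with the lists copied from the #ddl? and #dml? listings on this page.

```ruby
# Statement types treated as DDL and DML, copied from the listings on this page.
DDL_STATEMENT_TYPES = %w[
  CREATE_MODEL CREATE_TABLE CREATE_TABLE_AS_SELECT CREATE_VIEW
  DROP_MODEL DROP_TABLE DROP_VIEW
].freeze
DML_STATEMENT_TYPES = %w[INSERT UPDATE MERGE DELETE].freeze

# Hypothetical helper: returns :ddl, :dml, :query, or :unknown.
# :unknown covers nil and any value added after this list was written.
def statement_category statement_type
  return :ddl if DDL_STATEMENT_TYPES.include? statement_type
  return :dml if DML_STATEMENT_TYPES.include? statement_type
  return :query if statement_type == "SELECT"
  :unknown
end

statement_category "CREATE_TABLE_AS_SELECT"
statement_category "MERGE"
```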

#time_partitioning? ⇒ `Boolean`?

Checks if the destination table will be time-partitioned. See Partitioned Tables.

Returns:

(Boolean, nil) —
true when the table will be time-partitioned, or false otherwise.



# File 'lib/google/cloud/bigquery/query_job.rb', line 413

def time_partitioning?
  !@gapi.configuration.query.time_partitioning.nil?
end

#time_partitioning_expiration ⇒ `Integer`?

The expiration for the destination table partitions, if any, in seconds. See Partitioned Tables.

Returns:

(Integer, nil) —
The expiration time, in seconds, for data in partitions, or nil if not present.

# File 'lib/google/cloud/bigquery/query_job.rb', line 457

def time_partitioning_expiration
  tp = @gapi.configuration.query.time_partitioning
  tp.expiration_ms / 1_000 if tp && !tp.expiration_ms.nil?
end
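
The listing above converts the stored millisecond value to seconds by dividing by 1,000. A minimal sketch of that arithmetic, assuming the stored value is an Integer (so the division discards any fractional second); `expiration_seconds` is a hypothetical helper.

```ruby
# Hypothetical helper: converts a millisecond expiration to whole seconds,
# passing nil through unchanged.
def expiration_seconds expiration_ms
  return nil if expiration_ms.nil?
  expiration_ms / 1_000
end

expiration_seconds 86_400_000 # one day expressed in milliseconds
expiration_seconds 1_500      # integer division drops the half second
expiration_seconds nil        # no expiration configured
```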

#time_partitioning_field ⇒ `String`?

The field on which the destination table will be partitioned, if any. If not set, the destination table will be partitioned by pseudo column _PARTITIONTIME; if set, the table will be partitioned by this field. See Partitioned Tables.

Returns:

(String, nil) —
The partition field, if a field was configured. nil if not partitioned or not set (partitioned by pseudo column '_PARTITIONTIME').

# File 'lib/google/cloud/bigquery/query_job.rb', line 442

def time_partitioning_field
  return nil unless time_partitioning?
  @gapi.configuration.query.time_partitioning.field
end

#time_partitioning_require_filter? ⇒ `Boolean`

If set to true, queries over the destination table must specify a partition filter that can be used for partition elimination. See Partitioned Tables.

Returns:

(Boolean) —
true when a partition filter will be required, or false otherwise.

# File 'lib/google/cloud/bigquery/query_job.rb', line 473

def time_partitioning_require_filter?
  tp = @gapi.configuration.query.time_partitioning
  return false if tp.nil? || tp.require_partition_filter.nil?
  tp.require_partition_filter
end

#time_partitioning_type ⇒ `String`?

The period for which the destination table will be partitioned, if any. See Partitioned Tables.

Returns:

(String, nil) —
The partition type, or nil if not present. Currently the only supported value is "DAY".



# File 'lib/google/cloud/bigquery/query_job.rb', line 426

def time_partitioning_type
  @gapi.configuration.query.time_partitioning.type if time_partitioning?
end

#udfs ⇒ `Array<String>`

The user-defined function resources used in the query. May be either a code resource to load from a Google Cloud Storage URI (gs://bucket/path), or an inline resource that contains code for a user-defined function (UDF). Providing an inline code resource is equivalent to providing a URI for a file containing the same code. See User-Defined Functions.

Returns:

(Array<String>, nil) —
An array containing Google Cloud Storage URIs and/or inline source code, or nil if the query configuration has no user-defined function resources.

# File 'lib/google/cloud/bigquery/query_job.rb', line 383

def udfs
  udfs_gapi = @gapi.configuration.query.user_defined_function_resources
  return nil unless udfs_gapi
  Array(udfs_gapi).map do |udf|
    udf.inline_code || udf.resource_uri
  end
end
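
The mapping in the listing above prefers inline code and falls back to the URI. A minimal sketch of that behaviour; `UdfResource` and `udf_sources` are hypothetical stand-ins, and the bucket path is made up for illustration.

```ruby
# Stand-in for a UDF resource; a real resource carries one of the two fields.
UdfResource = Struct.new :inline_code, :resource_uri

# Hypothetical helper mirroring the mapping above: inline code wins,
# otherwise the Cloud Storage URI is used. No resources at all gives nil.
def udf_sources resources
  return nil unless resources
  resources.map { |udf| udf.inline_code || udf.resource_uri }
end

resources = [
  UdfResource.new("return x * 2;", nil),
  UdfResource.new(nil, "gs://my-bucket/my-lib.js") # hypothetical URI
]

udf_sources resources
```

Callers that iterate over the result should allow for nil as well as for an array.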

#wait_until_done! ⇒ `Object`

Refreshes the job until the job is DONE. The delay between refreshes will incrementally increase.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT word FROM `bigquery-public-data.samples.shakespeare`"
job = bigquery.query_job sql

job.wait_until_done!
job.done? #=> true

# File 'lib/google/cloud/bigquery/query_job.rb', line 533

def wait_until_done!
  return if done?

  ensure_service!
  loop do
    query_results_gapi = service.job_query_results \
      job_id, location: location, max: 0
    if query_results_gapi.job_complete
      @destination_schema_gapi = query_results_gapi.schema
      break
    end
  end
  reload!
end
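
The description mentions a delay that grows between refreshes. The sketch below shows one generic way to poll with an increasing delay; it is an illustration under stated assumptions, not the library's implementation. `poll_until_done`, its parameters and its default values are all hypothetical.

```ruby
# Hypothetical helper: calls the block until it returns a truthy value,
# sleeping a little longer after each unsuccessful check (capped at
# max_delay seconds). Returns the number of checks performed.
# The sleeper parameter exists so the waiting can be replaced in tests.
def poll_until_done(initial_delay: 1.0, multiplier: 1.5, max_delay: 30.0,
                    sleeper: ->(seconds) { sleep seconds })
  delay = initial_delay
  attempts = 0
  loop do
    attempts += 1
    return attempts if yield
    sleeper.call delay
    delay = [delay * multiplier, max_delay].min
  end
end
```

With a real job, the block could refresh and test the job state using the inherited #reload! and #done? methods, for example `poll_until_done { job.reload!; job.done? }`.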
