Class: Google::Cloud::Bigquery::ExtractJob

Inherits: Job

Object
Job
Google::Cloud::Bigquery::ExtractJob

Defined in: lib/google/cloud/bigquery/extract_job.rb

Overview

ExtractJob

A Job subclass representing an export operation that may be performed on a Table or Model. An ExtractJob instance is returned when you call Project#extract_job, Table#extract_job or Model#extract_job.

Examples:

Export table data

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

extract_job = table.extract_job "gs://my-bucket/file-name.json",
                                format: "json"
extract_job.wait_until_done!
extract_job.done? #=> true

Export a model

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

extract_job = model.extract_job "gs://my-bucket/#{model.model_id}"

extract_job.wait_until_done!
extract_job.done? #=> true
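
Summarize a finished job

Once a job is done, the predicates and readers documented on this page can be combined to report what was written and where. The following is a minimal sketch, not library code: summarize_extract and StubExtractJob are hypothetical names introduced here, and the stub only mimics the reader names of ExtractJob so the snippet runs without credentials.

```ruby
# Hypothetical helper, not part of google-cloud-bigquery. It relies only on
# the predicates and readers documented on this page.
def summarize_extract job
  fmt =
    if job.csv? then "CSV"
    elsif job.json? then "NEWLINE_DELIMITED_JSON"
    elsif job.avro? then "AVRO"
    else "other"
    end
  gzip = job.compression? ? ", gzip-compressed" : ""
  uris = job.destinations.join ", "
  "#{fmt}#{gzip} -> #{uris}"
end

# Stand-in with the same reader names as ExtractJob (an assumption made for
# illustration). A real job comes from Table#extract_job.
StubExtractJob = Struct.new(:fmt, :gzip, :destinations, keyword_init: true) do
  def csv?;         fmt == "CSV"; end
  def json?;        fmt == "NEWLINE_DELIMITED_JSON"; end
  def avro?;        fmt == "AVRO"; end
  def compression?; gzip; end
end

stub = StubExtractJob.new(fmt: "NEWLINE_DELIMITED_JSON", gzip: false,
                          destinations: ["gs://my-bucket/file-name.json"])
puts summarize_extract(stub)
# NEWLINE_DELIMITED_JSON -> gs://my-bucket/file-name.json
```

With a real job, the same helper could be called as summarize_extract extract_job after wait_until_done! returns.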

Direct Known Subclasses

Updater

Defined Under Namespace

Classes: Updater

Instance Method Summary

#avro? ⇒ Boolean
Checks if the destination format for the table data is Avro.
#compression? ⇒ Boolean
Checks if the export operation compresses the data using gzip.
#csv? ⇒ Boolean
Checks if the destination format for the table data is CSV.
#delimiter ⇒ String?
The character or symbol the operation uses to delimit fields in the exported data.
#destinations ⇒ Object
The URI or URIs representing the Google Cloud Storage files to which the data is exported.
#destinations_counts ⇒ Hash<String, Integer>
A hash containing the URI or URI pattern specified in #destinations mapped to the counts of files per destination.
#destinations_file_counts ⇒ Array<Integer>
The number of files per destination URI or URI pattern specified in #destinations.
#json? ⇒ Boolean
Checks if the destination format for the table data is newline-delimited JSON.
#ml_tf_saved_model? ⇒ Boolean
Checks if the destination format for the model is TensorFlow SavedModel.
#ml_xgboost_booster? ⇒ Boolean
Checks if the destination format for the model is XGBoost.
#model? ⇒ Boolean
Whether the source of the export job is a model.
#print_header? ⇒ Boolean
Checks if the exported data contains a header row.
#source ⇒ Table, ...
The table or model which is exported.
#table? ⇒ Boolean
Whether the source of the export job is a table.
#use_avro_logical_types? ⇒ Boolean
When #avro? is true (the destination format is "AVRO"), indicates whether applicable column types (such as TIMESTAMP) are extracted to their corresponding Avro logical types (timestamp-micros) instead of only their raw types (avro-long).

Methods inherited from Job

#cancel, #configuration, #created_at, #done?, #ended_at, #error, #errors, #failed?, #job_id, #labels, #location, #num_child_jobs, #parent_job_id, #pending?, #project_id, #reload!, #rerun!, #running?, #script_statistics, #started_at, #state, #statistics, #status, #user_email, #wait_until_done!

Instance Method Details

#avro? ⇒ `Boolean`

Checks if the destination format for the table data is Avro. The default is false. Not applicable when extracting models.

Returns:

(Boolean) —
true when AVRO, false if not AVRO or not a table extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 147

def avro?
  return false unless table?
  val = @gapi.configuration.extract.destination_format
  val == "AVRO"
end

#compression? ⇒ `Boolean`

Checks if the export operation compresses the data using gzip. The default is false. Not applicable when extracting models.

Returns:

(Boolean) —
true when GZIP, false if not GZIP or not a table extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 104

def compression?
  return false unless table?
  val = @gapi.configuration.extract.compression
  val == "GZIP"
end

#csv? ⇒ `Boolean`

Checks if the destination format for the table data is CSV. Tables with nested or repeated fields cannot be exported as CSV. The default is true for tables. Not applicable when extracting models.

Returns:

(Boolean) —
true when CSV, or false if not CSV or not a table extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 132

def csv?
  return false unless table?
  val = @gapi.configuration.extract.destination_format
  return true if val.nil?
  val == "CSV"
end

#delimiter ⇒ `String`?

The character or symbol the operation uses to delimit fields in the exported data. The default is a comma (,) for tables. Not applicable when extracting models.

Returns:

(String, nil) —
A string containing the character, such as ",", or nil if not a table extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 188

def delimiter
  return unless table?
  val = @gapi.configuration.extract.field_delimiter
  val = "," if val.nil?
  val
end

#destinations ⇒ `Object`

The URI or URIs representing the Google Cloud Storage files to which the data is exported.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 61

def destinations
  Array @gapi.configuration.extract.destination_uris
end

#destinations_counts ⇒ `Hash<String, Integer>`

A hash containing the URI or URI pattern specified in #destinations mapped to the counts of files per destination.

Returns:

(Hash<String, Integer>) —
A Hash with the URI patterns as keys and the counts as values.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 227

def destinations_counts
  Hash[destinations.zip destinations_file_counts]
end
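
The pairing done by the listing above can be tried with plain arrays. This is a minimal sketch: pair_counts is a hypothetical helper name, and the URI patterns and counts are made-up sample values rather than output from a real job.

```ruby
# Pairs each destination URI pattern with its file count, the same pairing
# that Hash[destinations.zip destinations_file_counts] performs.
def pair_counts uris, file_counts
  uris.zip(file_counts).to_h
end

# Made-up sample values for illustration only.
uris   = ["gs://my-bucket/part-a-*.json", "gs://my-bucket/part-b-*.json"]
counts = [3, 1]

pair_counts(uris, counts).each do |uri, n|
  puts "#{uri}: #{n} file(s)"
end
```

Array#zip pads with nil when the second array is shorter, so a URI pattern without a reported count maps to nil rather than being dropped.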

#destinations_file_counts ⇒ `Array<Integer>`

The number of files per destination URI or URI pattern specified in #destinations.

Returns:

(Array<Integer>) —
An array of values in the same order as the URI patterns.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 216

def destinations_file_counts
  Array @gapi.statistics.extract.destination_uri_file_counts
end

#json? ⇒ `Boolean`

Checks if the destination format for the table data is newline-delimited JSON. The default is false. Not applicable when extracting models.

Returns:

(Boolean) —
true when NEWLINE_DELIMITED_JSON, false if not NEWLINE_DELIMITED_JSON or not a table extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 118

def json?
  return false unless table?
  val = @gapi.configuration.extract.destination_format
  val == "NEWLINE_DELIMITED_JSON"
end

#ml_tf_saved_model? ⇒ `Boolean`

Checks if the destination format for the model is TensorFlow SavedModel. The default is true for models. Not applicable when extracting tables.

Returns:

(Boolean) —
true when ML_TF_SAVED_MODEL, false if not ML_TF_SAVED_MODEL or not a model extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 160

def ml_tf_saved_model?
  return false unless model?
  val = @gapi.configuration.extract.destination_format
  return true if val.nil?
  val == "ML_TF_SAVED_MODEL"
end

#ml_xgboost_booster? ⇒ `Boolean`

Checks if the destination format for the model is XGBoost. The default is false. Not applicable when extracting tables.

Returns:

(Boolean) —
true when ML_XGBOOST_BOOSTER, false if not ML_XGBOOST_BOOSTER or not a model extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 174

def ml_xgboost_booster?
  return false unless model?
  val = @gapi.configuration.extract.destination_format
  val == "ML_XGBOOST_BOOSTER"
end
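
Taken together, the format predicates above treat an unset destination format as CSV for tables and as ML_TF_SAVED_MODEL for models. The sketch below restates that rule in one place. FormatConfig and effective_format are hypothetical names introduced for illustration; the struct only models the three configuration readers that the listings use.

```ruby
# Hypothetical stand-in for the job's extract configuration. Only the three
# readers used by the format predicates are modelled.
FormatConfig = Struct.new(:source_table, :source_model, :destination_format,
                          keyword_init: true)

# Resolves the format the predicates would report, applying the same nil
# defaults: CSV for tables, ML_TF_SAVED_MODEL for models.
def effective_format config
  explicit = config.destination_format
  return explicit unless explicit.nil?
  return "CSV" unless config.source_table.nil?
  return "ML_TF_SAVED_MODEL" unless config.source_model.nil?
  nil
end

table_default = FormatConfig.new(source_table: "my_table")
model_default = FormatConfig.new(source_model: "my_model")
avro_table    = FormatConfig.new(source_table: "my_table",
                                 destination_format: "AVRO")

puts effective_format(table_default) # CSV
puts effective_format(model_default) # ML_TF_SAVED_MODEL
puts effective_format(avro_table)    # AVRO
```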

#model? ⇒ `Boolean`

Whether the source of the export job is a model. See #source.

Returns:

(Boolean) —
true when the source is a model, false otherwise.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 94

def model?
  !@gapi.configuration.extract.source_model.nil?
end

#print_header? ⇒ `Boolean`

Checks if the exported data contains a header row. The default is true for tables. Not applicable when extracting models.

Returns:

(Boolean) —
true when the print header configuration is enabled or unset (nil), false if disabled or not a table extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 202

def print_header?
  return false unless table?
  val = @gapi.configuration.extract.print_header
  val = true if val.nil?
  val
end
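
When reading a CSV export back, #delimiter and #print_header? indicate how to split each row and whether the first line is a header. The following is a minimal sketch under stated assumptions: parse_export is a hypothetical helper, the job is a table extraction (so the delimiter is not nil), and fields contain no quoted delimiters or embedded newlines. The sample text is made up.

```ruby
# Hypothetical helper: splits exported text into rows of fields.
# Assumes fields contain no quoted delimiters or embedded newlines.
def parse_export text, delimiter: ",", header: true
  # A limit of -1 keeps trailing empty fields instead of dropping them.
  rows = text.lines(chomp: true).map { |line| line.split(delimiter, -1) }
  header ? rows.drop(1) : rows
end

# Made-up sample of a pipe-delimited export with a header row.
sample = "name|score\nada|10\nlin|7\n"
parse_export(sample, delimiter: "|").each { |fields| puts fields.join(" / ") }
# ada / 10
# lin / 7
```

With a real table extraction, the arguments could come from the job itself: parse_export text, delimiter: extract_job.delimiter, header: extract_job.print_header?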

#source ⇒ `Table`, ...

The table or model which is exported.

Returns:

(Table, Model, nil) —
A table or model instance, or nil.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 70

def source
  if (table = @gapi.configuration.extract.source_table)
    retrieve_table table.project_id, table.dataset_id, table.table_id
  elsif (model = @gapi.configuration.extract.source_model)
    retrieve_model model.project_id, model.dataset_id, model.model_id
  end
end

#table? ⇒ `Boolean`

Whether the source of the export job is a table. See #source.

Returns:

(Boolean) —
true when the source is a table, false otherwise.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 84

def table?
  !@gapi.configuration.extract.source_table.nil?
end

#use_avro_logical_types? ⇒ `Boolean`

When #avro? is true (the destination format is "AVRO"), indicates whether applicable column types (such as TIMESTAMP) are extracted to their corresponding Avro logical types (timestamp-micros) instead of only their raw types (avro-long). Not applicable when extracting models.

Returns:

(Boolean) —
true when applicable column types will use their corresponding AVRO logical types, false if not enabled or not a table extraction.

# File 'lib/google/cloud/bigquery/extract_job.rb', line 242

def use_avro_logical_types?
  return false unless table?
  @gapi.configuration.extract.use_avro_logical_types
end
