Class: Gcloud::Bigquery::Dataset

Inherits:
Object
Defined in:
lib/gcloud/bigquery/dataset.rb,
lib/gcloud/bigquery/dataset/list.rb

Overview

Dataset

Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest level unit of access control; you cannot control access at the table level. A dataset is contained within a specific project.

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery

dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"

Defined Under Namespace

Classes: List

Constructor Details

#initialize ⇒ Dataset

Create an empty Dataset object.



# File 'lib/gcloud/bigquery/dataset.rb', line 51

def initialize #:nodoc:
  @connection = nil
  @gapi = {}
end

Instance Attribute Details

#connection ⇒ Object

The Connection object.



# File 'lib/gcloud/bigquery/dataset.rb', line 43

def connection
  @connection
end

#gapi ⇒ Object

The Google API Client object.



# File 'lib/gcloud/bigquery/dataset.rb', line 47

def gapi
  @gapi
end

Class Method Details

.from_gapi(gapi, conn) ⇒ Object

New Dataset from a Google API Client object.



# File 'lib/gcloud/bigquery/dataset.rb', line 615

def self.from_gapi gapi, conn #:nodoc:
  new.tap do |f|
    f.gapi = gapi
    f.connection = conn
  end
end

Instance Method Details

#create_table(table_id, options = {}) ⇒ Object

Creates a new table.

Parameters

table_id

The ID of the table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters. (String)

options

An optional Hash for controlling additional behavior. (Hash)

options[:name]

A descriptive name for the table. (String)

options[:description]

A user-friendly description of the table. (String)

options[:schema]

A schema specifying fields and data types for the table. See the Tables resource for more information. (Hash)

Returns

Gcloud::Bigquery::Table

Examples

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table"

A name and schema can be provided:

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

schema = {
  "fields" => [
    {
      "name" => "first_name",
      "type" => "STRING",
      "mode" => "REQUIRED"
    },
    {
      "name" => "cities_lived",
      "type" => "RECORD",
      "mode" => "REPEATED",
      "fields" => [
        {
          "name" => "place",
          "type" => "STRING",
          "mode" => "REQUIRED"
        },
        {
          "name" => "number_of_years",
          "type" => "INTEGER",
          "mode" => "REQUIRED"
        }
      ]
    }
  ]
}
table = dataset.create_table "my_table",
                             name: "My Table",
                             schema: schema

:category: Table



# File 'lib/gcloud/bigquery/dataset.rb', line 297

def create_table table_id, options = {}
  ensure_connection!
  resp = connection.insert_table dataset_id, table_id, options
  if resp.success?
    Table.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end
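
Nested schema hashes like the one in the example above are easy to mistype by hand. The following is a minimal sketch, assuming a hypothetical helper named schema_field (it is not part of the gcloud gem), that builds the same hash shape. The "NULLABLE" default is an assumption chosen to match BigQuery's default field mode.

```ruby
# Hypothetical helper (not part of the gcloud gem): builds one schema field
# hash in the shape used by the :schema option.
def schema_field name, type, mode = "NULLABLE", fields = nil
  field = { "name" => name, "type" => type, "mode" => mode }
  # Only RECORD fields carry nested sub-fields.
  field["fields"] = fields if fields
  field
end

schema = {
  "fields" => [
    schema_field("first_name", "STRING", "REQUIRED"),
    schema_field("cities_lived", "RECORD", "REPEATED", [
      schema_field("place", "STRING", "REQUIRED"),
      schema_field("number_of_years", "INTEGER", "REQUIRED")
    ])
  ]
}
```

The resulting schema hash could then be passed as the schema: option to create_table, as in the example above.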

#create_view(table_id, query, options = {}) ⇒ Object

Creates a new view table from the given query.

Parameters

table_id

The ID of the view table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters. (String)

query

The query that BigQuery executes when the view is referenced. (String)

options

An optional Hash for controlling additional behavior. (Hash)

options[:name]

A descriptive name for the table. (String)

options[:description]

A user-friendly description of the table. (String)

Returns

Gcloud::Bigquery::View

Examples

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
          "SELECT name, age FROM [proj:dataset.users]"

A name and description can be provided:

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
          "SELECT name, age FROM [proj:dataset.users]",
          name: "My View", description: "This is my view"

:category: Table



# File 'lib/gcloud/bigquery/dataset.rb', line 353

def create_view table_id, query, options = {}
  options[:query] = query
  create_table table_id, options
end

#created_at ⇒ Object

The time when this dataset was created.

:category: Attributes



# File 'lib/gcloud/bigquery/dataset.rb', line 158

def created_at
  ensure_full_data!
  Time.at(@gapi["creationTime"] / 1000.0)
end
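
The method divides by 1000.0 because the API reports creationTime in milliseconds since the Unix epoch, while Time.at expects seconds. A small standalone illustration, using a made-up timestamp in place of the real API value:

```ruby
# Made-up value standing in for @gapi["creationTime"] (milliseconds).
creation_time_ms = 1_420_070_400_123

# Dividing by a Float keeps the sub-second part instead of truncating it.
created_at = Time.at(creation_time_ms / 1000.0).utc

puts created_at.year   # calendar year in UTC
puts created_at.to_i   # whole seconds since the epoch
```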

#dataset_id ⇒ Object

A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

:category: Attributes



# File 'lib/gcloud/bigquery/dataset.rb', line 63

def dataset_id
  @gapi["datasetReference"]["datasetId"]
end
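
The ID rules described above can be checked locally before calling the API. This is only a sketch: valid_dataset_id? is a hypothetical helper, not something the gcloud gem provides, and the requirement of at least one character is an assumption.

```ruby
# Hypothetical pre-flight check (not part of the gcloud gem) mirroring the
# documented rules: letters, digits, underscores; at most 1,024 characters.
# Assumption: an empty ID is treated as invalid.
def valid_dataset_id? id
  id.is_a?(String) && id.match?(/\A[A-Za-z0-9_]{1,1024}\z/)
end

puts valid_dataset_id?("my_dataset")
puts valid_dataset_id?("my-dataset")
```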

#default_expiration ⇒ Object

The default lifetime of all tables in the dataset, in milliseconds.

:category: Attributes



# File 'lib/gcloud/bigquery/dataset.rb', line 138

def default_expiration
  ensure_full_data!
  @gapi["defaultTableExpirationMs"]
end

#default_expiration=(new_default_expiration) ⇒ Object

Updates the default lifetime of all tables in the dataset, in milliseconds.

:category: Attributes



# File 'lib/gcloud/bigquery/dataset.rb', line 149

def default_expiration= new_default_expiration
  patch_gapi! default_expiration: new_default_expiration
end
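
Because the lifetime is expressed in milliseconds, it can help to compute the value from a more readable unit. A minimal sketch, assuming a hypothetical days_to_ms helper that is not part of the gcloud gem:

```ruby
# Hypothetical helper (not part of the gcloud gem): converts whole days to
# the millisecond count that default_expiration= expects.
def days_to_ms days
  days * 24 * 60 * 60 * 1000
end

one_week_ms = days_to_ms 7
# With a real dataset this value would be assigned like so:
#   dataset.default_expiration = one_week_ms
puts one_week_ms
```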

#delete(options = {}) ⇒ Object

Permanently deletes the dataset. The dataset must be empty before it can be deleted unless the force option is set to true.

Parameters

options

An optional Hash for controlling additional behavior. (Hash)

options[:force]

If true, delete all the tables in the dataset. If false and the dataset contains tables, the request will fail. Default is false. (Boolean)

Returns

true if the dataset was deleted.

Example

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery

dataset = bigquery.dataset "my_dataset"
dataset.delete

:category: Lifecycle



# File 'lib/gcloud/bigquery/dataset.rb', line 213

def delete options = {}
  ensure_connection!
  resp = connection.delete_dataset dataset_id, options
  if resp.success?
    true
  else
    fail ApiError.from_response(resp)
  end
end

#description ⇒ Object

A user-friendly description of the dataset.

:category: Attributes



# File 'lib/gcloud/bigquery/dataset.rb', line 119

def description
  ensure_full_data!
  @gapi["description"]
end

#description=(new_description) ⇒ Object

Updates the user-friendly description of the dataset.

:category: Attributes



# File 'lib/gcloud/bigquery/dataset.rb', line 129

def description= new_description
  patch_gapi! description: new_description
end

#etag ⇒ Object

A string hash of the dataset.

:category: Attributes



# File 'lib/gcloud/bigquery/dataset.rb', line 99

def etag
  ensure_full_data!
  @gapi["etag"]
end

#location ⇒ Object

The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.

:category: Attributes



# File 'lib/gcloud/bigquery/dataset.rb', line 179

def location
  ensure_full_data!
  @gapi["location"]
end

#modified_at ⇒ Object

The time when this dataset or any of its tables was last modified.

:category: Attributes



# File 'lib/gcloud/bigquery/dataset.rb', line 168

def modified_at
  ensure_full_data!
  Time.at(@gapi["lastModifiedTime"] / 1000.0)
end

#name ⇒ Object

A descriptive name for the dataset.

:category: Attributes



# File 'lib/gcloud/bigquery/dataset.rb', line 81

def name
  @gapi["friendlyName"]
end

#name=(new_name) ⇒ Object

Updates the descriptive name for the dataset.

:category: Attributes



# File 'lib/gcloud/bigquery/dataset.rb', line 90

def name= new_name
  patch_gapi! name: new_name
end

#project_id ⇒ Object

The ID of the project containing this dataset.

:category: Attributes



# File 'lib/gcloud/bigquery/dataset.rb', line 72

def project_id
  @gapi["datasetReference"]["projectId"]
end

#query(query, options = {}) ⇒ Object

Queries data using the synchronous method.

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

Parameters

query

A query string, following the BigQuery query syntax, of the query to execute. Example: “SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]”. (String)

options[:max]

The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies. (Integer)

options[:timeout]

How long to wait for the query to complete, in milliseconds, before the request times out and returns. Note that this is only a timeout for the request, not the query. If the query takes longer to run than the timeout value, the call returns without any results and with QueryData#complete? set to false. The default value is 10000 milliseconds (10 seconds). (Integer)

options[:dryrun]

If set to true, BigQuery doesn’t run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error returns. The default value is false. (Boolean)

options[:cache]

Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching. (Boolean)

Returns

Gcloud::Bigquery::QueryData

Example

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table"
data.each do |row|
  puts row["name"]
end

:category: Data



# File 'lib/gcloud/bigquery/dataset.rb', line 601

def query query, options = {}
  options[:dataset] ||= dataset_id
  options[:project] ||= project_id
  ensure_connection!
  resp = connection.query query, options
  if resp.success?
    QueryData.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end
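
The two ||= assignments at the top of the method mean that a caller-supplied :dataset or :project takes precedence, and the receiver's own IDs are used only to fill gaps. The standalone sketch below reproduces just that step with plain hashes. The helper name and the sample IDs are made up for illustration.

```ruby
# Stand-in for the first two lines of Dataset#query; the helper name and
# the sample IDs are hypothetical.
def apply_query_defaults options, dataset_id, project_id
  options[:dataset] ||= dataset_id
  options[:project] ||= project_id
  options
end

defaults = apply_query_defaults({ max: 1000 }, "my_dataset", "my-project")
explicit = apply_query_defaults({ dataset: "other_dataset" }, "my_dataset", "my-project")

p defaults
p explicit
```

Note that the hash is modified in place, so an options hash reused across calls will keep the :dataset and :project values filled in by the first call.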

#query_job(query, options = {}) ⇒ Object

Queries data using the asynchronous method.

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

Parameters

query

A query string, following the BigQuery query syntax, of the query to execute. Example: “SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]”. (String)

options[:priority]

Specifies a priority for the query. Possible values include INTERACTIVE and BATCH. The default value is INTERACTIVE. (String)

options[:cache]

Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. (Boolean)

options[:table]

The destination table where the query results should be stored. If not present, a new table will be created to store the results. (Table)

options[:create]

Specifies whether the job is allowed to create new tables. (String)

The following values are supported:

  • needed - Create the table if it does not exist.

  • never - The table must already exist. A ‘notFound’ error is raised if the table does not exist.

options[:write]

Specifies the action that occurs if the destination table already exists. (String)

The following values are supported:

  • truncate - BigQuery overwrites the table data.

  • append - BigQuery appends the data to the table.

  • empty - A ‘duplicate’ error is returned in the job result if the table exists and contains data.

options[:large_results]

If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires options[:table] to be set. (Boolean)

options[:flatten]

Flattens all nested and repeated fields in the query results. The default value is true. options[:large_results] must be true if this is set to false. (Boolean)

Returns

Gcloud::Bigquery::QueryJob

Example

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table"

loop do
  break if job.done?
  sleep 1
  job.refresh!
end
if !job.failed?
  job.query_results.each do |row|
    puts row["name"]
  end
end

:category: Data



# File 'lib/gcloud/bigquery/dataset.rb', line 532

def query_job query, options = {}
  options[:dataset] ||= self
  ensure_connection!
  resp = connection.query_job query, options
  if resp.success?
    Job.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end
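
The polling loop in the example above runs until the job reports done. One possible variation is to cap the number of attempts so a stuck job cannot block the caller indefinitely. The sketch below assumes only that the job responds to done? and refresh!, as in the example. FakeJob and wait_for are made-up names used so the loop can run without network access.

```ruby
# FakeJob is a made-up stand-in for a query job; it reports done after a
# fixed number of refreshes.
class FakeJob
  def initialize refreshes_needed
    @remaining = refreshes_needed
  end

  def done?
    @remaining <= 0
  end

  def refresh!
    @remaining -= 1
    self
  end
end

# Hypothetical helper: polls until the job is done or max_attempts is reached.
# Returns true if the job finished, false if polling gave up.
def wait_for job, max_attempts: 10, interval: 0.01
  max_attempts.times do
    return true if job.done?
    sleep interval
    job.refresh!
  end
  job.done?
end

puts wait_for(FakeJob.new(3))
puts wait_for(FakeJob.new(50), max_attempts: 5)
```

With a real job, a longer interval (such as the one-second sleep in the example above) would be more appropriate than the short value used here.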

#table(table_id) ⇒ Object

Retrieves an existing table by ID.

Parameters

table_id

The ID of a table. (String)

Returns

Gcloud::Bigquery::Table or Gcloud::Bigquery::View or nil if the table does not exist

Example

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
puts table.name

:category: Table



# File 'lib/gcloud/bigquery/dataset.rb', line 383

def table table_id
  ensure_connection!
  resp = connection.get_table dataset_id, table_id
  if resp.success?
    Table.from_gapi resp.data, connection
  else
    nil
  end
end
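
Because the method returns nil instead of raising when the lookup does not succeed, callers can fall back to creating the table. The following is a sketch of that pattern only. InMemoryDataset is a made-up stand-in that mimics the two calls involved, and find_or_create_table is a hypothetical helper, not part of the gcloud gem.

```ruby
# InMemoryDataset is a made-up stand-in that mimics only the two calls used
# here: #table returns nil for unknown IDs, #create_table adds a new entry.
class InMemoryDataset
  def initialize
    @tables = {}
  end

  def table table_id
    @tables[table_id]
  end

  def create_table table_id, options = {}
    @tables[table_id] = { id: table_id, name: options[:name] }
  end
end

# Hypothetical helper: returns the existing table, creating it only if absent.
def find_or_create_table dataset, table_id, options = {}
  dataset.table(table_id) || dataset.create_table(table_id, options)
end

dataset = InMemoryDataset.new
first  = find_or_create_table dataset, "my_table", name: "My Table"
second = find_or_create_table dataset, "my_table"
puts first.equal?(second)
```

The lookup and the creation are separate calls, so this pattern is not atomic. Two callers racing on the same ID could both attempt the create.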

#tables(options = {}) ⇒ Object

Retrieves the list of tables belonging to the dataset.

Parameters

options

An optional Hash for controlling additional behavior. (Hash)

options[:token]

A previously-returned page token representing part of the larger set of results to view. (String)

options[:max]

Maximum number of tables to return. (Integer)

Returns

Array of Gcloud::Bigquery::Table or Gcloud::Bigquery::View (Gcloud::Bigquery::Table::List)

Examples

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
tables = dataset.tables
tables.each do |table|
  puts table.name
end

If you have a significant number of tables, you may need to paginate through them (see Table::List#token):

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

all_tables = []
tmp_tables = dataset.tables
while tmp_tables.any? do
  tmp_tables.each do |table|
    all_tables << table
  end
  # break loop if no more tables available
  break if tmp_tables.token.nil?
  # get the next group of tables
  tmp_tables = dataset.tables token: tmp_tables.token
end

:category: Table



# File 'lib/gcloud/bigquery/dataset.rb', line 446

def tables options = {}
  ensure_connection!
  resp = connection.list_tables dataset_id, options
  if resp.success?
    Table::List.from_resp resp, connection
  else
    fail ApiError.from_response(resp)
  end
end
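
The pagination example above can be wrapped in a reusable method. The sketch below assumes only what that example shows: tables accepts a token: option and the returned list responds to token. FakePage, FakeDataset, and all_tables are made-up names so the loop can be run without network access.

```ruby
# FakePage mimics a list of results with a #token for the next page.
class FakePage < Array
  attr_accessor :token
end

# FakeDataset serves three fixed pages keyed by the incoming page token.
class FakeDataset
  PAGES = {
    nil  => [%w[t1 t2], "p2"],
    "p2" => [%w[t3 t4], "p3"],
    "p3" => [%w[t5], nil]
  }

  def tables options = {}
    names, next_token = PAGES[options[:token]]
    FakePage.new(names).tap { |page| page.token = next_token }
  end
end

# Hypothetical helper: follows page tokens until none remain.
def all_tables dataset
  results = []
  page = dataset.tables
  loop do
    results.concat page
    break if page.token.nil?
    page = dataset.tables token: page.token
  end
  results
end

p all_tables(FakeDataset.new)
```

Collecting every page into one array is convenient for small datasets. For very large ones, processing each page as it arrives keeps memory use lower.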

#url ⇒ Object

A URL that can be used to access the dataset using the REST API.

:category: Attributes



# File 'lib/gcloud/bigquery/dataset.rb', line 109

def url
  ensure_full_data!
  @gapi["selfLink"]
end