Class: Gcloud::Bigquery::Dataset

Inherits: Object

Defined in: lib/gcloud/bigquery/dataset.rb,
lib/gcloud/bigquery/dataset/list.rb,
lib/gcloud/bigquery/dataset/access.rb

Overview

Dataset

Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest-level unit of access control; you cannot control access at the table level. A dataset is contained within a specific project.

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery

dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"

Defined Under Namespace

Classes: Access, List

Instance Attribute Summary

#connection ⇒ Object

The Connection object.
#gapi ⇒ Object

The Google API Client object.

Class Method Summary

.from_gapi(gapi, conn) ⇒ Object

New Dataset from a Google API Client object.

Instance Method Summary

#access {|a2| ... } ⇒ Object

Retrieves the access rules for a Dataset using the BigQuery API data structure of an array of hashes.
#access=(new_access) ⇒ Object

Sets the access rules for a Dataset using the BigQuery API data structure of an array of hashes.
#api_url ⇒ Object

A URL that can be used to access the dataset using the REST API.
#create_table(table_id, name: nil, description: nil, schema: nil) ⇒ Object

Creates a new table.
#create_view(table_id, query, name: nil, description: nil) ⇒ Object

Creates a new view table from the given query.
#created_at ⇒ Object

The time when this dataset was created.
#dataset_id ⇒ Object

A unique ID for this dataset, without the project name.
#dataset_ref ⇒ Object

The gapi fragment containing the Project ID and Dataset ID as a camel-cased hash.
#default_expiration ⇒ Object

The default lifetime of all tables in the dataset, in milliseconds.
#default_expiration=(new_default_expiration) ⇒ Object

Updates the default lifetime of all tables in the dataset, in milliseconds.
#delete(force: nil) ⇒ Object

Permanently deletes the dataset.
#description ⇒ Object

A user-friendly description of the dataset.
#description=(new_description) ⇒ Object

Updates the user-friendly description of the dataset.
#etag ⇒ Object

A string hash of the dataset.
#initialize ⇒ Dataset constructor

Create an empty Dataset object.
#location ⇒ Object

The geographic location where the dataset should reside.
#modified_at ⇒ Object

The date when this dataset or any of its tables was last modified.
#name ⇒ Object

A descriptive name for the dataset.
#name=(new_name) ⇒ Object

Updates the descriptive name for the dataset.
#project_id ⇒ Object

The ID of the project containing this dataset.
#query(query, max: nil, timeout: 10000, dryrun: nil, cache: true) ⇒ Object

Queries data using the synchronous method.
#query_job(query, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, large_results: nil, flatten: nil) ⇒ Object

Queries data using the asynchronous method.
#table(table_id) ⇒ Object

Retrieves an existing table by ID.
#tables(token: nil, max: nil) ⇒ Object

Retrieves the list of tables belonging to the dataset.

Constructor Details

#initialize ⇒ `Dataset`

Create an empty Dataset object.

# File 'lib/gcloud/bigquery/dataset.rb', line 53

def initialize #:nodoc:
  @connection = nil
  @gapi = {}
end

Instance Attribute Details

#connection ⇒ `Object`

The Connection object.




# File 'lib/gcloud/bigquery/dataset.rb', line 45

def connection
  @connection
end

#gapi ⇒ `Object`

The Google API Client object.




# File 'lib/gcloud/bigquery/dataset.rb', line 49

def gapi
  @gapi
end

Class Method Details

.from_gapi(gapi, conn) ⇒ `Object`

New Dataset from a Google API Client object.

# File 'lib/gcloud/bigquery/dataset.rb', line 734

def self.from_gapi gapi, conn #:nodoc:
  new.tap do |f|
    f.gapi = gapi
    f.connection = conn
  end
end

Instance Method Details

#access {|a2| ... } ⇒ `Object`

Retrieves the access rules for a Dataset using the BigQuery API data structure of an array of hashes. The rules can be updated by passing a block; see Dataset::Access for all available methods. See BigQuery Access Control for more information.

Examples

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

dataset.access #=> [{"role"=>"OWNER",
               #     "specialGroup"=>"projectOwners"},
               #    {"role"=>"WRITER",
               #     "specialGroup"=>"projectWriters"},
               #    {"role"=>"READER",
               #     "specialGroup"=>"projectReaders"},
               #    {"role"=>"OWNER",
               #     "userByEmail"=>"123456789-...com"}]

Manage the access rules by passing a block.

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

dataset.access do |access|
  access.add_owner_group "[email protected]"
  access.add_writer_user "[email protected]"
  access.remove_writer_user "[email protected]"
  access.add_reader_special :all
  access.add_reader_view other_dataset_view_object
end

Yields:

(a2)

# File 'lib/gcloud/bigquery/dataset.rb', line 236

def access
  ensure_full_data!
  g = @gapi
  g = g.to_hash if g.respond_to? :to_hash
  a = g["access"] ||= []
  return a unless block_given?
  a2 = Access.new a, dataset_ref
  yield a2
  self.access = a2.access if a2.changed?
end
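
Because #access without a block returns a plain array of hashes, ordinary Enumerable methods are enough to inspect it. The following is a minimal sketch that makes no API call: the `rules` literal only imitates the output shown above, and the helper name `entities_with_role` and the e-mail address are hypothetical.

```ruby
# Sample rules shaped like the array returned by Dataset#access.
# The values are illustrative only.
rules = [
  { "role" => "OWNER",  "specialGroup" => "projectOwners" },
  { "role" => "WRITER", "specialGroup" => "projectWriters" },
  { "role" => "READER", "specialGroup" => "projectReaders" },
  { "role" => "OWNER",  "userByEmail"  => "owner@example.com" }
]

# Hypothetical helper (not part of the library): for every rule with the
# given role, return the rule without its "role" key, i.e. the entity.
def entities_with_role rules, role
  rules.select { |rule| rule["role"] == role }
       .map { |rule| rule.reject { |key, _| key == "role" } }
end

owners = entities_with_role rules, "OWNER"
```

With a real dataset, the same helper could presumably be given the result of `dataset.access` instead of the sample array.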

#access=(new_access) ⇒ `Object`

Sets the access rules for a Dataset using the BigQuery API data structure of an array of hashes. See BigQuery Access Control for more information.

This method is provided for advanced usage of managing the access rules. Calling #access with a block is the preferred way to manage access rules.

Example

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

dataset.access = [{"role"=>"OWNER",
                   "specialGroup"=>"projectOwners"},
                  {"role"=>"WRITER",
                   "specialGroup"=>"projectWriters"},
                  {"role"=>"READER",
                   "specialGroup"=>"projectReaders"},
                  {"role"=>"OWNER",
                   "userByEmail"=>"123456789-...com"}]




# File 'lib/gcloud/bigquery/dataset.rb', line 274

def access= new_access
  patch_gapi! access: new_access
end

#api_url ⇒ `Object`

A URL that can be used to access the dataset using the REST API.

:category: Attributes

# File 'lib/gcloud/bigquery/dataset.rb', line 120

def api_url
  ensure_full_data!
  @gapi["selfLink"]
end

#create_table(table_id, name: nil, description: nil, schema: nil) ⇒ `Object`

Creates a new table.

Parameters

table_id: The ID of the table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters. (String)
name: A descriptive name for the table. (String)
description: A user-friendly description of the table. (String)
schema: A hash specifying fields and data types for the table. A block may be passed instead (see examples). For the format of this hash, see the Tables resource. (Hash)

Returns

Gcloud::Bigquery::Table

Examples

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table"

You can also pass name and description options.

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table",
                             name: "My Table",
                             description: "A description of my table."

You can define the table’s schema using a block.

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table" do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Or, if you are adapting existing code that was written for the REST API, you can pass the table’s schema as a hash.

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

schema = {
  "fields" => [
    {
      "name" => "first_name",
      "type" => "STRING",
      "mode" => "REQUIRED"
    },
    {
      "name" => "cities_lived",
      "type" => "RECORD",
      "mode" => "REPEATED",
      "fields" => [
        {
          "name" => "place",
          "type" => "STRING",
          "mode" => "REQUIRED"
        },
        {
          "name" => "number_of_years",
          "type" => "INTEGER",
          "mode" => "REQUIRED"
        }
      ]
    }
  ]
}
table = dataset.create_table "my_table", schema: schema

:category: Table

# File 'lib/gcloud/bigquery/dataset.rb', line 414

def create_table table_id, name: nil, description: nil, schema: nil
  ensure_connection!
  if block_given?
    if schema
      fail ArgumentError, "only schema block or schema option is allowed"
    end
    schema_builder = Table::Schema.new nil
    yield schema_builder
    schema = schema_builder.schema if schema_builder.changed?
  end
  options = { name: name, description: description, schema: schema }
  insert_table table_id, options
end
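
The block form and the hash form in the examples above describe the same schema. To make that correspondence concrete, here is a small sketch of a builder that turns the block-style calls into the REST-style hash. `SketchSchema` is a hypothetical stand-in written for this illustration, not the library's Table::Schema, and the `:nullable` default mode is an assumption.

```ruby
# Hypothetical stand-in for a schema builder; NOT the library's Table::Schema.
class SketchSchema
  attr_reader :fields

  def initialize
    @fields = []
  end

  def string name, mode: :nullable
    add_field name, "STRING", mode
  end

  def integer name, mode: :nullable
    add_field name, "INTEGER", mode
  end

  # A record yields a nested builder; its fields become the "fields" key.
  def record name, mode: :nullable
    nested = SketchSchema.new
    yield nested
    add_field name, "RECORD", mode, nested.fields
  end

  def to_h
    { "fields" => @fields }
  end

  private

  def add_field name, type, mode, nested_fields = nil
    field = { "name" => name, "type" => type, "mode" => mode.to_s.upcase }
    field["fields"] = nested_fields if nested_fields
    @fields << field
  end
end

builder = SketchSchema.new
builder.string "first_name", mode: :required
builder.record "cities_lived", mode: :repeated do |nested|
  nested.string "place", mode: :required
  nested.integer "number_of_years", mode: :required
end
schema = builder.to_h
```

The resulting `schema` has the same structure as the hash literal in the last example above.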

#create_view(table_id, query, name: nil, description: nil) ⇒ `Object`

Creates a new view table from the given query.

Parameters

table_id: The ID of the view table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters. (String)
query: The query that BigQuery executes when the view is referenced. (String)
name: A descriptive name for the table. (String)
description: A user-friendly description of the table. (String)

Returns

Gcloud::Bigquery::View

Examples

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
          "SELECT name, age FROM [proj:dataset.users]"

A name and description can be provided:

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
          "SELECT name, age FROM [proj:dataset.users]",
          name: "My View", description: "This is my view"

:category: Table

# File 'lib/gcloud/bigquery/dataset.rb', line 472

def create_view table_id, query, name: nil, description: nil
  options = { query: query, name: name, description: description }
  insert_table table_id, options
end

#created_at ⇒ `Object`

The time when this dataset was created.

:category: Attributes

# File 'lib/gcloud/bigquery/dataset.rb', line 169

def created_at
  ensure_full_data!
  Time.at(@gapi["creationTime"] / 1000.0)
end
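
As the source above shows, the stored creation time is a count of milliseconds since the Unix epoch, which is divided by 1000.0 before being handed to Time.at (which expects seconds). A minimal sketch of that conversion follows; the timestamp value is made up for illustration.

```ruby
# Illustrative value only: milliseconds since the Unix epoch.
creation_time_ms = 1_450_000_000_000

# Time.at expects seconds, so divide by 1000.0. Using a Float divisor
# keeps any sub-second part that integer division would discard.
created_at = Time.at(creation_time_ms / 1000.0).utc
```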

#dataset_id ⇒ `Object`

A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

:category: Attributes




# File 'lib/gcloud/bigquery/dataset.rb', line 65

def dataset_id
  @gapi["datasetReference"]["datasetId"]
end
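
The ID rules stated above (letters, numbers, underscores, at most 1,024 characters) can be checked locally before making a request. This is a sketch only: `plausible_id?` is a hypothetical helper, not part of the library, it assumes an ID must be non-empty, and the service remains the authority on what it accepts.

```ruby
# Hypothetical client-side check based on the documented ID rules.
# Assumes an ID has at least one character.
def plausible_id? id
  !!(id =~ /\A[A-Za-z0-9_]{1,1024}\z/)
end

ok      = plausible_id? "my_dataset"   # only allowed characters
dashed  = plausible_id? "my-dataset"   # contains a hyphen
too_big = plausible_id? "a" * 1025     # exceeds the maximum length
```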

#dataset_ref ⇒ `Object`

The gapi fragment containing the Project ID and Dataset ID as a camel-cased hash.

# File 'lib/gcloud/bigquery/dataset.rb', line 81

def dataset_ref #:nodoc:
  dataset_ref = @gapi["datasetReference"]
  dataset_ref = dataset_ref.to_hash if dataset_ref.respond_to? :to_hash
  dataset_ref
end

#default_expiration ⇒ `Object`

The default lifetime of all tables in the dataset, in milliseconds.

:category: Attributes

# File 'lib/gcloud/bigquery/dataset.rb', line 149

def default_expiration
  ensure_full_data!
  @gapi["defaultTableExpirationMs"]
end

#default_expiration=(new_default_expiration) ⇒ `Object`

Updates the default lifetime of all tables in the dataset, in milliseconds.

:category: Attributes




# File 'lib/gcloud/bigquery/dataset.rb', line 160

def default_expiration= new_default_expiration
  patch_gapi! default_expiration: new_default_expiration
end
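
Since the lifetime is expressed in milliseconds, it is easy to be off by a factor of 1,000 when thinking in days. A small conversion helper, sketched here under the assumption that a day is 24 hours, keeps the arithmetic in one place; `days_to_ms` is a hypothetical name, not a library method.

```ruby
# Hypothetical helper: convert a number of days to milliseconds.
def days_to_ms days
  (days * 24 * 60 * 60 * 1000).to_i
end

one_week_ms = days_to_ms 7
```

The result could then be assigned with `dataset.default_expiration = one_week_ms`.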

#delete(force: nil) ⇒ `Object`

Permanently deletes the dataset. The dataset must be empty before it can be deleted unless the force option is set to true.

Parameters

force: If true, delete all the tables in the dataset. If false and the dataset contains tables, the request will fail. Default is false. (Boolean)

Returns

true if the dataset was deleted.

Example

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery

dataset = bigquery.dataset "my_dataset"
dataset.delete

:category: Lifecycle

# File 'lib/gcloud/bigquery/dataset.rb', line 305

def delete force: nil
  ensure_connection!
  resp = connection.delete_dataset dataset_id, force
  if resp.success?
    true
  else
    fail ApiError.from_response(resp)
  end
end

#description ⇒ `Object`

A user-friendly description of the dataset.

:category: Attributes

# File 'lib/gcloud/bigquery/dataset.rb', line 130

def description
  ensure_full_data!
  @gapi["description"]
end

#description=(new_description) ⇒ `Object`

Updates the user-friendly description of the dataset.

:category: Attributes




# File 'lib/gcloud/bigquery/dataset.rb', line 140

def description= new_description
  patch_gapi! description: new_description
end

#etag ⇒ `Object`

A string hash of the dataset.

:category: Attributes

# File 'lib/gcloud/bigquery/dataset.rb', line 110

def etag
  ensure_full_data!
  @gapi["etag"]
end

#location ⇒ `Object`

The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.

:category: Attributes

# File 'lib/gcloud/bigquery/dataset.rb', line 190

def location
  ensure_full_data!
  @gapi["location"]
end

#modified_at ⇒ `Object`

The date when this dataset or any of its tables was last modified.

:category: Attributes

# File 'lib/gcloud/bigquery/dataset.rb', line 179

def modified_at
  ensure_full_data!
  Time.at(@gapi["lastModifiedTime"] / 1000.0)
end

#name ⇒ `Object`

A descriptive name for the dataset.

:category: Attributes




# File 'lib/gcloud/bigquery/dataset.rb', line 92

def name
  @gapi["friendlyName"]
end

#name=(new_name) ⇒ `Object`

Updates the descriptive name for the dataset.

:category: Attributes




# File 'lib/gcloud/bigquery/dataset.rb', line 101

def name= new_name
  patch_gapi! name: new_name
end

#project_id ⇒ `Object`

The ID of the project containing this dataset.

:category: Attributes




# File 'lib/gcloud/bigquery/dataset.rb', line 74

def project_id
  @gapi["datasetReference"]["projectId"]
end

#query(query, max: nil, timeout: 10000, dryrun: nil, cache: true) ⇒ `Object`

Queries data using the synchronous method.

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

Parameters

query: A query string, following the BigQuery query syntax, of the query to execute. Example: “SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]”. (String)
max: The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies. (Integer)
timeout: How long to wait for the query to complete, in milliseconds, before the request times out and returns. Note that this is only a timeout for the request, not the query. If the query takes longer to run than the timeout value, the call returns without any results and with QueryData#complete? set to false. The default value is 10000 milliseconds (10 seconds). (Integer)
dryrun: If set to true, BigQuery doesn’t run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error returns. The default value is false. (Boolean)
cache: Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching. (Boolean)

Returns

Gcloud::Bigquery::QueryData

Example

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery

dataset = bigquery.dataset "my_dataset"
data = dataset.query "SELECT name FROM my_table"
data.each do |row|
  puts row["name"]
end

:category: Data

# File 'lib/gcloud/bigquery/dataset.rb', line 719

def query query, max: nil, timeout: 10000, dryrun: nil, cache: true
  options = { max: max, timeout: timeout, dryrun: dryrun, cache: cache }
  options[:dataset] ||= dataset_id
  options[:project] ||= project_id
  ensure_connection!
  resp = connection.query query, options
  if resp.success?
    QueryData.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end

#query_job(query, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, large_results: nil, flatten: nil) ⇒ `Object`

Queries data using the asynchronous method.

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

Parameters

query

A query string, following the BigQuery query syntax, of the query to execute. Example: “SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]”. (String)

priority

Specifies a priority for the query. Possible values include INTERACTIVE and BATCH. The default value is INTERACTIVE. (String)

cache

Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. (Boolean)

table

The destination table where the query results should be stored. If not present, a new table will be created to store the results. (Table)

create

Specifies whether the job is allowed to create new tables. (String)

The following values are supported:

needed - Create the table if it does not exist.
never - The table must already exist. A ‘notFound’ error is raised if the table does not exist.

write

Specifies the action that occurs if the destination table already exists. (String)

The following values are supported:

truncate - BigQuery overwrites the table data.
append - BigQuery appends the data to the table.
empty - A ‘duplicate’ error is returned in the job result if the table exists and contains data.

large_results

If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires table parameter to be set. (Boolean)

flatten

Flattens all nested and repeated fields in the query results. The default value is true. large_results parameter must be true if this is set to false. (Boolean)

Returns

Gcloud::Bigquery::QueryJob

Example

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery

dataset = bigquery.dataset "my_dataset"
job = dataset.query_job "SELECT name FROM my_table"

job.wait_until_done!
if !job.failed?
  job.query_results.each do |row|
    puts row["name"]
  end
end

:category: Data

# File 'lib/gcloud/bigquery/dataset.rb', line 646

def query_job query, priority: "INTERACTIVE", cache: true, table: nil,
              create: nil, write: nil, large_results: nil, flatten: nil
  options = { priority: priority, cache: cache, table: table,
              create: create, write: write, large_results: large_results,
              flatten: flatten }
  options[:dataset] ||= self
  ensure_connection!
  resp = connection.query_job query, options
  if resp.success?
    Job.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end
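
The parameter descriptions above state two dependencies between options: large_results requires table, and flatten set to false requires large_results to be true. The sketch below checks those two rules before a job is submitted. It is a hypothetical pre-flight helper, not something the library is known to provide, and the error strings are invented.

```ruby
# Hypothetical pre-flight check for the documented option dependencies.
# Returns an array of problems; an empty array means none were found.
def query_job_option_errors table: nil, large_results: nil, flatten: nil
  errors = []
  if large_results && table.nil?
    errors << "large_results requires table"
  end
  if flatten == false && !large_results
    errors << "flatten: false requires large_results: true"
  end
  errors
end

problems = query_job_option_errors flatten: false
```

Here `problems` holds one message, because flatten was disabled without enabling large_results.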

#table(table_id) ⇒ `Object`

Retrieves an existing table by ID.

Parameters

table_id: The ID of a table. (String)

Returns

Gcloud::Bigquery::Table or Gcloud::Bigquery::View or nil if the table does not exist

Example

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
puts table.name

:category: Table

# File 'lib/gcloud/bigquery/dataset.rb', line 502

def table table_id
  ensure_connection!
  resp = connection.get_table dataset_id, table_id
  if resp.success?
    Table.from_gapi resp.data, connection
  else
    nil
  end
end

#tables(token: nil, max: nil) ⇒ `Object`

Retrieves the list of tables belonging to the dataset.

Parameters

token: A previously-returned page token representing part of the larger set of results to view. (String)
max: Maximum number of tables to return. (Integer)

Returns

Array of Gcloud::Bigquery::Table or Gcloud::Bigquery::View (See Gcloud::Bigquery::Table::List)

Examples

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
tables = dataset.tables
tables.each do |table|
  puts table.name
end

If you have a significant number of tables, you may need to paginate through them (see Table::List#token):

require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

all_tables = []
tmp_tables = dataset.tables
while tmp_tables.any? do
  tmp_tables.each do |table|
    all_tables << table
  end
  # break loop if no more tables available
  break if tmp_tables.token.nil?
  # get the next group of tables
  tmp_tables = dataset.tables token: tmp_tables.token
end

:category: Table

# File 'lib/gcloud/bigquery/dataset.rb', line 563

def tables token: nil, max: nil
  ensure_connection!
  options = { token: token, max: max }
  resp = connection.list_tables dataset_id, options
  if resp.success?
    Table::List.from_response resp, connection
  else
    fail ApiError.from_response(resp)
  end
end
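
The pagination loop in the example above can be factored into a reusable helper. The sketch below runs without any API access: `FakePage` and the `pages` hash are hypothetical stand-ins for the list object and the service, and `collect_all_pages` is an invented name. The only assumption about a page is that it responds to #each and #token, as the returned list does in the example.

```ruby
# Hypothetical stand-in for one page of results: an Array with a token.
class FakePage < Array
  attr_accessor :token
end

# Follows page tokens until none is left and returns every item.
# `fetch` is any callable that takes a token (nil for the first page)
# and returns an object responding to #each and #token.
def collect_all_pages fetch
  items = []
  token = nil
  loop do
    page = fetch.call token
    page.each { |item| items << item }
    token = page.token
    break if token.nil?
  end
  items
end

# Two fake pages keyed by the token that requests them.
pages = {
  nil  => FakePage.new(%w[table_a table_b]).tap { |page| page.token = "t1" },
  "t1" => FakePage.new(%w[table_c])
}

all_tables = collect_all_pages(->(token) { pages.fetch token })
```

Against a real dataset, the callable would presumably be `->(token) { dataset.tables token: token }`.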
