Class: Gcloud::Bigquery::Dataset
- Inherits: Object
- Defined in:
  lib/gcloud/bigquery/dataset.rb,
  lib/gcloud/bigquery/dataset/list.rb,
  lib/gcloud/bigquery/dataset/access.rb
Overview
Dataset
Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest level unit of access control; you cannot control access at the table level. A dataset is contained within a specific project.
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.create_dataset "my_dataset",
name: "My Dataset",
description: "This is my Dataset"
Defined Under Namespace
Classes: Access, List
Instance Attribute Summary collapse
-
#connection ⇒ Object
The Connection object.
-
#gapi ⇒ Object
The Google API Client object.
Class Method Summary collapse
-
.from_gapi(gapi, conn) ⇒ Object
New Dataset from a Google API Client object.
Instance Method Summary collapse
-
#access {|a2| ... } ⇒ Object
Retrieves the access rules for the Dataset as an array of hashes, in the structure used by the BigQuery API.
-
#access=(new_access) ⇒ Object
Sets the access rules for the Dataset using an array of hashes, in the structure used by the BigQuery API.
-
#api_url ⇒ Object
A URL that can be used to access the dataset using the REST API.
-
#create_table(table_id, name: nil, description: nil, schema: nil) ⇒ Object
Creates a new table.
-
#create_view(table_id, query, name: nil, description: nil) ⇒ Object
Creates a new view table from the given query.
-
#created_at ⇒ Object
The time when this dataset was created.
-
#dataset_id ⇒ Object
A unique ID for this dataset, without the project name.
-
#dataset_ref ⇒ Object
The gapi fragment containing the Project ID and Dataset ID as a camel-cased hash.
-
#default_expiration ⇒ Object
The default lifetime of all tables in the dataset, in milliseconds.
-
#default_expiration=(new_default_expiration) ⇒ Object
Updates the default lifetime of all tables in the dataset, in milliseconds.
-
#delete(force: nil) ⇒ Object
Permanently deletes the dataset.
-
#description ⇒ Object
A user-friendly description of the dataset.
-
#description=(new_description) ⇒ Object
Updates the user-friendly description of the dataset.
-
#etag ⇒ Object
A string hash of the dataset.
-
#initialize ⇒ Dataset
constructor
Create an empty Dataset object.
-
#location ⇒ Object
The geographic location where the dataset should reside.
-
#modified_at ⇒ Object
The date when this dataset or any of its tables was last modified.
-
#name ⇒ Object
A descriptive name for the dataset.
-
#name=(new_name) ⇒ Object
Updates the descriptive name for the dataset.
-
#project_id ⇒ Object
The ID of the project containing this dataset.
-
#query(query, max: nil, timeout: 10000, dryrun: nil, cache: true) ⇒ Object
Queries data using the synchronous method.
-
#query_job(query, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, large_results: nil, flatten: nil) ⇒ Object
Queries data using the asynchronous method.
-
#table(table_id) ⇒ Object
Retrieves an existing table by ID.
-
#tables(token: nil, max: nil) ⇒ Object
Retrieves the list of tables belonging to the dataset.
Constructor Details
#initialize ⇒ Dataset
Create an empty Dataset object.
# File 'lib/gcloud/bigquery/dataset.rb', line 53

def initialize #:nodoc:
  @connection = nil
  @gapi = {}
end
Instance Attribute Details
#connection ⇒ Object
The Connection object.
# File 'lib/gcloud/bigquery/dataset.rb', line 45

def connection
  @connection
end
#gapi ⇒ Object
The Google API Client object.
# File 'lib/gcloud/bigquery/dataset.rb', line 49

def gapi
  @gapi
end
Class Method Details
.from_gapi(gapi, conn) ⇒ Object
New Dataset from a Google API Client object.
# File 'lib/gcloud/bigquery/dataset.rb', line 734

def self.from_gapi gapi, conn #:nodoc:
  new.tap do |f|
    f.gapi = gapi
    f.connection = conn
  end
end
Instance Method Details
#access {|a2| ... } ⇒ Object
Retrieves the access rules for the Dataset as an array of hashes, in the structure used by the BigQuery API. The rules can be updated by passing a block; see Dataset::Access for all the methods available. See BigQuery Access Control for more information.
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
dataset.access #=> [{"role"=>"OWNER",
# "specialGroup"=>"projectOwners"},
# {"role"=>"WRITER",
# "specialGroup"=>"projectWriters"},
# {"role"=>"READER",
# "specialGroup"=>"projectReaders"},
# {"role"=>"OWNER",
# "userByEmail"=>"123456789-...com"}]
Manage the access rules by passing a block.
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
dataset.access do |access|
access.add_owner_group "[email protected]"
access.add_writer_user "[email protected]"
access.remove_writer_user "[email protected]"
access.add_reader_special :all
access.add_reader_view other_dataset_view_object
end
# File 'lib/gcloud/bigquery/dataset.rb', line 236

def access
  ensure_full_data!
  g = @gapi
  g = g.to_hash if g.respond_to? :to_hash
  a = g["access"] ||= []
  return a unless block_given?
  a2 = Access.new a, dataset_ref
  yield a2
  self.access = a2.access if a2.changed?
end
#access=(new_access) ⇒ Object
Sets the access rules for the Dataset using an array of hashes, in the structure used by the BigQuery API. See BigQuery Access Control for more information.
This method is provided for advanced usage of managing the access rules. Calling #access with a block is the preferred way to manage access rules.
Example
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
dataset.access = [{"role"=>"OWNER",
"specialGroup"=>"projectOwners"},
{"role"=>"WRITER",
"specialGroup"=>"projectWriters"},
{"role"=>"READER",
"specialGroup"=>"projectReaders"},
{"role"=>"OWNER",
"userByEmail"=>"123456789-...com"}]
# File 'lib/gcloud/bigquery/dataset.rb', line 274

def access= new_access
  patch_gapi! access: new_access
end
#api_url ⇒ Object
A URL that can be used to access the dataset using the REST API.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 120

def api_url
  ensure_full_data!
  @gapi["selfLink"]
end
#create_table(table_id, name: nil, description: nil, schema: nil) ⇒ Object
Creates a new table.
Parameters
table_id (String) - The ID of the table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.
name (String) - A descriptive name for the table.
description (String) - A user-friendly description of the table.
schema (Hash) - A hash specifying fields and data types for the table. A block may be passed instead (see examples). For the format of this hash, see the Tables resource.
Returns
Gcloud::Bigquery::Table
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table"
You can also pass name and description options.
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table",
                             name: "My Table",
                             description: "A description of my table."
You can define the table’s schema using a block.
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table" do |schema|
schema.string "first_name", mode: :required
schema.record "cities_lived", mode: :repeated do |nested_schema|
nested_schema.string "place", mode: :required
nested_schema.integer "number_of_years", mode: :required
end
end
Or, if you are adapting existing code that was written for the REST API, you can pass the table’s schema as a hash.
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
schema = {
"fields" => [
{
"name" => "first_name",
"type" => "STRING",
"mode" => "REQUIRED"
},
{
"name" => "cities_lived",
"type" => "RECORD",
"mode" => "REPEATED",
"fields" => [
{
"name" => "place",
"type" => "STRING",
"mode" => "REQUIRED"
},
{
"name" => "number_of_years",
"type" => "INTEGER",
"mode" => "REQUIRED"
}
]
}
]
}
table = dataset.create_table "my_table", schema: schema
:category: Table
# File 'lib/gcloud/bigquery/dataset.rb', line 414

def create_table table_id, name: nil, description: nil, schema: nil
  ensure_connection!
  if block_given?
    if schema
      fail ArgumentError, "only schema block or schema option is allowed"
    end
    schema_builder = Table::Schema.new nil
    yield schema_builder
    schema = schema_builder.schema if schema_builder.changed?
  end
  options = { name: name, description: description, schema: schema }
  insert_table table_id, options
end
#create_view(table_id, query, name: nil, description: nil) ⇒ Object
Creates a new view table from the given query.
Parameters
table_id (String) - The ID of the view table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.
query (String) - The query that BigQuery executes when the view is referenced.
name (String) - A descriptive name for the table.
description (String) - A user-friendly description of the table.
Returns
Gcloud::Bigquery::View
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
"SELECT name, age FROM [proj:dataset.users]"
A name and description can be provided:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
"SELECT name, age FROM [proj:dataset.users]",
name: "My View", description: "This is my view"
:category: Table
# File 'lib/gcloud/bigquery/dataset.rb', line 472

def create_view table_id, query, name: nil, description: nil
  options = { query: query, name: name, description: description }
  insert_table table_id, options
end
#created_at ⇒ Object
The time when this dataset was created.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 169

def created_at
  ensure_full_data!
  Time.at(@gapi["creationTime"] / 1000.0)
end
#dataset_id ⇒ Object
A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 65

def dataset_id
  @gapi["datasetReference"]["datasetId"]
end
#dataset_ref ⇒ Object
The gapi fragment containing the Project ID and Dataset ID as a camel-cased hash.
# File 'lib/gcloud/bigquery/dataset.rb', line 81

def dataset_ref #:nodoc:
  dataset_ref = @gapi["datasetReference"]
  dataset_ref = dataset_ref.to_hash if dataset_ref.respond_to? :to_hash
  dataset_ref
end
#default_expiration ⇒ Object
The default lifetime of all tables in the dataset, in milliseconds.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 149

def default_expiration
  ensure_full_data!
  @gapi["defaultTableExpirationMs"]
end
#default_expiration=(new_default_expiration) ⇒ Object
Updates the default lifetime of all tables in the dataset, in milliseconds.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 160

def default_expiration= new_default_expiration
  patch_gapi! default_expiration: new_default_expiration
end
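A rough sketch of reading and updating the default table lifetime; the one-hour value (in milliseconds) is only illustrative:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
# Give new tables in this dataset a default lifetime of one hour.
dataset.default_expiration = 3600000
puts dataset.default_expiration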
#delete(force: nil) ⇒ Object
Permanently deletes the dataset. The dataset must be empty before it can be deleted unless the force option is set to true.
Parameters
force (Boolean) - If true, delete all the tables in the dataset, as shown in the second example below. If false and the dataset contains tables, the request will fail. The default is false.
Returns
true if the dataset was deleted.
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
dataset.delete
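When the dataset still contains tables, the force option documented above can be used; a minimal sketch assuming the same dataset:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
# Deletes the dataset together with all of its tables.
dataset.delete force: true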
:category: Lifecycle
# File 'lib/gcloud/bigquery/dataset.rb', line 305

def delete force: nil
  ensure_connection!
  resp = connection.delete_dataset dataset_id, force
  if resp.success?
    true
  else
    fail ApiError.from_response(resp)
  end
end
#description ⇒ Object
A user-friendly description of the dataset.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 130

def description
  ensure_full_data!
  @gapi["description"]
end
#description=(new_description) ⇒ Object
Updates the user-friendly description of the dataset.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 140

def description= new_description
  patch_gapi! description: new_description
end
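A small sketch of updating the description; the text assigned here is purely illustrative:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
dataset.description = "Holds the nightly analytics exports"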
#etag ⇒ Object
A string hash of the dataset.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 110

def etag
  ensure_full_data!
  @gapi["etag"]
end
#location ⇒ Object
The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 190

def location
  ensure_full_data!
  @gapi["location"]
end
#modified_at ⇒ Object
The date when this dataset or any of its tables was last modified.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 179

def modified_at
  ensure_full_data!
  Time.at(@gapi["lastModifiedTime"] / 1000.0)
end
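As the source above shows, both created_at and modified_at return Ruby Time values converted from the millisecond timestamps in the API resource; a brief sketch:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
puts dataset.created_at  # Time the dataset was created
puts dataset.modified_at # Time of the most recent modification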
#name ⇒ Object
A descriptive name for the dataset.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 92

def name
  @gapi["friendlyName"]
end
#name=(new_name) ⇒ Object
Updates the descriptive name for the dataset.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 101

def name= new_name
  patch_gapi! name: new_name
end
#project_id ⇒ Object
The ID of the project containing this dataset.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 74

def project_id
  @gapi["datasetReference"]["projectId"]
end
#query(query, max: nil, timeout: 10000, dryrun: nil, cache: true) ⇒ Object
Queries data using the synchronous method.
Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.
Parameters
query (String) - A query string, following the BigQuery query syntax, of the query to execute. Example: “SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]”.
max (Integer) - The maximum number of rows of data to return per page of results (see the second example below). Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.
timeout (Integer) - How long to wait for the query to complete, in milliseconds, before the request times out and returns. Note that this is only a timeout for the request, not the query. If the query takes longer to run than the timeout value, the call returns without any results and with QueryData#complete? set to false. The default value is 10000 milliseconds (10 seconds).
dryrun (Boolean) - If set to true, BigQuery doesn’t run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error returns. The default value is false.
cache (Boolean) - Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.
Returns
Gcloud::Bigquery::QueryData
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
data = dataset.query "SELECT name FROM my_table"
data.each do |row|
puts row["name"]
end
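A second sketch showing the max and timeout options described above; the option values are illustrative only:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
# Return at most 1,000 rows per page and allow the request to wait
# up to 30 seconds for the query to complete.
data = dataset.query "SELECT name FROM my_table",
                     max: 1000, timeout: 30000
data.each do |row|
puts row["name"]
end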
:category: Data
# File 'lib/gcloud/bigquery/dataset.rb', line 719

def query query, max: nil, timeout: 10000, dryrun: nil, cache: true
  options = { max: max, timeout: timeout, dryrun: dryrun, cache: cache }
  options[:dataset] ||= dataset_id
  options[:project] ||= project_id
  ensure_connection!
  resp = connection.query query, options
  if resp.success?
    QueryData.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end
#query_job(query, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, large_results: nil, flatten: nil) ⇒ Object
Queries data using the asynchronous method.
Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.
Parameters
query (String) - A query string, following the BigQuery query syntax, of the query to execute. Example: “SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]”.
priority (String) - Specifies a priority for the query. Possible values include INTERACTIVE and BATCH. The default value is INTERACTIVE.
cache (Boolean) - Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true.
table (Table) - The destination table where the query results should be stored. If not present, a new table will be created to store the results.
create (String) - Specifies whether the job is allowed to create new tables (see the second example below). The following values are supported:
- needed - Create the table if it does not exist.
- never - The table must already exist. A ‘notFound’ error is raised if the table does not exist.
write (String) - Specifies the action that occurs if the destination table already exists. The following values are supported:
- truncate - BigQuery overwrites the table data.
- append - BigQuery appends the data to the table.
- empty - A ‘duplicate’ error is returned in the job result if the table exists and contains data.
large_results (Boolean) - If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires the table parameter to be set.
flatten (Boolean) - Flattens all nested and repeated fields in the query results. The default value is true. The large_results parameter must be true if this is set to false.
Returns
Gcloud::Bigquery::QueryJob
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
job = dataset.query_job "SELECT name FROM my_table"
job.wait_until_done!
if !job.failed?
job.query_results.each do |row|
puts row["name"]
end
end
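A second sketch combining the table, create, and write options described above; it assumes a destination table named "query_results" already exists in the dataset, and all option values are illustrative:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
destination = dataset.table "query_results"
job = dataset.query_job "SELECT name FROM my_table",
                        table: destination,
                        create: "needed",
                        write: "truncate"
job.wait_until_done!
if !job.failed?
job.query_results.each do |row|
puts row["name"]
end
end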
:category: Data
# File 'lib/gcloud/bigquery/dataset.rb', line 646

def query_job query, priority: "INTERACTIVE", cache: true, table: nil,
              create: nil, write: nil, large_results: nil, flatten: nil
  options = { priority: priority, cache: cache, table: table,
              create: create, write: write,
              large_results: large_results, flatten: flatten }
  options[:dataset] ||= self
  ensure_connection!
  resp = connection.query_job query, options
  if resp.success?
    Job.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end
#table(table_id) ⇒ Object
Retrieves an existing table by ID.
Parameters
table_id (String) - The ID of a table.
Returns
Gcloud::Bigquery::Table or Gcloud::Bigquery::View or nil if the table does not exist
Example
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
puts table.name
:category: Table
# File 'lib/gcloud/bigquery/dataset.rb', line 502

def table table_id
  ensure_connection!
  resp = connection.get_table dataset_id, table_id
  if resp.success?
    Table.from_gapi resp.data, connection
  else
    nil
  end
end
#tables(token: nil, max: nil) ⇒ Object
Retrieves the list of tables belonging to the dataset.
Parameters
token (String) - A previously-returned page token representing part of the larger set of results to view.
max (Integer) - Maximum number of tables to return.
Returns
Array of Gcloud::Bigquery::Table or Gcloud::Bigquery::View (See Gcloud::Bigquery::Table::List)
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
tables = dataset.tables
tables.each do |table|
puts table.name
end
If you have a significant number of tables, you may need to paginate through them (see Table::List#token):
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
all_tables = []
tmp_tables = dataset.tables
while tmp_tables.any? do
tmp_tables.each do |table|
all_tables << table
end
# break loop if no more tables available
break if tmp_tables.token.nil?
# get the next group of tables
tmp_tables = dataset.tables token: tmp_tables.token
end
:category: Table
# File 'lib/gcloud/bigquery/dataset.rb', line 563

def tables token: nil, max: nil
  ensure_connection!
  options = { token: token, max: max }
  resp = connection.list_tables dataset_id, options
  if resp.success?
    Table::List.from_response resp, connection
  else
    fail ApiError.from_response(resp)
  end
end