Class: Gcloud::Bigquery::Dataset
Inherits: Object
Defined in: lib/gcloud/bigquery/dataset.rb, lib/gcloud/bigquery/dataset/list.rb
Overview
Dataset
Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest-level unit of access control; you cannot control access at the table level. A dataset is contained within a specific project.
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"
Defined Under Namespace
Classes: List
Instance Attribute Summary
- #connection ⇒ Object
  The Connection object.
- #gapi ⇒ Object
  The Google API Client object.
Class Method Summary
- .from_gapi(gapi, conn) ⇒ Object
  New Dataset from a Google API Client object.
Instance Method Summary
- #create_table(table_id, options = {}) ⇒ Object
  Creates a new table.
- #create_view(table_id, query, options = {}) ⇒ Object
  Creates a new view table from the given query.
- #created_at ⇒ Object
  The time when this dataset was created.
- #dataset_id ⇒ Object
  A unique ID for this dataset, without the project name.
- #default_expiration ⇒ Object
  The default lifetime of all tables in the dataset, in milliseconds.
- #default_expiration=(new_default_expiration) ⇒ Object
  Updates the default lifetime of all tables in the dataset, in milliseconds.
- #delete(options = {}) ⇒ Object
  Permanently deletes the dataset.
- #description ⇒ Object
  A user-friendly description of the dataset.
- #description=(new_description) ⇒ Object
  Updates the user-friendly description of the dataset.
- #etag ⇒ Object
  A string hash of the dataset.
- #initialize ⇒ Dataset (constructor)
  Create an empty Dataset object.
- #location ⇒ Object
  The geographic location where the dataset should reside.
- #modified_at ⇒ Object
  The time when this dataset or any of its tables was last modified.
- #name ⇒ Object
  A descriptive name for the dataset.
- #name=(new_name) ⇒ Object
  Updates the descriptive name for the dataset.
- #project_id ⇒ Object
  The ID of the project containing this dataset.
- #query(query, options = {}) ⇒ Object
  Queries data using the synchronous method.
- #query_job(query, options = {}) ⇒ Object
  Queries data using the asynchronous method.
- #table(table_id) ⇒ Object
  Retrieves an existing table by ID.
- #tables(options = {}) ⇒ Object
  Retrieves the list of tables belonging to the dataset.
- #url ⇒ Object
  A URL that can be used to access the dataset using the REST API.
Constructor Details
#initialize ⇒ Dataset
Create an empty Dataset object.
# File 'lib/gcloud/bigquery/dataset.rb', line 51
def initialize #:nodoc:
  @connection = nil
  @gapi = {}
end
Instance Attribute Details
#connection ⇒ Object
The Connection object.
# File 'lib/gcloud/bigquery/dataset.rb', line 43
def connection
  @connection
end
#gapi ⇒ Object
The Google API Client object.
# File 'lib/gcloud/bigquery/dataset.rb', line 47
def gapi
  @gapi
end
Class Method Details
.from_gapi(gapi, conn) ⇒ Object
New Dataset from a Google API Client object.
# File 'lib/gcloud/bigquery/dataset.rb', line 615
def self.from_gapi gapi, conn #:nodoc:
  new.tap do |f|
    f.gapi = gapi
    f.connection = conn
  end
end
Instance Method Details
#create_table(table_id, options = {}) ⇒ Object
Creates a new table.
Parameters
- table_id (String)
  The ID of the table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.
- options (Hash)
  An optional Hash for controlling additional behavior.
- options[:name] (String)
  A descriptive name for the table.
- options[:description] (String)
  A user-friendly description of the table.
- options[:schema] (Hash)
  A schema specifying fields and data types for the table. See the Tables resource for more information.
Returns
Gcloud::Bigquery::Table
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table"
A name and schema can be provided:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
schema = {
  "fields" => [
    {
      "name" => "first_name",
      "type" => "STRING",
      "mode" => "REQUIRED"
    },
    {
      "name" => "cities_lived",
      "type" => "RECORD",
      "mode" => "REPEATED",
      "fields" => [
        {
          "name" => "place",
          "type" => "STRING",
          "mode" => "REQUIRED"
        },
        {
          "name" => "number_of_years",
          "type" => "INTEGER",
          "mode" => "REQUIRED"
        }
      ]
    }
  ]
}
table = dataset.create_table "my_table",
                             name: "My Table",
                             schema: schema
:category: Table
# File 'lib/gcloud/bigquery/dataset.rb', line 297
def create_table table_id, options = {}
  ensure_connection!
  resp = connection.insert_table dataset_id, table_id, options
  if resp.success?
    Table.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end
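Nested schemas like the one in the example can be hard to read at a glance. A small helper that flattens the schema Hash into dotted field paths (for example cities_lived.place) makes the structure of RECORD fields explicit. This is a hypothetical helper, not part of gcloud, and it assumes nested fields are always typed "RECORD" as in the example.

```ruby
# Hypothetical helper (not part of gcloud): list every leaf field of a schema
# Hash, in the format passed to create_table, as [dotted_path, type, mode].
def field_paths(fields, prefix = nil)
  fields.flat_map do |field|
    path = [prefix, field["name"]].compact.join(".")
    if field["type"] == "RECORD"
      # RECORD fields carry their children in a nested "fields" array.
      field_paths(field.fetch("fields", []), path)
    else
      [[path, field["type"], field["mode"]]]
    end
  end
end

schema = {
  "fields" => [
    { "name" => "first_name", "type" => "STRING", "mode" => "REQUIRED" },
    {
      "name" => "cities_lived", "type" => "RECORD", "mode" => "REPEATED",
      "fields" => [
        { "name" => "place", "type" => "STRING", "mode" => "REQUIRED" },
        { "name" => "number_of_years", "type" => "INTEGER", "mode" => "REQUIRED" }
      ]
    }
  ]
}

field_paths(schema["fields"]).each do |path, type, mode|
  puts "#{path} #{type} #{mode}"
end
# first_name STRING REQUIRED
# cities_lived.place STRING REQUIRED
# cities_lived.number_of_years INTEGER REQUIRED
```

Note that the mode of the enclosing RECORD (REPEATED here) is not repeated on its children; a fuller version could carry it along.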
#create_view(table_id, query, options = {}) ⇒ Object
Creates a new view table from the given query.
Parameters
- table_id (String)
  The ID of the view table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.
- query (String)
  The query that BigQuery executes when the view is referenced.
- options (Hash)
  An optional Hash for controlling additional behavior.
- options[:name] (String)
  A descriptive name for the table.
- options[:description] (String)
  A user-friendly description of the table.
Returns
Gcloud::Bigquery::View
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
"SELECT name, age FROM [proj:dataset.users]"
A name and description can be provided:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
"SELECT name, age FROM [proj:dataset.users]",
name: "My View", description: "This is my view"
:category: Table
# File 'lib/gcloud/bigquery/dataset.rb', line 353
def create_view table_id, query, options = {}
  options[:query] = query
  create_table table_id, options
end
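The source of create_view stores the query under options[:query] and passes that same Hash to create_table, so a Hash supplied by the caller is modified as a side effect. The sketch below uses a hypothetical ViewSketch class (not part of gcloud) that only records what create_table receives, and shows a variant built on Hash#merge that leaves the caller's Hash untouched.

```ruby
# Hypothetical stand-in for Dataset that records what create_table receives.
class ViewSketch
  attr_reader :calls

  def initialize
    @calls = []
  end

  def create_table(table_id, options = {})
    @calls << [table_id, options]
    table_id
  end

  # Same delegation as Dataset#create_view, but Hash#merge builds a new Hash
  # instead of writing :query into the caller's options.
  def create_view(table_id, query, options = {})
    create_table table_id, options.merge(query: query)
  end
end

opts = { name: "My View" }
ds = ViewSketch.new
ds.create_view "my_view", "SELECT name, age FROM [proj:dataset.users]", opts
p ds.calls.last # table ID plus options that now include :query
p opts          # still only :name
```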
#created_at ⇒ Object
The time when this dataset was created.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 158
def created_at
  ensure_full_data!
  Time.at(@gapi["creationTime"] / 1000.0)
end
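created_at (and modified_at, below) receive timestamps in milliseconds since the Unix epoch and divide by 1000.0 because Time.at expects seconds. A standalone sketch of that conversion follows; the ms_to_time name and the sample value are made up, and coercing with Integer() so that a numeric String is also accepted is an extra assumption, not something the listing does.

```ruby
# Convert milliseconds since the Unix epoch into a UTC Time.
# Integer() accepts both 1400000000000 and "1400000000000"; accepting a
# String is an assumption, since the listing divides the value directly.
def ms_to_time(ms)
  Time.at(Integer(ms) / 1000.0).utc
end

gapi = { "creationTime" => 1_400_000_000_000 } # hypothetical sample value
created = ms_to_time(gapi["creationTime"])
puts created.year # 2014
puts created.to_i # 1400000000
```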
#dataset_id ⇒ Object
A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 63
def dataset_id
  @gapi["datasetReference"]["datasetId"]
end
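Dataset IDs, like the table IDs accepted by create_table, may contain only letters, digits, and underscores and are limited to 1,024 characters. A hypothetical client-side check of that rule is sketched below. Rejecting an empty ID is an added assumption, and the service remains the authority on what it accepts.

```ruby
# Hypothetical check for the documented ID rule: letters, digits, and
# underscores only, at most 1,024 characters. Requiring at least one
# character is an assumption.
ID_PATTERN = /\A[A-Za-z0-9_]{1,1024}\z/

def valid_resource_id?(id)
  # \A and \z anchor the whole string, so a trailing newline is rejected too.
  id.is_a?(String) && ID_PATTERN.match?(id)
end

puts valid_resource_id?("my_dataset") # true
puts valid_resource_id?("my-dataset") # false (hyphen not allowed)
puts valid_resource_id?("a" * 1025)   # false (too long)
```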
#default_expiration ⇒ Object
The default lifetime of all tables in the dataset, in milliseconds.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 138
def default_expiration
  ensure_full_data!
  @gapi["defaultTableExpirationMs"]
end
#default_expiration=(new_default_expiration) ⇒ Object
Updates the default lifetime of all tables in the dataset, in milliseconds.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 149
def default_expiration= new_default_expiration
  patch_gapi! default_expiration: new_default_expiration
end
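Because default_expiration is expressed in milliseconds, it is easy to be off by a factor of 1000. A small conversion from days keeps the arithmetic explicit. The days_to_ms helper is hypothetical; default_expiration= is the setter documented above.

```ruby
# default_expiration is in milliseconds: 1 day = 24 h * 60 min * 60 s * 1000 ms.
MS_PER_DAY = 24 * 60 * 60 * 1000 # 86_400_000

def days_to_ms(days)
  days * MS_PER_DAY
end

week_ms = days_to_ms(7)
puts week_ms # 604800000

# With an existing Dataset, the value would then be assigned with:
#   dataset.default_expiration = week_ms
```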
#delete(options = {}) ⇒ Object
Permanently deletes the dataset. The dataset must be empty before it can be deleted unless the force option is set to true.
Parameters
- options (Hash)
  An optional Hash for controlling additional behavior.
- options[:force] (Boolean)
  If true, delete all the tables in the dataset. If false and the dataset contains tables, the request will fail. Default is false.
Returns
true if the dataset was deleted.
Example
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
dataset.delete
:category: Lifecycle
# File 'lib/gcloud/bigquery/dataset.rb', line 213
def delete options = {}
  ensure_connection!
  resp = connection.delete_dataset dataset_id, options
  if resp.success?
    true
  else
    fail ApiError.from_response(resp)
  end
end
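The force option changes what delete does when tables remain in the dataset. The sketch below models that rule with a hypothetical in-memory DatasetModel class (not part of gcloud). The real method calls the API and raises ApiError on failure, whereas the model raises its own NotEmptyError.

```ruby
# Hypothetical model of the documented delete rule; not part of gcloud.
class DatasetModel
  class NotEmptyError < StandardError; end

  attr_reader :tables

  def initialize(tables = [])
    @tables = tables.dup
    @deleted = false
  end

  def deleted?
    @deleted
  end

  # Returns true once deleted, like Dataset#delete.
  def delete(options = {})
    if @tables.any? && !options[:force]
      raise NotEmptyError, "dataset still contains #{@tables.size} table(s)"
    end
    @tables.clear # with force: true the tables are removed as well
    @deleted = true
  end
end

ds = DatasetModel.new(["my_table"])
begin
  ds.delete
rescue DatasetModel::NotEmptyError => e
  puts "refused: #{e.message}"
end
ds.delete force: true
```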
#description ⇒ Object
A user-friendly description of the dataset.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 119
def description
  ensure_full_data!
  @gapi["description"]
end
#description=(new_description) ⇒ Object
Updates the user-friendly description of the dataset.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 129
def description= new_description
  patch_gapi! description: new_description
end
#etag ⇒ Object
A string hash of the dataset.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 99
def etag
  ensure_full_data!
  @gapi["etag"]
end
#location ⇒ Object
The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 179
def location
  ensure_full_data!
  @gapi["location"]
end
#modified_at ⇒ Object
The time when this dataset or any of its tables was last modified.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 168
def modified_at
  ensure_full_data!
  Time.at(@gapi["lastModifiedTime"] / 1000.0)
end
#name ⇒ Object
A descriptive name for the dataset.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 81
def name
  @gapi["friendlyName"]
end
#name=(new_name) ⇒ Object
Updates the descriptive name for the dataset.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 90
def name= new_name
  patch_gapi! name: new_name
end
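The setters pass Ruby-style keys (name:, description:, default_expiration:) to patch_gapi!, whose body is not shown on this page, while the getters read the API field names friendlyName, description, and defaultTableExpirationMs. The sketch below spells out that correspondence with a hypothetical to_api_patch helper; it assumes patch_gapi! performs a similar translation, which this page does not confirm.

```ruby
# Hypothetical mapping from the keywords the setters pass to patch_gapi!
# to the API field names the getters read; patch_gapi! itself may differ.
API_FIELDS = {
  name: "friendlyName",
  description: "description",
  default_expiration: "defaultTableExpirationMs"
}.freeze

def to_api_patch(changes)
  changes.each_with_object({}) do |(attr, value), patch|
    # fetch raises KeyError for attributes without a known API field.
    patch[API_FIELDS.fetch(attr)] = value
  end
end

patch = to_api_patch(name: "My Dataset", default_expiration: 604_800_000)
p patch
```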
#project_id ⇒ Object
The ID of the project containing this dataset.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 72
def project_id
  @gapi["datasetReference"]["projectId"]
end
#query(query, options = {}) ⇒ Object
Queries data using the synchronous method.
Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.
Parameters
- query (String)
  A query string, following the BigQuery query syntax, of the query to execute. Example: “SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]”.
- options[:max] (Integer)
  The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.
- options[:timeout] (Integer)
  How long to wait for the query to complete, in milliseconds, before the request times out and returns. Note that this is only a timeout for the request, not the query. If the query takes longer to run than the timeout value, the call returns without any results and with QueryData#complete? set to false. The default value is 10000 milliseconds (10 seconds).
- options[:dryrun] (Boolean)
  If set to true, BigQuery doesn’t run the job. Instead, if the query is valid, BigQuery returns statistics about the job, such as how many bytes would be processed. If the query is invalid, an error is returned. The default value is false.
- options[:cache] (Boolean)
  Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.
Returns
Gcloud::Bigquery::QueryData
Example
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
data = dataset.query "SELECT name FROM my_table"
data.each do |row|
  puts row["name"]
end
:category: Data
# File 'lib/gcloud/bigquery/dataset.rb', line 601
def query query, options = {}
  options[:dataset] ||= dataset_id
  options[:project] ||= project_id
  ensure_connection!
  resp = connection.query query, options
  if resp.success?
    QueryData.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end
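Because Dataset#query fills in :dataset and :project with ||=, values passed by the caller take precedence, and only missing (nil or false) entries fall back to this dataset. The hypothetical with_query_defaults helper below reproduces that precedence without modifying the caller's Hash; the dataset and project IDs are placeholders.

```ruby
# Same precedence as `options[:dataset] ||= dataset_id` and
# `options[:project] ||= project_id`, but returns a new Hash.
def with_query_defaults(options, dataset_id, project_id)
  defaults = { dataset: dataset_id, project: project_id }
  # On a key collision, keep the caller's value unless it is nil or false,
  # which is exactly when ||= would assign the default.
  defaults.merge(options) { |_key, default, given| given || default }
end

opts = { max: 100 }
merged = with_query_defaults(opts, "my_dataset", "my-project")
p merged # dataset and project filled in, max kept
p opts   # unchanged
```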
#query_job(query, options = {}) ⇒ Object
Queries data using the asynchronous method.
Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.
Parameters
- query (String)
  A query string, following the BigQuery query syntax, of the query to execute. Example: “SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]”.
- options[:priority] (String)
  Specifies a priority for the query. Possible values include INTERACTIVE and BATCH. The default value is INTERACTIVE.
- options[:cache] (Boolean)
  Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true.
- options[:table] (Table)
  The destination table where the query results should be stored. If not present, a new table will be created to store the results.
- options[:create] (String)
  Specifies whether the job is allowed to create new tables. The following values are supported:
  - needed: Create the table if it does not exist.
  - never: The table must already exist. A ‘notFound’ error is raised if the table does not exist.
- options[:write] (String)
  Specifies the action that occurs if the destination table already exists. The following values are supported:
  - truncate: BigQuery overwrites the table data.
  - append: BigQuery appends the data to the table.
  - empty: A ‘duplicate’ error is returned in the job result if the table exists and contains data.
- options[:large_results] (Boolean)
  If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires options[:table] to be set.
- options[:flatten] (Boolean)
  Flattens all nested and repeated fields in the query results. The default value is true. options[:large_results] must be true if this is set to false.
Returns
Gcloud::Bigquery::QueryJob
Example
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
job = dataset.query_job "SELECT name FROM my_table"
loop do
  break if job.done?
  sleep 1
  job.refresh!
end
unless job.failed?
  job.query_results.each do |row|
    puts row["name"]
  end
end
:category: Data
# File 'lib/gcloud/bigquery/dataset.rb', line 532
def query_job query, options = {}
  options[:dataset] ||= self
  ensure_connection!
  resp = connection.query_job query, options
  if resp.success?
    Job.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end
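The example above polls done? with a fixed one-second sleep. For long-running jobs, a capped exponential backoff makes fewer refresh! calls. In the sketch below, wait_for_job and the FakeJob stand-in are hypothetical, and the sleeper parameter exists only so the sketch can be exercised without actually sleeping; it relies only on the done? and refresh! methods used in the example.

```ruby
# Poll a job-like object (anything with done? and refresh!) until it is done,
# sleeping with capped exponential backoff. Raises if timeout is exceeded.
def wait_for_job(job, initial: 1.0, max_interval: 16.0, timeout: 300.0,
                 sleeper: ->(seconds) { sleep seconds })
  interval = initial
  waited = 0.0
  until job.done?
    raise "job not done after #{waited} seconds" if waited >= timeout
    sleeper.call(interval)
    waited += interval
    job.refresh!
    interval = [interval * 2, max_interval].min
  end
  job
end

# Hypothetical stand-in that reports done after a fixed number of refreshes.
FakeJob = Struct.new(:remaining) do
  def done?
    remaining.zero?
  end

  def refresh!
    self.remaining -= 1
  end
end

sleeps = []
job = wait_for_job(FakeJob.new(3), sleeper: ->(seconds) { sleeps << seconds })
p sleeps # [1.0, 2.0, 4.0]
```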
#table(table_id) ⇒ Object
Retrieves an existing table by ID.
Parameters
- table_id (String)
  The ID of a table.
Returns
Gcloud::Bigquery::Table or Gcloud::Bigquery::View or nil if the table does not exist
Example
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
puts table.name
:category: Table
# File 'lib/gcloud/bigquery/dataset.rb', line 383
def table table_id
  ensure_connection!
  resp = connection.get_table dataset_id, table_id
  if resp.success?
    Table.from_gapi resp.data, connection
  else
    nil
  end
end
#tables(options = {}) ⇒ Object
Retrieves the list of tables belonging to the dataset.
Parameters
- options (Hash)
  An optional Hash for controlling additional behavior.
- options[:token] (String)
  A previously-returned page token representing part of the larger set of results to view.
- options[:max] (Integer)
  Maximum number of tables to return.
Returns
Array of Gcloud::Bigquery::Table or Gcloud::Bigquery::View (Gcloud::Bigquery::Table::List)
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
tables = dataset.tables
tables.each do |table|
  puts table.name
end
If you have a significant number of tables, you may need to paginate through them (see Table::List#token):
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
all_tables = []
tmp_tables = dataset.tables
while tmp_tables.any? do
  tmp_tables.each do |table|
    all_tables << table
  end
  # break loop if no more tables available
  break if tmp_tables.token.nil?
  # get the next group of tables
  tmp_tables = dataset.tables token: tmp_tables.token
end
:category: Table
# File 'lib/gcloud/bigquery/dataset.rb', line 446
def tables options = {}
  ensure_connection!
  resp = connection.list_tables dataset_id, options
  if resp.success?
    Table::List.from_resp resp, connection
  else
    fail ApiError.from_response(resp)
  end
end
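The pagination example repeats the fetch-and-check-token steps by hand. A generic sketch wraps them in an Enumerator so callers can iterate over every table, or stop early, without managing tokens. each_item and the PAGES data are hypothetical. With gcloud, a callable such as ->(token) { list = dataset.tables token: token; [list, list.token] } should fit, assuming tables treats a nil token as a request for the first page.

```ruby
# Wraps token-based paging in one Enumerator. fetch_page is any callable that
# takes a page token (nil for the first page) and returns [items, next_token],
# where next_token is nil on the last page.
def each_item(fetch_page)
  Enumerator.new do |yielder|
    token = nil
    loop do
      items, token = fetch_page.call(token)
      items.each { |item| yielder << item }
      break if token.nil?
    end
  end
end

# Hypothetical pages standing in for dataset.tables and dataset.tables(token: ...).
PAGES = {
  nil => [%w[table_a table_b], "page2"],
  "page2" => [%w[table_c], nil]
}.freeze

all_tables = each_item(->(token) { PAGES.fetch(token) }).to_a
p all_tables # ["table_a", "table_b", "table_c"]
```

Because the Enumerator fetches pages on demand, taking only the first few items (for example with first(2)) requests only as many pages as needed.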
#url ⇒ Object
A URL that can be used to access the dataset using the REST API.
:category: Attributes
# File 'lib/gcloud/bigquery/dataset.rb', line 109
def url
  ensure_full_data!
  @gapi["selfLink"]
end