Class: Gcloud::Bigquery::Project
- Inherits: Object
  - Object
  - Gcloud::Bigquery::Project
- Defined in: lib/gcloud/bigquery/project.rb
Overview
Project
Projects are top-level containers in Google Cloud Platform. They store information about billing and authorized users, and they contain BigQuery data. Each project has a friendly name and a unique ID.
Gcloud::Bigquery::Project is the main object for interacting with Google BigQuery. Gcloud::Bigquery::Dataset objects are created, accessed, and deleted by Gcloud::Bigquery::Project.
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
See Gcloud#bigquery
Instance Attribute Summary collapse
-
#connection ⇒ Object
The Connection object.
Class Method Summary collapse
-
.default_project ⇒ Object
Default project.
Instance Method Summary collapse
-
#create_dataset(dataset_id, name: nil, description: nil, expiration: nil, access: nil) ⇒ Object
Creates a new dataset.
-
#dataset(dataset_id) ⇒ Object
Retrieves an existing dataset by ID.
-
#datasets(all: nil, token: nil, max: nil) ⇒ Object
Retrieves the list of datasets belonging to the project.
-
#initialize(project, credentials) ⇒ Project
constructor
Creates a new Project instance.
-
#job(job_id) ⇒ Object
Retrieves an existing job by ID.
-
#jobs(all: nil, token: nil, max: nil, filter: nil) ⇒ Object
Retrieves the list of jobs belonging to the project.
-
#project ⇒ Object
The BigQuery project connected to.
-
#query(query, max: nil, timeout: 10000, dryrun: nil, cache: true, dataset: nil, project: nil) ⇒ Object
Queries data using the synchronous method.
-
#query_job(query, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, large_results: nil, flatten: nil, dataset: nil) ⇒ Object
Queries data using the asynchronous method.
Constructor Details
#initialize(project, credentials) ⇒ Project
Creates a new Project instance backed by a Connection.
See Gcloud.bigquery
# File 'lib/gcloud/bigquery/project.rb', line 54

def initialize project, credentials
  project = project.to_s # Always cast to a string
  fail ArgumentError, "project is missing" if project.empty?
  @connection = Connection.new project, credentials
end
Instance Attribute Details
#connection ⇒ Object
The Connection object.
# File 'lib/gcloud/bigquery/project.rb', line 48

def connection
  @connection
end
Class Method Details
.default_project ⇒ Object
Default project.
# File 'lib/gcloud/bigquery/project.rb', line 78

def self.default_project #:nodoc:
  ENV["BIGQUERY_PROJECT"] ||
    ENV["GCLOUD_PROJECT"] ||
    ENV["GOOGLE_CLOUD_PROJECT"] ||
    Gcloud::GCE.project_id
end
Instance Method Details
#create_dataset(dataset_id, name: nil, description: nil, expiration: nil, access: nil) ⇒ Object
Creates a new dataset.
Parameters
dataset_id -
  A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters. (String)
name -
  A descriptive name for the dataset. (String)
description -
  A user-friendly description of the dataset. (String)
expiration -
  The default lifetime of all tables in the dataset, in milliseconds. The minimum value is 3600000 milliseconds (one hour). (Integer)
access -
  The access rules for the dataset, expressed as an array of hashes in the BigQuery API format. See BigQuery Access Control for more information. (Array of Hashes)
Returns
Gcloud::Bigquery::Dataset
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.create_dataset "my_dataset"
A name and description can be provided:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.create_dataset "my_dataset",
name: "My Dataset",
description: "This is my Dataset"
Access rules can be provided with the access option:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.create_dataset "my_dataset",
access: [{"role"=>"WRITER", "userByEmail"=>"[email protected]"}]
Or access rules can be configured by using the block syntax: (See Dataset::Access)
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.create_dataset "my_dataset" do |access|
access.add_writer_user "[email protected]"
end
# File 'lib/gcloud/bigquery/project.rb', line 345

def create_dataset dataset_id, name: nil, description: nil,
                   expiration: nil, access: nil
  if block_given?
    access_builder = Dataset::Access.new connection.default_access_rules,
                                         "projectId" => project
    yield access_builder
    access = access_builder.access if access_builder.changed?
  end

  ensure_connection!
  options = { name: name, description: description,
              expiration: expiration, access: access }
  resp = connection.insert_dataset dataset_id, options
  return Dataset.from_gapi(resp.data, connection) if resp.success?
  fail ApiError.from_response(resp)
end
#dataset(dataset_id) ⇒ Object
Retrieves an existing dataset by ID. Returns nil if the dataset is not found.
# File 'lib/gcloud/bigquery/project.rb', line 266

def dataset dataset_id
  ensure_connection!
  resp = connection.get_dataset dataset_id
  if resp.success?
    Dataset.from_gapi resp.data, connection
  else
    return nil if resp.status == 404
    fail ApiError.from_response(resp)
  end
end
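A usage example in the style of the other methods; "my_dataset" is a placeholder ID. Because a missing dataset yields nil rather than an error, the result should be checked (this requires live credentials, so no output is shown):

```ruby
require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
if dataset.nil?
  puts "Dataset not found"
else
  puts dataset.name
end
```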
#datasets(all: nil, token: nil, max: nil) ⇒ Object
Retrieves the list of datasets belonging to the project.
Parameters
all -
  Whether to list all datasets, including hidden ones. The default is false. (Boolean)
token -
  A previously-returned page token representing part of the larger set of results to view. (String)
max -
  Maximum number of datasets to return. (Integer)
Returns
Array of Gcloud::Bigquery::Dataset (See Gcloud::Bigquery::Dataset::List)
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
datasets = bigquery.datasets
datasets.each do |dataset|
puts dataset.name
end
You can also retrieve all datasets, including hidden ones, by providing the :all option:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
all_datasets = bigquery.datasets all: true
If you have a significant number of datasets, you may need to paginate through them: (See Dataset::List#token)
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
all_datasets = []
tmp_datasets = bigquery.datasets
while tmp_datasets.any? do
tmp_datasets.each do |dataset|
all_datasets << dataset
end
# break loop if no more datasets available
break if tmp_datasets.token.nil?
# get the next group of datasets
tmp_datasets = bigquery.datasets token: tmp_datasets.token
end
# File 'lib/gcloud/bigquery/project.rb', line 422

def datasets all: nil, token: nil, max: nil
  ensure_connection!
  options = { all: all, token: token, max: max }
  resp = connection.list_datasets options
  if resp.success?
    Dataset::List.from_response resp, connection
  else
    fail ApiError.from_response(resp)
  end
end
#job(job_id) ⇒ Object
Retrieves an existing job by ID. Returns nil if the job is not found.
# File 'lib/gcloud/bigquery/project.rb', line 454

def job job_id
  ensure_connection!
  resp = connection.get_job job_id
  if resp.success?
    Job.from_gapi resp.data, connection
  else
    return nil if resp.status == 404
    fail ApiError.from_response(resp)
  end
end
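A usage example in the document's style; "my_job_id" is a placeholder, and the call requires live credentials:

```ruby
require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
job = bigquery.job "my_job_id" # nil if the job does not exist
puts job.state unless job.nil?
```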
#jobs(all: nil, token: nil, max: nil, filter: nil) ⇒ Object
Retrieves the list of jobs belonging to the project.
Parameters
all -
  Whether to display jobs owned by all users in the project. The default is false. (Boolean)
token -
  A previously-returned page token representing part of the larger set of results to view. (String)
max -
  Maximum number of jobs to return. (Integer)
filter -
  A filter for job state. (String) Acceptable values are:
  - done - Finished jobs
  - pending - Pending jobs
  - running - Running jobs
Returns
Array of Gcloud::Bigquery::Job (See Gcloud::Bigquery::Job::List)
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
jobs = bigquery.jobs
You can also retrieve only running jobs using the :filter option:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
running_jobs = bigquery.jobs filter: "running"
If you have a significant number of jobs, you may need to paginate through them: (See Job::List#token)
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
all_jobs = []
tmp_jobs = bigquery.jobs
while tmp_jobs.any? do
tmp_jobs.each do |job|
all_jobs << job
end
# break loop if no more jobs available
break if tmp_jobs.token.nil?
# get the next group of jobs
tmp_jobs = bigquery.jobs token: tmp_jobs.token
end
# File 'lib/gcloud/bigquery/project.rb', line 528

def jobs all: nil, token: nil, max: nil, filter: nil
  ensure_connection!
  options = { all: all, token: token, max: max, filter: filter }
  resp = connection.list_jobs options
  if resp.success?
    Job::List.from_response resp, connection
  else
    fail ApiError.from_response(resp)
  end
end
#project ⇒ Object
The ID of the BigQuery project connected to.
# File 'lib/gcloud/bigquery/project.rb', line 72

def project
  connection.project
end
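A brief example, assuming Gcloud.new accepts an explicit project ID and keyfile path as in the other gcloud docs; the project ID and path are placeholders:

```ruby
require "gcloud"

gcloud = Gcloud.new "my-todo-project", "/path/to/keyfile.json"
bigquery = gcloud.bigquery
bigquery.project # => "my-todo-project"
```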
#query(query, max: nil, timeout: 10000, dryrun: nil, cache: true, dataset: nil, project: nil) ⇒ Object
Queries data using the synchronous method.
Parameters
query -
  A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]". (String)
max -
  The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies. (Integer)
timeout -
  How long to wait for the query to complete, in milliseconds, before the request times out and returns. Note that this is only a timeout for the request, not the query. If the query takes longer to run than the timeout value, the call returns without any results and with QueryData#complete? set to false. The default value is 10000 milliseconds (10 seconds). (Integer)
dryrun -
  If set to true, BigQuery doesn't run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error returns. The default value is false. (Boolean)
cache -
  Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching. (Boolean)
dataset -
  Specifies the default datasetId and projectId to assume for any unqualified table names in the query. If not set, all table names in the query string must be qualified in the format 'datasetId.tableId'. (String)
project -
  Specifies the default projectId to assume for any unqualified table names in the query. Only used if the dataset option is set. (String)
Returns
Gcloud::Bigquery::QueryData
Example
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
data = bigquery.query "SELECT name FROM [my_proj:my_data.my_table]"
data.each do |row|
puts row["name"]
end
# File 'lib/gcloud/bigquery/project.rb', line 231

def query query, max: nil, timeout: 10000, dryrun: nil, cache: true,
          dataset: nil, project: nil
  ensure_connection!
  options = { max: max, timeout: timeout, dryrun: dryrun, cache: cache,
              dataset: dataset, project: project }
  resp = connection.query query, options
  if resp.success?
    QueryData.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end
#query_job(query, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, large_results: nil, flatten: nil, dataset: nil) ⇒ Object
Queries data using the asynchronous method.
Parameters
query -
  A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]". (String)
priority -
  Specifies a priority for the query. Possible values include INTERACTIVE and BATCH. The default value is INTERACTIVE. (String)
cache -
  Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. (Boolean)
table -
  The destination table where the query results should be stored. If not present, a new table will be created to store the results. (Table)
create -
  Specifies whether the job is allowed to create new tables. (String) The following values are supported:
  - needed - Create the table if it does not exist.
  - never - The table must already exist. A 'notFound' error is raised if the table does not exist.
write -
  Specifies the action that occurs if the destination table already exists. (String) The following values are supported:
  - truncate - BigQuery overwrites the table data.
  - append - BigQuery appends the data to the table.
  - empty - A 'duplicate' error is returned in the job result if the table exists and contains data.
large_results -
  If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires the table parameter to be set. (Boolean)
flatten -
  Flattens all nested and repeated fields in the query results. The default value is true. The large_results parameter must be true if this is set to false. (Boolean)
dataset -
  Specifies the default dataset to use for unqualified table names in the query. (Dataset or String)
Returns
Gcloud::Bigquery::QueryJob
Example
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
job = bigquery.query_job "SELECT name FROM [my_proj:my_data.my_table]"
job.wait_until_done!
if !job.failed?
job.query_results.each do |row|
puts row["name"]
end
end
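A sketch combining the table and write options described above. It assumes "my_results_table" already exists (so dataset.table returns a Table object); all names are placeholders and live credentials are required:

```ruby
require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
destination = dataset.table "my_results_table"

job = bigquery.query_job "SELECT name FROM [my_proj:my_data.my_table]",
                         table: destination,
                         write: "truncate" # overwrite any existing rows
job.wait_until_done!
puts "Query failed: #{job.error}" if job.failed?
```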
# File 'lib/gcloud/bigquery/project.rb', line 155

def query_job query, priority: "INTERACTIVE", cache: true, table: nil,
              create: nil, write: nil, large_results: nil, flatten: nil,
              dataset: nil
  ensure_connection!
  options = { priority: priority, cache: cache, table: table,
              create: create, write: write,
              large_results: large_results, flatten: flatten,
              dataset: dataset }
  resp = connection.query_job query, options
  if resp.success?
    Job.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end