Class: Gcloud::Bigquery::Project
- Inherits: Object
- Defined in: lib/gcloud/bigquery/project.rb
Overview
Project
Projects are top-level containers in Google Cloud Platform. They store information about billing and authorized users, and they contain BigQuery data. Each project has a friendly name and a unique ID.
Gcloud::Bigquery::Project is the main object for interacting with Google BigQuery. Gcloud::Bigquery::Dataset objects are created, accessed, and deleted by Gcloud::Bigquery::Project.
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
See Gcloud#bigquery
Instance Attribute Summary
- #connection ⇒ Object
  The Connection object.
Class Method Summary
- .default_project ⇒ Object
  Default project.
Instance Method Summary
- #create_dataset(dataset_id, options = {}) ⇒ Object
  Creates a new dataset.
- #dataset(dataset_id) ⇒ Object
  Retrieves an existing dataset by ID.
- #datasets(options = {}) ⇒ Object
  Retrieves the list of datasets belonging to the project.
- #initialize(project, credentials) ⇒ Project (constructor)
  Creates a new Project instance and the Connection it uses.
- #job(job_id) ⇒ Object
  Retrieves an existing job by ID.
- #jobs(options = {}) ⇒ Object
  Retrieves the list of jobs belonging to the project.
- #project ⇒ Object
  The BigQuery project that this object is connected to.
- #query(query, options = {}) ⇒ Object
  Queries data using the synchronous method.
- #query_job(query, options = {}) ⇒ Object
  Queries data using the asynchronous method.
Constructor Details
#initialize(project, credentials) ⇒ Project
Creates a new Project instance and the Connection it uses. Raises ArgumentError if the project is empty.
See Gcloud.bigquery
# File 'lib/gcloud/bigquery/project.rb', line 53
def initialize project, credentials
  project = project.to_s # Always cast to a string
  fail ArgumentError, "project is missing" if project.empty?
  @connection = Connection.new project, credentials
end
Instance Attribute Details
#connection ⇒ Object
The Connection object.
# File 'lib/gcloud/bigquery/project.rb', line 47
def connection
  @connection
end
Class Method Details
.default_project ⇒ Object
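The default project, read from the first of the BIGQUERY_PROJECT, GCLOUD_PROJECT, and GOOGLE_CLOUD_PROJECT environment variables that is set.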
# File 'lib/gcloud/bigquery/project.rb', line 77
def self.default_project #:nodoc:
  ENV["BIGQUERY_PROJECT"] ||
    ENV["GCLOUD_PROJECT"] ||
    ENV["GOOGLE_CLOUD_PROJECT"]
end
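The lookup order can be tried out without the library. The following is a minimal sketch, assuming a hypothetical helper named resolve_default_project that takes the environment as a plain Hash so the precedence can be exercised without touching the real ENV. It is not part of gcloud.

```ruby
# Hypothetical helper (not part of gcloud): mirrors the lookup order of
# Project.default_project, but reads from any Hash-like object.
def resolve_default_project env = ENV
  env["BIGQUERY_PROJECT"] ||
    env["GCLOUD_PROJECT"] ||
    env["GOOGLE_CLOUD_PROJECT"]
end

only_generic = { "GOOGLE_CLOUD_PROJECT" => "generic-project" }
both_set = { "BIGQUERY_PROJECT" => "bq-project",
             "GCLOUD_PROJECT" => "gcloud-project" }

puts resolve_default_project(only_generic)   # generic-project
puts resolve_default_project(both_set)       # bq-project
puts resolve_default_project({}).inspect     # nil
```

Because an empty string is truthy in Ruby, a variable that is set to "" still wins over the later ones; the constructor then rejects it with "project is missing".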
Instance Method Details
#create_dataset(dataset_id, options = {}) ⇒ Object
Creates a new dataset.
Parameters
dataset_id (String)
- A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.
options (Hash)
- An optional Hash for controlling additional behavior.
options[:name] (String)
- A descriptive name for the dataset.
options[:description] (String)
- A user-friendly description of the dataset.
options[:expiration] (Integer)
- The default lifetime of all tables in the dataset, in milliseconds. The minimum value is 3600000 milliseconds (one hour).
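The constraints on dataset_id and options[:expiration] can be checked on the client before any request is made. This is a sketch only: valid_dataset_id? and expiration_ms are hypothetical helpers, not gcloud methods, and the assumption that an ID must be non-empty is mine rather than stated above.

```ruby
# Hypothetical client-side checks (not part of gcloud). They restate the
# documented constraints so obvious mistakes are caught before an API call.
DATASET_ID_PATTERN = /\A[A-Za-z0-9_]{1,1024}\z/ # assumes IDs are non-empty
MIN_EXPIRATION_MS = 3_600_000 # one hour, the documented minimum

def valid_dataset_id? dataset_id
  DATASET_ID_PATTERN.match?(dataset_id.to_s)
end

# Converts a table lifetime in hours to the milliseconds expected by
# options[:expiration], rejecting values below the documented minimum.
def expiration_ms hours
  ms = (hours * 3_600_000).to_i
  if ms < MIN_EXPIRATION_MS
    fail ArgumentError, "expiration must be at least one hour"
  end
  ms
end

puts valid_dataset_id?("my_dataset")   # true
puts valid_dataset_id?("my-dataset")   # false (hyphens are not allowed)
puts expiration_ms(24)                 # 86400000
```

The converted value could then be passed as expiration: expiration_ms(24) alongside the other options.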
Returns
Gcloud::Bigquery::Dataset
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.create_dataset "my_dataset"
A name and description can be provided:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"
# File 'lib/gcloud/bigquery/project.rb', line 314
def create_dataset dataset_id, options = {}
  ensure_connection!
  resp = connection.insert_dataset dataset_id, options
  if resp.success?
    Dataset.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end
#dataset(dataset_id) ⇒ Object
Retrieves an existing dataset by ID. Returns nil if the dataset does not exist (the API responds with status 404).
# File 'lib/gcloud/bigquery/project.rb', line 260
def dataset dataset_id
  ensure_connection!
  resp = connection.get_dataset dataset_id
  if resp.success?
    Dataset.from_gapi resp.data, connection
  else
    return nil if resp.status == 404
    fail ApiError.from_response(resp)
  end
end
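Since the source of #dataset returns nil on a 404 instead of raising, a find-or-create pattern can be written with ||. The sketch below is an illustration under assumptions: FakeProject is a stand-in that models only the nil-on-missing behavior so the code runs without credentials, and find_or_create_dataset is a hypothetical helper, not a gcloud method.

```ruby
# Stand-in for a BigQuery project so the sketch runs without credentials.
# Only the nil-on-missing behavior of #dataset is modeled here.
class FakeProject
  def initialize
    @datasets = {}
  end

  # Returns the stored dataset, or nil when it does not exist.
  def dataset dataset_id
    @datasets[dataset_id]
  end

  def create_dataset dataset_id, options = {}
    @datasets[dataset_id] = { id: dataset_id }.merge(options)
  end
end

# Hypothetical helper: fetch a dataset, creating it only when it is missing.
def find_or_create_dataset bigquery, dataset_id, options = {}
  bigquery.dataset(dataset_id) || bigquery.create_dataset(dataset_id, options)
end

bigquery = FakeProject.new
first = find_or_create_dataset(bigquery, "my_dataset", name: "My Dataset")
second = find_or_create_dataset(bigquery, "my_dataset")
puts first.equal?(second) # true: the second call found the existing dataset
```

With a real project object the two calls are separate requests, so another client could create the dataset in between; the helper does not guard against that.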
#datasets(options = {}) ⇒ Object
Retrieves the list of datasets belonging to the project.
Parameters
options (Hash)
- An optional Hash for controlling additional behavior.
options[:all] (Boolean)
- Whether to list all datasets, including hidden ones. The default is false.
options[:token] (String)
- A previously-returned page token representing part of the larger set of results to view.
options[:max] (Integer)
- Maximum number of datasets to return.
Returns
Array of Gcloud::Bigquery::Dataset (Gcloud::Bigquery::Dataset::List)
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
datasets = bigquery.datasets
datasets.each do |dataset|
  puts dataset.name
end
You can also retrieve all datasets, including hidden ones, by providing the :all option:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
all_datasets = bigquery.datasets all: true
If you have a significant number of datasets, you may need to paginate through them: (See Dataset::List#token)
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
all_datasets = []
tmp_datasets = bigquery.datasets
while tmp_datasets.any? do
  tmp_datasets.each do |dataset|
    all_datasets << dataset
  end
  # break loop if no more datasets available
  break if tmp_datasets.token.nil?
  # get the next group of datasets
  tmp_datasets = bigquery.datasets token: tmp_datasets.token
end
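The token loop above can be factored into a reusable helper. What follows is a sketch, not library code: collect_all_pages and FakePage are hypothetical names, and FakePage only imitates the two things the loop needs from a list, namely #each and #token.

```ruby
# Hypothetical helper (not part of gcloud): collects every item from a
# token-paginated listing. The block receives the current page token (nil for
# the first page) and must return a list that responds to #each and #token.
def collect_all_pages
  items = []
  token = nil
  loop do
    page = yield token
    page.each { |item| items << item }
    token = page.token
    break if token.nil?
  end
  items
end

# Stand-in for a paged list so the sketch runs without credentials:
# an Array that also carries the token of the next page.
class FakePage < Array
  attr_accessor :token
end

PAGES = {
  nil  => FakePage.new(%w[a b]).tap { |page| page.token = "t1" },
  "t1" => FakePage.new(%w[c]).tap { |page| page.token = nil }
}

all = collect_all_pages { |token| PAGES.fetch(token) }
puts all.inspect # ["a", "b", "c"]
```

With a real project object the block might read collect_all_pages { |token| bigquery.datasets(token ? { token: token } : {}) }, which passes the :token option only once a token is available. The same helper would work for any other listing that exposes #token.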
# File 'lib/gcloud/bigquery/project.rb', line 386
def datasets options = {}
  ensure_connection!
  resp = connection.list_datasets options
  if resp.success?
    Dataset::List.from_resp resp, connection
  else
    fail ApiError.from_response(resp)
  end
end
#job(job_id) ⇒ Object
Retrieves an existing job by ID. Returns nil if the job does not exist (the API responds with status 404).
# File 'lib/gcloud/bigquery/project.rb', line 417
def job job_id
  ensure_connection!
  resp = connection.get_job job_id
  if resp.success?
    Job.from_gapi resp.data, connection
  else
    return nil if resp.status == 404
    fail ApiError.from_response(resp)
  end
end
#jobs(options = {}) ⇒ Object
Retrieves the list of jobs belonging to the project.
Parameters
options (Hash)
- An optional Hash for controlling additional behavior.
options[:all] (Boolean)
- Whether to display jobs owned by all users in the project. The default is false.
options[:token] (String)
- A previously-returned page token representing part of the larger set of results to view.
options[:max] (Integer)
- Maximum number of jobs to return.
options[:filter] (String)
- A filter for job state. Acceptable values are:
  - done - Finished jobs
  - pending - Pending jobs
  - running - Running jobs
Returns
Array of Gcloud::Bigquery::Job (Gcloud::Bigquery::Job::List)
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
jobs = bigquery.jobs
You can also retrieve only running jobs using the :filter option:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
running_jobs = bigquery.jobs filter: "running"
If you have a significant number of jobs, you may need to paginate through them: (See Job::List#token)
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
all_jobs = []
tmp_jobs = bigquery.jobs
while tmp_jobs.any? do
  tmp_jobs.each do |job|
    all_jobs << job
  end
  # break loop if no more jobs available
  break if tmp_jobs.token.nil?
  # get the next group of jobs
  tmp_jobs = bigquery.jobs token: tmp_jobs.token
end
# File 'lib/gcloud/bigquery/project.rb', line 493
def jobs options = {}
  ensure_connection!
  resp = connection.list_jobs options
  if resp.success?
    Job::List.from_resp resp, connection
  else
    fail ApiError.from_response(resp)
  end
end
#project ⇒ Object
The BigQuery project that this object is connected to.
# File 'lib/gcloud/bigquery/project.rb', line 71
def project
  connection.project
end
#query(query, options = {}) ⇒ Object
Queries data using the synchronous method.
Parameters
query (String)
- A query string, following the BigQuery query syntax, of the query to execute. Example: “SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]”.
options[:max] (Integer)
- The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.
options[:timeout] (Integer)
- How long to wait for the query to complete, in milliseconds, before the request times out and returns. Note that this is only a timeout for the request, not the query. If the query takes longer to run than the timeout value, the call returns without any results and with QueryData#complete? set to false. The default value is 10000 milliseconds (10 seconds).
options[:dryrun] (Boolean)
- If set to true, BigQuery doesn’t run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error is returned. The default value is false.
options[:cache] (Boolean)
- Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.
options[:dataset] (String)
- Specifies the default datasetId and projectId to assume for any unqualified table names in the query. If not set, all table names in the query string must be qualified in the format ‘datasetId.tableId’.
options[:project] (String)
- Specifies the default projectId to assume for any unqualified table names in the query. Only used if the dataset option is set.
Returns
Gcloud::Bigquery::QueryData
Example
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
data = bigquery.query "SELECT name FROM [my_proj:my_data.my_table]"
data.each do |row|
  puts row["name"]
end
# File 'lib/gcloud/bigquery/project.rb', line 228
def query query, options = {}
  ensure_connection!
  resp = connection.query query, options
  if resp.success?
    QueryData.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end
#query_job(query, options = {}) ⇒ Object
Queries data using the asynchronous method.
Parameters
query (String)
- A query string, following the BigQuery query syntax, of the query to execute. Example: “SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]”.
options[:priority] (String)
- Specifies a priority for the query. Possible values include INTERACTIVE and BATCH. The default value is INTERACTIVE.
options[:cache] (Boolean)
- Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true.
options[:table] (Table)
- The destination table where the query results should be stored. If not present, a new table will be created to store the results.
options[:create] (String)
- Specifies whether the job is allowed to create new tables. The following values are supported:
  - needed - Create the table if it does not exist.
  - never - The table must already exist. A ‘notFound’ error is raised if the table does not exist.
options[:write] (String)
- Specifies the action that occurs if the destination table already exists. The following values are supported:
  - truncate - BigQuery overwrites the table data.
  - append - BigQuery appends the data to the table.
  - empty - A ‘duplicate’ error is returned in the job result if the table exists and contains data.
options[:large_results] (Boolean)
- If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires options[:table] to be set.
options[:flatten] (Boolean)
- Flattens all nested and repeated fields in the query results. The default value is true. options[:large_results] must be true if this is set to false.
options[:dataset] (Dataset or String)
- Specifies the default dataset to use for unqualified table names in the query.
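Two of the options above depend on each other: options[:large_results] needs options[:table], and options[:flatten] set to false needs options[:large_results] set to true. A pre-flight check for these rules might look like the sketch below. The helper name query_job_option_problems is hypothetical and not part of gcloud; the library or the API may perform its own validation.

```ruby
# Hypothetical pre-flight check (not part of gcloud) for the documented
# option dependencies. Returns a list of problems; an empty list means the
# combination looks consistent.
def query_job_option_problems options = {}
  problems = []
  if options[:large_results] && options[:table].nil?
    problems << "large_results requires table"
  end
  if options[:flatten] == false && !options[:large_results]
    problems << "flatten: false requires large_results: true"
  end
  problems
end

puts query_job_option_problems(large_results: true).inspect
# ["large_results requires table"]
puts query_job_option_problems(flatten: false).inspect
# ["flatten: false requires large_results: true"]
puts query_job_option_problems(flatten: false, large_results: true,
                               table: :some_table).inspect
# []
```

Returning a list rather than raising on the first problem lets a caller report every inconsistency at once.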
Returns
Gcloud::Bigquery::QueryJob
Example
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
job = bigquery.query_job "SELECT name FROM [my_proj:my_data.my_table]"
loop do
  break if job.done?
  sleep 1
  job.refresh!
end
unless job.failed?
  job.query_results.each do |row|
    puts row["name"]
  end
end
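The polling loop in the example waits indefinitely and checks once per second. A variant with a deadline and a growing delay is sketched below. It relies only on #done? and #refresh!, as used in the example; wait_for_job, its keyword arguments, and FakeJob are hypothetical and not part of gcloud.

```ruby
# Hypothetical helper (not part of gcloud): polls a job until it is done or a
# deadline passes, doubling the delay between checks up to max_delay seconds.
# Returns true if the job finished, false if the deadline was reached first.
def wait_for_job(job, timeout: 60, delay: 1, max_delay: 16,
                 sleeper: ->(seconds) { sleep seconds })
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until job.done?
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleeper.call delay
    delay = [delay * 2, max_delay].min
    job.refresh!
  end
  true
end

# Stand-in job that reports done after a fixed number of refreshes, so the
# sketch runs without credentials.
class FakeJob
  def initialize refreshes_needed
    @remaining = refreshes_needed
  end

  def done?
    @remaining <= 0
  end

  def refresh!
    @remaining -= 1
  end
end

delays = []
finished = wait_for_job(FakeJob.new(3), sleeper: ->(seconds) { delays << seconds })
puts finished        # true
puts delays.inspect  # [1, 2, 4]
```

The sleeper argument exists only so the example can record the delays instead of actually sleeping; in normal use it can be left at its default.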
# File 'lib/gcloud/bigquery/project.rb', line 157
def query_job query, options = {}
  ensure_connection!
  resp = connection.query_job query, options
  if resp.success?
    Job.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end