Class: Gcloud::Bigquery::Project
- Inherits: Object
  - Object
  - Gcloud::Bigquery::Project
- Defined in: lib/gcloud/bigquery/project.rb
Overview
Project
Projects are top-level containers in Google Cloud Platform. They store information about billing and authorized users, and they contain BigQuery data. Each project has a friendly name and a unique ID.
Gcloud::Bigquery::Project is the main object for interacting with Google BigQuery. Gcloud::Bigquery::Dataset objects are created, accessed, and deleted by Gcloud::Bigquery::Project.
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
See Gcloud#bigquery
Instance Attribute Summary collapse
-
#connection ⇒ Object
The Connection object.
Class Method Summary collapse
-
.default_project ⇒ Object
Default project.
Instance Method Summary collapse
-
#create_dataset(dataset_id, name: nil, description: nil, expiration: nil, access: nil) ⇒ Object
Creates a new dataset.
-
#dataset(dataset_id) ⇒ Object
Retrieves an existing dataset by ID.
-
#datasets(all: nil, token: nil, max: nil) ⇒ Object
Retrieves the list of datasets belonging to the project.
-
#initialize(project, credentials) ⇒ Project
constructor
Creates a new Project instance.
-
#job(job_id) ⇒ Object
Retrieves an existing job by ID.
-
#jobs(all: nil, token: nil, max: nil, filter: nil) ⇒ Object
Retrieves the list of jobs belonging to the project.
-
#project ⇒ Object
The BigQuery project connected to.
-
#query(query, max: nil, timeout: 10000, dryrun: nil, cache: true, dataset: nil, project: nil) ⇒ Object
Queries data using the synchronous method.
-
#query_job(query, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, large_results: nil, flatten: nil, dataset: nil) ⇒ Object
Queries data using the asynchronous method.
Constructor Details
#initialize(project, credentials) ⇒ Project
Creates a new Project instance backed by a Connection.
See Gcloud.bigquery
# File 'lib/gcloud/bigquery/project.rb', line 54

def initialize project, credentials
  project = project.to_s # Always cast to a string
  fail ArgumentError, "project is missing" if project.empty?
  @connection = Connection.new project, credentials
end
Instance Attribute Details
#connection ⇒ Object
The Connection object.
# File 'lib/gcloud/bigquery/project.rb', line 48

def connection
  @connection
end
Class Method Details
.default_project ⇒ Object
Default project.
# File 'lib/gcloud/bigquery/project.rb', line 78

def self.default_project #:nodoc:
  ENV["BIGQUERY_PROJECT"] ||
    ENV["GCLOUD_PROJECT"] ||
    ENV["GOOGLE_CLOUD_PROJECT"] ||
    Gcloud::GCE.project_id
end
Instance Method Details
#create_dataset(dataset_id, name: nil, description: nil, expiration: nil, access: nil) ⇒ Object
Creates a new dataset.
Parameters
dataset_id -
  A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters. (String)
name -
  A descriptive name for the dataset. (String)
description -
  A user-friendly description of the dataset. (String)
expiration -
  The default lifetime of all tables in the dataset, in milliseconds. The minimum value is 3600000 milliseconds (one hour). (Integer)
access -
  The access rules for the dataset, expressed as an array of hashes in the BigQuery API format. See BigQuery Access Control for more information. (Array of Hashes)
Returns
Gcloud::Bigquery::Dataset
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.create_dataset "my_dataset"
A name and description can be provided:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.create_dataset "my_dataset",
name: "My Dataset",
description: "This is my Dataset"
Access rules can be provided with the access option:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.create_dataset "my_dataset",
access: [{"role"=>"WRITER", "userByEmail"=>"[email protected]"}]
Or access rules can be configured by using the block syntax: (See Dataset::Access)
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.create_dataset "my_dataset" do |access|
access.add_writer_user "[email protected]"
end
# File 'lib/gcloud/bigquery/project.rb', line 345

def create_dataset dataset_id, name: nil, description: nil,
                   expiration: nil, access: nil
  if block_given?
    access_builder = Dataset::Access.new connection.default_access_rules,
                                         "projectId" => project
    yield access_builder
    access = access_builder.access if access_builder.changed?
  end

  ensure_connection!
  options = { name: name, description: description,
              expiration: expiration, access: access }
  resp = connection.insert_dataset dataset_id, options
  return Dataset.from_gapi(resp.data, connection) if resp.success?
  fail ApiError.from_response(resp)
end
#dataset(dataset_id) ⇒ Object
Retrieves an existing dataset by ID. Returns nil if the dataset is not found.
# File 'lib/gcloud/bigquery/project.rb', line 266

def dataset dataset_id
  ensure_connection!
  resp = connection.get_dataset dataset_id
  if resp.success?
    Dataset.from_gapi resp.data, connection
  else
    return nil if resp.status == 404
    fail ApiError.from_response(resp)
  end
end
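A usage example in the style of the other methods; "my_dataset" is a placeholder ID. Because a missing dataset yields nil rather than an error, the result should be checked (this requires live credentials, so no output is shown):

```ruby
require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
if dataset.nil?
  puts "Dataset not found"
else
  puts dataset.name
end
```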
#datasets(all: nil, token: nil, max: nil) ⇒ Object
Retrieves the list of datasets belonging to the project.
Parameters
all -
  Whether to list all datasets, including hidden ones. The default is false. (Boolean)
token -
  A previously-returned page token representing part of the larger set of results to view. (String)
max -
  Maximum number of datasets to return. (Integer)
Returns
Array of Gcloud::Bigquery::Dataset (See Gcloud::Bigquery::Dataset::List)
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
datasets = bigquery.datasets
datasets.each do |dataset|
puts dataset.name
end
You can also retrieve all datasets, including hidden ones, by providing the :all option:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
all_datasets = bigquery.datasets all: true
If you have a significant number of datasets, you may need to paginate through them: (See Dataset::List#token)
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
all_datasets = []
tmp_datasets = bigquery.datasets
while tmp_datasets.any? do
tmp_datasets.each do |dataset|
all_datasets << dataset
end
# break loop if no more datasets available
break if tmp_datasets.token.nil?
# get the next group of datasets
tmp_datasets = bigquery.datasets token: tmp_datasets.token
end
# File 'lib/gcloud/bigquery/project.rb', line 422

def datasets all: nil, token: nil, max: nil
  ensure_connection!
  options = { all: all, token: token, max: max }
  resp = connection.list_datasets options
  if resp.success?
    Dataset::List.from_response resp, connection
  else
    fail ApiError.from_response(resp)
  end
end
#job(job_id) ⇒ Object
Retrieves an existing job by ID. Returns nil if the job is not found.
# File 'lib/gcloud/bigquery/project.rb', line 454

def job job_id
  ensure_connection!
  resp = connection.get_job job_id
  if resp.success?
    Job.from_gapi resp.data, connection
  else
    return nil if resp.status == 404
    fail ApiError.from_response(resp)
  end
end
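A usage example in the document's style; "my_job_id" is a placeholder, and the call requires live credentials:

```ruby
require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
job = bigquery.job "my_job_id" # nil if the job does not exist
puts job.state unless job.nil?
```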
#jobs(all: nil, token: nil, max: nil, filter: nil) ⇒ Object
Retrieves the list of jobs belonging to the project.
Parameters
all -
  Whether to display jobs owned by all users in the project. The default is false. (Boolean)
token -
  A previously-returned page token representing part of the larger set of results to view. (String)
max -
  Maximum number of jobs to return. (Integer)
filter -
  A filter for job state. (String) Acceptable values are:
  - done - Finished jobs
  - pending - Pending jobs
  - running - Running jobs
Returns
Array of Gcloud::Bigquery::Job (See Gcloud::Bigquery::Job::List)
Examples
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
jobs = bigquery.jobs
You can also retrieve only running jobs using the :filter option:
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
running_jobs = bigquery.jobs filter: "running"
If you have a significant number of jobs, you may need to paginate through them: (See Job::List#token)
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
all_jobs = []
tmp_jobs = bigquery.jobs
while tmp_jobs.any? do
tmp_jobs.each do |job|
all_jobs << job
end
# break loop if no more jobs available
break if tmp_jobs.token.nil?
# get the next group of jobs
tmp_jobs = bigquery.jobs token: tmp_jobs.token
end
# File 'lib/gcloud/bigquery/project.rb', line 528

def jobs all: nil, token: nil, max: nil, filter: nil
  ensure_connection!
  options = { all: all, token: token, max: max, filter: filter }
  resp = connection.list_jobs options
  if resp.success?
    Job::List.from_response resp, connection
  else
    fail ApiError.from_response(resp)
  end
end
#project ⇒ Object
The ID of the BigQuery project connected to.
# File 'lib/gcloud/bigquery/project.rb', line 72

def project
  connection.project
end
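A brief example, assuming Gcloud.new accepts an explicit project ID and keyfile path as in the other gcloud docs; the project ID and path are placeholders:

```ruby
require "gcloud"

gcloud = Gcloud.new "my-todo-project", "/path/to/keyfile.json"
bigquery = gcloud.bigquery
bigquery.project # => "my-todo-project"
```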
#query(query, max: nil, timeout: 10000, dryrun: nil, cache: true, dataset: nil, project: nil) ⇒ Object
Queries data using the synchronous method.
Parameters
query -
  A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]". (String)
max -
  The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies. (Integer)
timeout -
  How long to wait for the query to complete, in milliseconds, before the request times out and returns. Note that this is only a timeout for the request, not the query. If the query takes longer to run than the timeout value, the call returns without any results and with QueryData#complete? set to false. The default value is 10000 milliseconds (10 seconds). (Integer)
dryrun -
  If set to true, BigQuery doesn't run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error returns. The default value is false. (Boolean)
cache -
  Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching. (Boolean)
dataset -
  Specifies the default datasetId and projectId to assume for any unqualified table names in the query. If not set, all table names in the query string must be qualified in the format 'datasetId.tableId'. (String)
project -
  Specifies the default projectId to assume for any unqualified table names in the query. Only used if the dataset option is set. (String)
Returns
Gcloud::Bigquery::QueryData
Example
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
data = bigquery.query "SELECT name FROM [my_proj:my_data.my_table]"
data.each do |row|
puts row["name"]
end
# File 'lib/gcloud/bigquery/project.rb', line 231

def query query, max: nil, timeout: 10000, dryrun: nil, cache: true,
          dataset: nil, project: nil
  ensure_connection!
  options = { max: max, timeout: timeout, dryrun: dryrun, cache: cache,
              dataset: dataset, project: project }
  resp = connection.query query, options
  if resp.success?
    QueryData.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end
#query_job(query, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, large_results: nil, flatten: nil, dataset: nil) ⇒ Object
Queries data using the asynchronous method.
Parameters
query -
  A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]". (String)
priority -
  Specifies a priority for the query. Possible values include INTERACTIVE and BATCH. The default value is INTERACTIVE. (String)
cache -
  Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. (Boolean)
table -
  The destination table where the query results should be stored. If not present, a new table will be created to store the results. (Table)
create -
  Specifies whether the job is allowed to create new tables. (String) The following values are supported:
  - needed - Create the table if it does not exist.
  - never - The table must already exist. A 'notFound' error is raised if the table does not exist.
write -
  Specifies the action that occurs if the destination table already exists. (String) The following values are supported:
  - truncate - BigQuery overwrites the table data.
  - append - BigQuery appends the data to the table.
  - empty - A 'duplicate' error is returned in the job result if the table exists and contains data.
large_results -
  If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires the table parameter to be set. (Boolean)
flatten -
  Flattens all nested and repeated fields in the query results. The default value is true. The large_results parameter must be true if this is set to false. (Boolean)
dataset -
  Specifies the default dataset to use for unqualified table names in the query. (Dataset or String)
Returns
Gcloud::Bigquery::QueryJob
Example
require "gcloud"
gcloud = Gcloud.new
bigquery = gcloud.bigquery
job = bigquery.query_job "SELECT name FROM [my_proj:my_data.my_table]"
job.wait_until_done!
if !job.failed?
job.query_results.each do |row|
puts row["name"]
end
end
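A sketch combining the table and write options described above. It assumes "my_results_table" already exists (so dataset.table returns a Table object); all names are placeholders and live credentials are required:

```ruby
require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
destination = dataset.table "my_results_table"

job = bigquery.query_job "SELECT name FROM [my_proj:my_data.my_table]",
                         table: destination,
                         write: "truncate" # overwrite any existing rows
job.wait_until_done!
puts "Query failed: #{job.error}" if job.failed?
```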
# File 'lib/gcloud/bigquery/project.rb', line 155

def query_job query, priority: "INTERACTIVE", cache: true, table: nil,
              create: nil, write: nil, large_results: nil, flatten: nil,
              dataset: nil
  ensure_connection!
  options = { priority: priority, cache: cache, table: table,
              create: create, write: write,
              large_results: large_results, flatten: flatten,
              dataset: dataset }
  resp = connection.query_job query, options
  if resp.success?
    Job.from_gapi resp.data, connection
  else
    fail ApiError.from_response(resp)
  end
end