Class: Google::Cloud::Spanner::BatchSnapshot

Inherits: Object

Defined in: lib/google/cloud/spanner/batch_snapshot.rb

Overview

# BatchSnapshot

Represents a read-only transaction that can be configured to read at timestamps in the past and that allows exporting arbitrarily large amounts of data from Cloud Spanner databases. It is a snapshot that additionally allows a read or query request to be partitioned. The read or query request can then be executed independently over each partition while observing the same snapshot of the database. A BatchSnapshot can also be shared across multiple processes or machines by passing around its serialized value (see #dump) and then recreating the transaction using BatchClient#load_batch_snapshot.

Unlike locking read-write transactions, a BatchSnapshot never aborts. It can fail if the chosen read timestamp is garbage collected; however, any read or query activity on the transaction within an hour avoids garbage collection, so most applications do not need to worry about this in practice.

See Google::Cloud::Spanner::BatchClient#batch_snapshot and Google::Cloud::Spanner::BatchClient#load_batch_snapshot.

Examples:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new

batch_client = spanner.batch_client "my-instance", "my-database"
batch_snapshot = batch_client.batch_snapshot

partitions = batch_snapshot.partition_read "users", [:id, :name]

partition = partitions.first
results = batch_snapshot.execute_partition partition

batch_snapshot.close

Instance Attribute Summary

#grpc ⇒ Object readonly
#session ⇒ Object readonly

Class Method Summary

.from_grpc(grpc, session) ⇒ Object

Creates a new BatchSnapshot instance from a Google::Spanner::V1::Transaction.
.load(data, service: nil) ⇒ Object

Recreates a batch snapshot from its serialized form. See Google::Cloud::Spanner::BatchClient#load_batch_snapshot.

Instance Method Summary

#close ⇒ Object

Closes the batch snapshot and releases the underlying resources.
#dump ⇒ String (also: #serialize)

Serializes the batch snapshot object so it can be recreated on another process.
#execute(sql, params: nil, types: nil) ⇒ Google::Cloud::Spanner::Results (also: #query)

Executes a SQL query.
#execute_partition(partition) ⇒ Object

Execute the partition to return a ResultSet.
#initialize(grpc, session) ⇒ BatchSnapshot constructor

A new instance of BatchSnapshot.
#partition_query(sql, params: nil, types: nil, partition_size_bytes: nil, max_partitions: nil) ⇒ Array<Google::Cloud::Spanner::Partition>

Returns a list of Partition objects to execute a batch query against a database.
#partition_read(table, columns, keys: nil, index: nil, partition_size_bytes: nil, max_partitions: nil) ⇒ Array<Google::Cloud::Spanner::Partition>

Returns a list of Partition objects to read zero or more rows from a database.
#read(table, columns, keys: nil, index: nil, limit: nil) ⇒ Google::Cloud::Spanner::Results

Read rows from a database table, as a simple alternative to #execute.
#timestamp ⇒ Time

The read timestamp chosen for the batch snapshot.
#to_h ⇒ Hash

Converts the batch snapshot object to a Hash ready for serialization.
#transaction_id ⇒ String

Identifier of the batch snapshot transaction.

Constructor Details

#initialize(grpc, session) ⇒ `BatchSnapshot`

Returns a new instance of BatchSnapshot.

# File 'lib/google/cloud/spanner/batch_snapshot.rb', line 71

def initialize grpc, session
  @grpc = grpc
  @session = session
end

Instance Attribute Details

#grpc ⇒ `Object` (readonly)



# File 'lib/google/cloud/spanner/batch_snapshot.rb', line 64

def grpc
  @grpc
end

#session ⇒ `Object` (readonly)



# File 'lib/google/cloud/spanner/batch_snapshot.rb', line 67

def session
  @session
end

Class Method Details

.from_grpc(grpc, session) ⇒ `Object`

Creates a new BatchSnapshot instance from a Google::Spanner::V1::Transaction.



# File 'lib/google/cloud/spanner/batch_snapshot.rb', line 636

def self.from_grpc grpc, session
  new grpc, session
end

.load(data, service: nil) ⇒ `Object`

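Recreates a batch snapshot from its serialized form, given either as a JSON string or as an already-parsed Hash. See Google::Cloud::Spanner::BatchClient#load_batch_snapshot.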

# File 'lib/google/cloud/spanner/batch_snapshot.rb', line 622

def self.load data, service: nil
  data = JSON.parse data, symbolize_names: true unless data.is_a? Hash

  session_grpc = Google::Spanner::V1::Session.decode \
    Base64.decode64(data[:session])
  transaction_grpc = Google::Spanner::V1::Transaction.decode \
    Base64.decode64(data[:transaction])

  from_grpc transaction_grpc, Session.from_grpc(session_grpc, service)
end
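The serialized form handled by .load (and produced by #to_h and #dump) is a JSON object whose two values are Base64 text. The following is a minimal sketch of that round trip using only the standard library. The helper names `dump_snapshot` and `load_snapshot_bytes` are hypothetical, and plain byte strings stand in for the protobuf-encoded Session and Transaction, so no Spanner connection is needed.

```ruby
require "json"
require "base64"

# Mirrors #to_h and #dump: Base64-encode each byte string, then emit JSON.
def dump_snapshot session_bytes, transaction_bytes
  JSON.dump(
    {
      session: Base64.strict_encode64(session_bytes),
      transaction: Base64.strict_encode64(transaction_bytes)
    }
  )
end

# Mirrors the first half of .load: accept a JSON string or a Hash, then
# Base64-decode each field back to raw bytes. The real method goes on to
# decode those bytes into Session and Transaction protos.
def load_snapshot_bytes data
  data = JSON.parse data, symbolize_names: true unless data.is_a? Hash
  [Base64.decode64(data[:session]), Base64.decode64(data[:transaction])]
end

# Hypothetical stand-ins for the encoded protos.
serialized = dump_snapshot "session-bytes".b, "transaction-bytes".b
session_bytes, transaction_bytes = load_snapshot_bytes serialized
puts serialized
```

Because the payload is plain JSON text, it can be passed between processes through any channel that carries strings.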

Instance Method Details

#close ⇒ `Object`

Closes the batch snapshot and releases the underlying resources.

This should only be called once the batch snapshot is no longer needed anywhere. In particular, if this batch snapshot is being used across multiple machines, calling this method on any one of them renders the batch snapshot invalid everywhere.

Examples:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new

batch_client = spanner.batch_client "my-instance", "my-database"
batch_snapshot = batch_client.batch_snapshot

partitions = batch_snapshot.partition_read "users", [:id, :name]

partition = partitions.first
results = batch_snapshot.execute_partition partition

batch_snapshot.close

# File 'lib/google/cloud/spanner/batch_snapshot.rb', line 355

def close
  ensure_session!

  session.release!
end
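In the process that owns the snapshot, one way to make sure #close still runs when processing fails is an `ensure` clause. The sketch below is only an illustration: `FakeSnapshot` and `with_snapshot` are hypothetical names, and the stand-in class just records whether `close` was called, so the example runs without a Spanner connection.

```ruby
# Hypothetical stand-in for a BatchSnapshot; it only records whether #close
# has been called.
class FakeSnapshot
  attr_reader :closed

  def initialize
    @closed = false
  end

  def close
    @closed = true
  end
end

# Yields the snapshot and always closes it afterwards, even if the block
# raises. The method returns the value of the block.
def with_snapshot snapshot
  yield snapshot
ensure
  snapshot.close
end

snapshot = FakeSnapshot.new
begin
  with_snapshot(snapshot) { |_s| raise "simulated failure" }
rescue RuntimeError => e
  puts "work failed: #{e.message}"
end
puts "closed: #{snapshot.closed}"
```

Given the warning above, a wrapper like this belongs only where the snapshot is created, not in workers that received a serialized copy.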

#dump ⇒ `String` Also known as: serialize

Serializes the batch snapshot object so it can be recreated on another process. See Google::Cloud::Spanner::BatchClient#load_batch_snapshot.

Examples:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new

batch_client = spanner.batch_client "my-instance", "my-database"

batch_snapshot = batch_client.batch_snapshot

partitions = batch_snapshot.partition_read "users", [:id, :name]

partition = partitions.first

serialized_snapshot = batch_snapshot.dump
serialized_partition = partition.dump

# In a separate process
new_batch_snapshot = batch_client.load_batch_snapshot \
  serialized_snapshot

new_partition = batch_client.load_partition \
  serialized_partition

results = new_batch_snapshot.execute_partition \
  new_partition

Returns:

(String) —

The serialized representation of the batch snapshot.



# File 'lib/google/cloud/spanner/batch_snapshot.rb', line 614

def dump
  JSON.dump to_h
end

#execute(sql, params: nil, types: nil) ⇒ `Google::Cloud::Spanner::Results` Also known as: query

Executes a SQL query.

Examples:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new
batch_client = spanner.batch_client "my-instance", "my-database"
batch_snapshot = batch_client.batch_snapshot

results = batch_snapshot.execute "SELECT * FROM users"

results.rows.each do |row|
  puts "User #{row[:id]} is #{row[:name]}"
end

Query using query parameters:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new
batch_client = spanner.batch_client "my-instance", "my-database"
batch_snapshot = batch_client.batch_snapshot

results = batch_snapshot.execute "SELECT * FROM users " \
                                 "WHERE active = @active",
                                 params: { active: true }

results.rows.each do |row|
  puts "User #{row[:id]} is #{row[:name]}"
end

Query with a SQL STRUCT query parameter as a Hash:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new
batch_client = spanner.batch_client "my-instance", "my-database"
batch_snapshot = batch_client.batch_snapshot

user_hash = { id: 1, name: "Charlie", active: false }

results = batch_snapshot.execute "SELECT * FROM users WHERE " \
                                 "ID = @user_struct.id " \
                                 "AND name = @user_struct.name " \
                                 "AND active = @user_struct.active",
                                 params: { user_struct: user_hash }

results.rows.each do |row|
  puts "User #{row[:id]} is #{row[:name]}"
end

Specify the SQL STRUCT type using Fields object:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new
batch_client = spanner.batch_client "my-instance", "my-database"
batch_snapshot = batch_client.batch_snapshot

user_type = batch_client.fields(
  { id: :INT64, name: :STRING, active: :BOOL }
)
user_hash = { id: 1, name: nil, active: false }

results = batch_snapshot.execute "SELECT * FROM users WHERE " \
                                 "ID = @user_struct.id " \
                                 "AND name = @user_struct.name " \
                                 "AND active = @user_struct.active",
                                 params: { user_struct: user_hash },
                                 types: { user_struct: user_type }

results.rows.each do |row|
  puts "User #{row[:id]} is #{row[:name]}"
end

Or, query with a SQL STRUCT as a typed Data object:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new
batch_client = spanner.batch_client "my-instance", "my-database"
batch_snapshot = batch_client.batch_snapshot

user_type = batch_client.fields(
  { id: :INT64, name: :STRING, active: :BOOL }
)
user_data = user_type.struct id: 1, name: nil, active: false

results = batch_snapshot.execute "SELECT * FROM users WHERE " \
                                 "ID = @user_struct.id " \
                                 "AND name = @user_struct.name " \
                                 "AND active = @user_struct.active",
                                 params: { user_struct: user_data }

results.rows.each do |row|
  puts "User #{row[:id]} is #{row[:name]}"
end

Parameters:

sql (String) —

The SQL query string. See [Query syntax](cloud.google.com/spanner/docs/query-syntax).

The SQL query string can contain parameter placeholders. A parameter placeholder consists of “@” followed by the parameter name. Parameter names consist of any combination of letters, numbers, and underscores.
params (Hash) (defaults to: nil) —

SQL parameters for the query string. The parameter placeholders, minus the “@”, are the hash keys, and the literal values are the hash values. If the query string contains something like “WHERE id > @msg_id”, then the params must contain something like `:msg_id => 1`.

Ruby types are mapped to Spanner types as follows:

| Spanner | Ruby | Notes |
|-------------|----------------|---|
| `BOOL` | `true`/`false` | |
| `INT64` | `Integer` | |
| `FLOAT64` | `Float` | |
| `STRING` | `String` | |
| `DATE` | `Date` | |
| `TIMESTAMP` | `Time`, `DateTime` | |
| `BYTES` | `File`, `IO`, `StringIO`, or similar | |
| `ARRAY` | `Array` | Nested arrays are not supported. |
| `STRUCT` | `Hash`, Data | |

See [Data types](cloud.google.com/spanner/docs/data-definition-language#data_types).

See [Data Types - Constructing a STRUCT](cloud.google.com/spanner/docs/data-types#constructing-a-struct).
types (Hash) (defaults to: nil) —
Types of the SQL parameters in `params`. It is not always possible for Cloud Spanner to infer the right SQL type from a value in `params`. In these cases, the `types` hash must be used to specify the SQL type for these values.

The keys of the hash should be query string parameter placeholders, minus the “@”. The values of the hash should be Cloud Spanner type codes from the following list:
- `:BOOL`
- `:BYTES`
- `:DATE`
- `:FLOAT64`
- `:INT64`
- `:STRING`
- `:TIMESTAMP`
- `Array` - Lists are specified by providing the type code in an array. For example, an array of integers is specified as `[:INT64]`.
- Fields - Types for STRUCT values (`Hash`/Data objects) are specified using a Fields object.
Types are optional.
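The placeholder rule described for `sql` ("@" followed by letters, numbers, and underscores) can be checked locally before a query is sent. The sketch below is a hypothetical helper, not part of the library: `placeholder_names` and `missing_params` are illustrative names, and the scan is plain text matching, so an "@" inside a SQL string literal or comment would also be picked up.

```ruby
# Extracts parameter names from a SQL string: "@" followed by letters,
# numbers, and underscores. A STRUCT field access such as @user_struct.id
# yields just :user_struct, because "." ends the name.
def placeholder_names sql
  sql.scan(/@([A-Za-z0-9_]+)/).flatten.uniq.map(&:to_sym)
end

# Returns the placeholders that have no matching key in params.
def missing_params sql, params
  placeholder_names(sql) - (params || {}).keys.map(&:to_sym)
end

sql = "SELECT * FROM users WHERE active = @active AND id > @msg_id"
p missing_params(sql, { active: true, msg_id: 1 })
p missing_params(sql, { active: true })
```

The same key set applies to `types`: each key there should name a placeholder that also appears in `params`.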

Returns:

(Google::Cloud::Spanner::Results) —

The results of the query execution.

# File 'lib/google/cloud/spanner/batch_snapshot.rb', line 514

def execute sql, params: nil, types: nil
  ensure_session!

  params, types = Convert.to_input_params_and_types params, types

  session.execute sql, params: params, types: types,
                       transaction: tx_selector
end

#execute_partition(partition) ⇒ `Object`

Execute the partition to return a ResultSet. The result returned could be zero or more rows. The row metadata may be absent if no rows are returned.

Examples:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new

batch_client = spanner.batch_client "my-instance", "my-database"
batch_snapshot = batch_client.batch_snapshot

partitions = batch_snapshot.partition_read "users", [:id, :name]

partition = partitions.first
results = batch_snapshot.execute_partition partition

batch_snapshot.close

Parameters:

partition (Google::Cloud::Spanner::Partition) —

The partition to be executed.

# File 'lib/google/cloud/spanner/batch_snapshot.rb', line 316

def execute_partition partition
  ensure_session!

  partition = Partition.load partition unless partition.is_a? Partition
  # TODO: raise if partition.empty?

  # TODO: raise if session.path != partition.session
  # TODO: raise if grpc.transaction != partition.transaction

  if partition.execute?
    execute_partition_query partition
  elsif partition.read?
    execute_partition_read partition
  end
end
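The branching in the listing above can be tried in isolation. This is a minimal sketch with a hypothetical `FakePartition` stand-in that models only the two predicates the method consults, and `route_partition` returns a label instead of running a request. It also shows that a partition that is neither a query nor a read falls through both branches, so the result is nil.

```ruby
# Hypothetical stand-in for Google::Cloud::Spanner::Partition. Only the two
# predicates used by #execute_partition are modelled.
FakePartition = Struct.new(:kind) do
  def execute?
    kind == :query
  end

  def read?
    kind == :read
  end
end

# Same branching as #execute_partition, returning a label instead of
# running a request. With no else branch, an unmatched partition gives nil.
def route_partition partition
  if partition.execute?
    :execute_partition_query
  elsif partition.read?
    :execute_partition_read
  end
end

p route_partition(FakePartition.new(:query))
p route_partition(FakePartition.new(:read))
p route_partition(FakePartition.new(:empty))
```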

#partition_query(sql, params: nil, types: nil, partition_size_bytes: nil, max_partitions: nil) ⇒ `Array<Google::Cloud::Spanner::Partition>`

Returns a list of Partition objects to execute a batch query against a database.

These partitions can be executed across multiple processes, even across different machines. The partition size and count can be configured, although the values given may not necessarily be honored depending on the query and options in the request.

The query must have a single [distributed union](cloud.google.com/spanner/docs/query-execution-operators#distributed_union) operator at the root of the query plan. Such queries are root-partitionable. If a query cannot be partitioned at the root, Cloud Spanner cannot achieve the parallelism and in this case partition generation will fail.

Examples:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new

batch_client = spanner.batch_client "my-instance", "my-database"
batch_snapshot = batch_client.batch_snapshot

sql = "SELECT u.id, u.active FROM users AS u \
       WHERE u.id < 2000 AND u.active = false"
partitions = batch_snapshot.partition_query sql

partition = partitions.first
results = batch_snapshot.execute_partition partition

batch_snapshot.close
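The example above executes only the first partition. A sketch of fanning all partitions out to a fixed pool of worker threads might look like the following. `fan_out` is a hypothetical helper, and strings stand in for Partition objects so the code runs without a Spanner connection. In real use the block would call `execute_partition`; whether a single BatchSnapshot may safely be shared between threads is not stated here, so treat that part as an assumption to verify.

```ruby
# Runs the block once per item using a fixed number of worker threads and
# returns the collected block results. Result order is not guaranteed.
# Items must be truthy, because nil is used as the end-of-queue signal.
def fan_out items, worker_count, &block
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # once drained, Queue#pop returns nil instead of blocking

  results = Queue.new
  workers = Array.new(worker_count) do
    Thread.new do
      while (item = queue.pop)
        results << block.call(item)
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

# Hypothetical stand-ins for the partitions returned by #partition_query.
partitions = %w[partition-a partition-b partition-c]
done = fan_out(partitions, 2) { |partition| "executed #{partition}" }
puts done.sort
```

Setting `max_partitions` to the worker count, as the parameter description suggests, keeps the number of partitions close to the pool size, although the value is only a hint.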

Parameters:

sql (String) —

The SQL query string. See [Query syntax](cloud.google.com/spanner/docs/query-syntax).

The SQL query string can contain parameter placeholders. A parameter placeholder consists of “@” followed by the parameter name. Parameter names consist of any combination of letters, numbers, and underscores.
params (Hash) (defaults to: nil) —

SQL parameters for the query string. The parameter placeholders, minus the “@”, are the hash keys, and the literal values are the hash values. If the query string contains something like “WHERE id > @msg_id”, then the params must contain something like `:msg_id => 1`.

Ruby types are mapped to Spanner types as follows:

| Spanner | Ruby | Notes |
|-------------|----------------|---|
| `BOOL` | `true`/`false` | |
| `INT64` | `Integer` | |
| `FLOAT64` | `Float` | |
| `STRING` | `String` | |
| `DATE` | `Date` | |
| `TIMESTAMP` | `Time`, `DateTime` | |
| `BYTES` | `File`, `IO`, `StringIO`, or similar | |
| `ARRAY` | `Array` | Nested arrays are not supported. |
| `STRUCT` | `Hash`, Data | |

See [Data types](cloud.google.com/spanner/docs/data-definition-language#data_types).

See [Data Types - Constructing a STRUCT](cloud.google.com/spanner/docs/data-types#constructing-a-struct).
types (Hash) (defaults to: nil) —
Types of the SQL parameters in `params`. It is not always possible for Cloud Spanner to infer the right SQL type from a value in `params`. In these cases, the `types` hash must be used to specify the SQL type for these values.

The keys of the hash should be query string parameter placeholders, minus the “@”. The values of the hash should be Cloud Spanner type codes from the following list:
- `:BOOL`
- `:BYTES`
- `:DATE`
- `:FLOAT64`
- `:INT64`
- `:STRING`
- `:TIMESTAMP`
- `Array` - Lists are specified by providing the type code in an array. For example, an array of integers is specified as `[:INT64]`.
- Fields - Types for STRUCT values (`Hash`/Data objects) are specified using a Fields object.
Types are optional.
partition_size_bytes (Integer) (defaults to: nil) —

The desired data size for each partition generated. This is only a hint. The actual size of each partition may be smaller or larger than this size request.
max_partitions (Integer) (defaults to: nil) —

The desired maximum number of partitions to return. For example, this may be set to the number of workers available. This is only a hint and may provide different results based on the request.

Returns:

(Array<Google::Cloud::Spanner::Partition>) —

The partitions created by the query partition.

# File 'lib/google/cloud/spanner/batch_snapshot.rb', line 191

def partition_query sql, params: nil, types: nil,
                    partition_size_bytes: nil, max_partitions: nil
  ensure_session!

  params, types = Convert.to_input_params_and_types params, types

  results = session.partition_query \
    sql, tx_selector, params: params, types: types,
                      partition_size_bytes: partition_size_bytes,
                      max_partitions: max_partitions

  results.partitions.map do |grpc|
    # Convert partition protos to execute sql request protos
    execute_grpc = Google::Spanner::V1::ExecuteSqlRequest.new(
      {
        session: session.path,
        sql: sql,
        params: params,
        param_types: types,
        transaction: tx_selector,
        partition_token: grpc.partition_token
      }.delete_if { |_, v| v.nil? }
    )
    Partition.from_execute_grpc execute_grpc
  end
end
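The listing above strips nil values from the field hash before building each request, so options that were not given are left out altogether. A minimal sketch of that idiom on its own follows. `compact_request_fields` is a hypothetical name and the field values are made up; only the shape matches the listing.

```ruby
# Drops every entry whose value is nil. Hash#delete_if mutates the receiver
# and returns it. Values that are false are kept; only nil is removed.
def compact_request_fields fields
  fields.delete_if { |_, value| value.nil? }
end

# Hypothetical field values standing in for a real request.
fields = compact_request_fields(
  {
    session: "hypothetical/session/path",
    sql: "SELECT u.id FROM users AS u",
    params: nil,
    param_types: nil,
    partition_token: "hypothetical-token"
  }
)
p fields.keys
```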

#partition_read(table, columns, keys: nil, index: nil, partition_size_bytes: nil, max_partitions: nil) ⇒ `Array<Google::Cloud::Spanner::Partition>`

Returns a list of Partition objects to read zero or more rows from a database.

Examples:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new

batch_client = spanner.batch_client "my-instance", "my-database"
batch_snapshot = batch_client.batch_snapshot

partitions = batch_snapshot.partition_read "users", [:id, :name]

partition = partitions.first
results = batch_snapshot.execute_partition partition

batch_snapshot.close

Parameters:

table (String) —

The name of the table in the database to be read.
columns (Array<String, Symbol>) —

The columns of table to be returned for each row matching this request.
keys (Object, Array<Object>) (defaults to: nil) —

A single key, or a list of keys or key ranges, to match returned data against. Values should have exactly as many elements as there are columns in the primary key.
index (String) (defaults to: nil) —

The name of an index to use instead of the table’s primary key when interpreting `keys` and sorting result rows. Optional.
partition_size_bytes (Integer) (defaults to: nil) —

The desired data size for each partition generated. This is only a hint. The actual size of each partition may be smaller or larger than this size request.
max_partitions (Integer) (defaults to: nil) —

The desired maximum number of partitions to return. For example, this may be set to the number of workers available. This is only a hint and may provide different results based on the request.

Returns:

(Array<Google::Cloud::Spanner::Partition>) —

The partitions created by the read partition.

# File 'lib/google/cloud/spanner/batch_snapshot.rb', line 263

def partition_read table, columns, keys: nil, index: nil,
                   partition_size_bytes: nil, max_partitions: nil
  ensure_session!

  columns = Array(columns).map(&:to_s)
  keys = Convert.to_key_set keys

  results = session.partition_read \
    table, columns, tx_selector,
    keys: keys, index: index,
    partition_size_bytes: partition_size_bytes,
    max_partitions: max_partitions

  results.partitions.map do |grpc|
    # Convert partition protos to read request protos
    read_grpc = Google::Spanner::V1::ReadRequest.new(
      {
        session: session.path,
        table: table,
        columns: columns,
        key_set: keys,
        index: index,
        transaction: tx_selector,
        partition_token: grpc.partition_token
      }.delete_if { |_, v| v.nil? }
    )
    Partition.from_read_grpc read_grpc
  end
end

#read(table, columns, keys: nil, index: nil, limit: nil) ⇒ `Google::Cloud::Spanner::Results`

Read rows from a database table, as a simple alternative to #execute.

Examples:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new
batch_client = spanner.batch_client "my-instance", "my-database"
batch_snapshot = batch_client.batch_snapshot

results = batch_snapshot.read "users", [:id, :name]

results.rows.each do |row|
  puts "User #{row[:id]} is #{row[:name]}"
end

Parameters:

table (String) —

The name of the table in the database to be read.
columns (Array<String, Symbol>) —

The columns of table to be returned for each row matching this request.
keys (Object, Array<Object>) (defaults to: nil) —

A single key, or a list of keys or key ranges, to match returned data against. Values should have exactly as many elements as there are columns in the primary key.
index (String) (defaults to: nil) —

The name of an index to use instead of the table’s primary key when interpreting `keys` and sorting result rows. Optional.
limit (Integer) (defaults to: nil) —

If greater than zero, no more than this number of rows will be returned. The default is no limit.

Returns:

(Google::Cloud::Spanner::Results) —

The results of the read operation.

# File 'lib/google/cloud/spanner/batch_snapshot.rb', line 557

def read table, columns, keys: nil, index: nil, limit: nil
  ensure_session!

  columns = Array(columns).map(&:to_s)
  keys = Convert.to_key_set keys

  session.read table, columns, keys: keys, index: index, limit: limit,
                               transaction: tx_selector
end
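The listing above normalizes `columns` with `Array(columns).map(&:to_s)`, which is why the parameter accepts strings and symbols, and in practice a single value as well as an array. A short sketch of that step in isolation follows; `normalize_columns` is a hypothetical name used only for illustration.

```ruby
# Same normalization as in #read and #partition_read: wrap the argument in
# an array if needed, then convert every entry to a String.
def normalize_columns columns
  Array(columns).map(&:to_s)
end

p normalize_columns([:id, :name])
p normalize_columns(["id", :name])
p normalize_columns(:id)
p normalize_columns(nil)
```

Kernel#Array turns nil into an empty array, so passing nil produces an empty column list rather than raising at this step.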

#timestamp ⇒ `Time`

The read timestamp chosen for the batch snapshot.

Returns:

(Time) —

The chosen timestamp.

# File 'lib/google/cloud/spanner/batch_snapshot.rb', line 87

def timestamp
  return nil if grpc.nil?
  Convert.timestamp_to_time grpc.read_timestamp
end

#to_h ⇒ `Hash`

Converts the batch snapshot object to a Hash ready for serialization.

Returns:

(Hash) —

A hash containing a representation of the batch snapshot object.

# File 'lib/google/cloud/spanner/batch_snapshot.rb', line 575

def to_h
  {
    session: Base64.strict_encode64(@session.grpc.to_proto),
    transaction: Base64.strict_encode64(@grpc.to_proto)
  }
end

#transaction_id ⇒ `String`

Identifier of the batch snapshot transaction.

Returns:

(String) —

The transaction id.

# File 'lib/google/cloud/spanner/batch_snapshot.rb', line 79

def transaction_id
  return nil if grpc.nil?
  grpc.id
end
