Method: Google::Cloud::Spanner::Client#execute_partition_update

Defined in:
lib/google/cloud/spanner/client.rb

#execute_partition_update(sql, params: nil, types: nil, exclude_txn_from_change_streams: false, query_options: nil, request_options: nil, call_options: nil) ⇒ Integer
Also known as: execute_pdml

Executes a Partitioned DML SQL statement.

Partitioned DML is an alternate implementation with looser semantics to enable large-scale changes without running into transaction size limits or (accidentally) locking the entire table in one large transaction. At a high level, it partitions the keyspace and executes the statement on each partition in separate internal transactions.

Partitioned DML does not guarantee database-wide atomicity of the statement - it guarantees row-based atomicity, which includes updates to any indices. Additionally, it does not guarantee that it will execute exactly one time against each row - it guarantees "at least once" semantics.

Whereas standard DML statements must be executed within a Transaction (see Transaction#execute_update), Partitioned DML statements are executed outside of a read/write transaction.

Not all DML statements can be executed in Partitioned DML mode; the backend returns an error for statements that are not supported.

DML statements must be fully-partitionable. Specifically, the statement must be expressible as the union of many statements which each access only a single row of the table. InvalidArgumentError is raised if the statement does not qualify.

The method blocks until the update is complete. Because this method does not offer exactly-once semantics, the DML statement should be idempotent. The call runs as a Partitioned DML transaction in which a single Partitioned DML statement is executed. Partitioned DML partitions the keyspace and runs the DML statement over each partition in parallel using separate, internal transactions that commit independently. Partitioned DML transactions do not need to be committed.

Partitioned DML updates are used to execute a single DML statement with a different execution strategy that provides different, and often better, scalability properties for large, table-wide operations than DML in a Transaction#execute_update transaction. Smaller scoped statements, such as an OLTP workload, should prefer using Transaction#execute_update.

That said, Partitioned DML is not a drop-in replacement for standard DML used in Transaction#execute_update.

  • The DML statement must be fully-partitionable. Specifically, the statement must be expressible as the union of many statements which each access only a single row of the table.
  • The statement is not applied atomically to all rows of the table. Rather, the statement is applied atomically to partitions of the table, in independent internal transactions. Secondary index rows are updated atomically with the base table rows.
  • Partitioned DML does not guarantee exactly-once execution semantics against a partition. The statement will be applied at least once to each partition. It is strongly recommended that the DML statement be idempotent to avoid unexpected results. For instance, it is potentially dangerous to run a statement such as UPDATE table SET column = column + 1, as it could be run multiple times against some rows.
  • The partitions are committed automatically - there is no support for Commit or Rollback. If the call returns an error, or if the client issuing the DML statement dies, it is possible that some rows had the statement executed on them successfully. It is also possible that the statement was never executed against other rows.
  • If any error is encountered during the execution of the partitioned DML operation (for instance, a UNIQUE INDEX violation, division by zero, or a value that cannot be stored due to schema constraints), then the operation is stopped at that point and an error is returned. It is possible that at this point, some partitions have been committed (or even committed multiple times), and other partitions have not been run at all.
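The difference between idempotent and non-idempotent statements under "at least once" execution can be illustrated with a small pure-Ruby simulation. Nothing here calls Cloud Spanner: apply_at_least_once and sample_rows are hypothetical helpers, the partition size is arbitrary, and the replay of the first partition is forced in order to stand in for a re-execution that the backend may or may not perform.

```ruby
# Pure-Ruby simulation of "at least once" execution. No Spanner calls are
# made; every name here is hypothetical and exists only for illustration.

# Applies the block to every row, one partition at a time. The partition at
# replay_index is applied twice to mimic a partition that is run again.
def apply_at_least_once rows, partition_size: 2, replay_index: 0
  rows.each_slice(partition_size).each_with_index do |partition, index|
    runs = index == replay_index ? 2 : 1
    runs.times { partition.each { |row| yield row } }
  end
  rows
end

# Four identical rows, so any drift after the update is easy to spot.
def sample_rows
  Array.new(4) { |id| { id: id, friends: ["x"], counter: 10 } }
end

# Idempotent, like UPDATE users SET friends = NULL: a replay changes nothing.
cleared = apply_at_least_once(sample_rows) { |row| row[:friends] = nil }

# Not idempotent, like UPDATE table SET column = column + 1: the replayed
# partition is incremented twice, so rows end up with different values.
bumped = apply_at_least_once(sample_rows) { |row| row[:counter] += 1 }

puts cleared.map { |row| row[:friends].inspect }.join(", ")
puts bumped.map { |row| row[:counter] }.join(", ")
```

In the second run the rows of the replayed partition end up one increment ahead of the others, which is the kind of silent drift the recommendation to keep statements idempotent is meant to prevent.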

Given the above, Partitioned DML is a good fit for large, database-wide operations that are idempotent, such as deleting old rows from a very large table.
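Because a failed call may leave the statement applied to only some rows, one reasonable pattern for an idempotent statement is to run it again until it succeeds. The sketch below is an assumption-laden illustration, not part of the library: run_idempotent_pdml is a hypothetical helper, the events table is invented, and in real use retry_on would probably be narrowed from StandardError to a more specific class such as Google::Cloud::Error.

```ruby
# Hypothetical helper: re-runs an idempotent Partitioned DML statement when
# the call raises. `db` is any object responding to execute_partition_update,
# such as a Google::Cloud::Spanner::Client.
def run_idempotent_pdml db, sql, attempts: 3, retry_on: StandardError
  tries = 0
  begin
    tries += 1
    db.execute_partition_update sql
  rescue retry_on
    # Re-running is safe only because the statement is idempotent: rows
    # already modified by a committed partition are unaffected by another run.
    retry if tries < attempts
    raise
  end
end

# Usage (assumes an `events` table with a `created_at` TIMESTAMP column):
#   run_idempotent_pdml db,
#     "DELETE FROM events WHERE created_at < '2020-01-01T00:00:00Z'"
```

This would not be appropriate for a statement such as UPDATE table SET column = column + 1, where every additional run changes the outcome.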

Examples:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new
db = spanner.client "my-instance", "my-database"

row_count = db.execute_partition_update \
 "UPDATE users SET friends = NULL WHERE active = false"

Query using query parameters:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new
db = spanner.client "my-instance", "my-database"

row_count = db.execute_partition_update \
 "UPDATE users SET friends = NULL WHERE active = @active",
 params: { active: false }
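
The types argument matters when a value in params carries no type information of its own, such as nil or an empty array. A minimal sketch, assuming a hypothetical nickname STRING column and an INT64 id key on users; clear_nicknames is an illustrative helper, and db is a client created as in the example above:

```ruby
# Hypothetical helper; `db` is a client created as in the example above.
# nil and a possibly empty array do not reveal their SQL types, so the
# types hash states them explicitly for both parameters.
def clear_nicknames db, user_ids
  db.execute_partition_update \
    "UPDATE users SET nickname = @nickname WHERE id IN UNNEST(@ids)",
    params: { nickname: nil, ids: user_ids },
    types: { nickname: :STRING, ids: [:INT64] }
end
```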

Query using query options:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new
db = spanner.client "my-instance", "my-database"

row_count = db.execute_partition_update \
 "UPDATE users SET friends = NULL WHERE active = false",
 query_options: {
   optimizer_version: "1",
   optimizer_statistics_package: "auto_20191128_14_47_22UTC"
 }

Query using custom timeout and retry policy:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new
db = spanner.client "my-instance", "my-database"

timeout = 30.0
retry_policy = {
  initial_delay: 0.25,
  max_delay:     32.0,
  multiplier:    1.3,
  retry_codes:   ["UNAVAILABLE"]
}
call_options = { timeout: timeout, retry_policy: retry_policy }

row_count = db.execute_partition_update \
 "UPDATE users SET friends = NULL WHERE active = false",
 call_options: call_options

Query using request options:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new
db = spanner.client "my-instance", "my-database"

request_options = { priority: :PRIORITY_MEDIUM }
row_count = db.execute_partition_update \
 "UPDATE users SET friends = NULL WHERE active = @active",
 params: { active: false }, request_options: request_options

Query using tag for request query statistics collection:

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new
db = spanner.client "my-instance", "my-database"

request_options = { tag: "Update-Users" }
row_count = db.execute_partition_update \
  "UPDATE users SET friends = NULL WHERE active = false",
  request_options: request_options
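
A tag that does not match the format described for the :tag request option is better caught before the request is sent. A small sketch under that assumption: valid_request_tag? is a hypothetical helper, not part of the client library, and its pattern reads the documented form as one leading letter followed by 1 to 63 letters, digits, underscores, or hyphens.

```ruby
# Hypothetical pre-flight check; not part of the client library.
# One leading letter, then 1..63 more characters, for 2..64 in total.
REQUEST_TAG_PATTERN = /\A[a-zA-Z][a-zA-Z0-9_\-]{1,63}\z/

def valid_request_tag? tag
  tag.is_a?(String) && REQUEST_TAG_PATTERN.match?(tag)
end

puts valid_request_tag?("Update-Users") # letters and a hyphen
puts valid_request_tag?("1-update")     # starts with a digit
puts valid_request_tag?("u")            # shorter than 2 characters
```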

Parameters:

  • sql (String)

    The Partitioned DML statement string. See Query syntax.

    The Partitioned DML statement string can contain parameter placeholders. A parameter placeholder consists of "@" followed by the parameter name. Parameter names consist of any combination of letters, numbers, and underscores.

  • params (Hash) (defaults to: nil)

    Parameters for the Partitioned DML statement string. The parameter placeholders, minus the "@", are the hash keys, and the literal values are the hash values. If the query string contains something like "WHERE id > @msg_id", then the params must contain something like :msg_id => 1.

    Ruby types are mapped to Spanner types as follows:

    Spanner     Ruby                            Notes
    BOOL        true/false
    INT64       Integer
    FLOAT64     Float
    FLOAT32     Float
    NUMERIC     BigDecimal
    STRING      String
    UUID        String
    DATE        Date
    TIMESTAMP   Time, DateTime
    BYTES       File, IO, StringIO, or similar
    ARRAY       Array                           Nested arrays are not supported.
    STRUCT      Hash, Data

    See Data types.

    See Data Types - Constructing a STRUCT.

  • types (Hash) (defaults to: nil)

    Types of the SQL parameters in params. It is not always possible for Cloud Spanner to infer the right SQL type from a value in params. In these cases, the types hash can be used to specify the exact SQL type for some or all of the SQL query parameters.

    The keys of the hash should be query string parameter placeholders, minus the "@". The values of the hash should be Cloud Spanner type codes from the following list:

    • :BOOL
    • :BYTES
    • :DATE
    • :FLOAT64
    • :FLOAT32
    • :NUMERIC
    • :INT64
    • :STRING
    • :TIMESTAMP
    • :UUID
    • Array - Lists are specified by providing the type code in an array. For example, an array of integers is specified as [:INT64].
    • Fields - Nested Structs are specified by providing a Fields object.
  • exclude_txn_from_change_streams (Boolean) (defaults to: false)

    If set to true, mutations made by this statement will not be recorded in change streams that have the DDL option allow_txn_exclusion=true.

  • query_options (Hash) (defaults to: nil)

    A hash of values that specify custom query options for executing the SQL statement. Query options are optional. The following settings can be provided:

    • :optimizer_version (String) The version of optimizer to use. Empty to use database default. "latest" to use the latest available optimizer version.
    • :optimizer_statistics_package (String) Statistics package to use. Empty to use the database default.
  • request_options (Hash) (defaults to: nil)

    Common request options.

    • :priority (String) The relative priority for requests. The priority acts as a hint to the Cloud Spanner scheduler and does not guarantee priority or order of execution. Valid values are :PRIORITY_LOW, :PRIORITY_MEDIUM, :PRIORITY_HIGH. If the priority is not set, the default is PRIORITY_UNSPECIFIED, which is equivalent to :PRIORITY_HIGH.
    • :tag (String) A per-request tag which can be applied to queries or reads, used for statistics collection. Tag must be a valid identifier of the form: [a-zA-Z][a-zA-Z0-9_\-] between 2 and 64 characters in length.
  • call_options (Hash) (defaults to: nil)

    A hash of values to specify the custom call options, e.g., timeout, retries, etc. Call options are optional. The following settings can be provided:

    • :timeout (Numeric) A numeric value of custom timeout in seconds that overrides the default setting.
    • :retry_policy (Hash) A hash of values that overrides the default setting of retry policy with the following keys:
      • :initial_delay (Numeric) - The initial delay in seconds.
      • :max_delay (Numeric) - The max delay in seconds.
      • :multiplier (Numeric) - The incremental backoff multiplier.
      • :retry_codes (Array<String>) - The error codes that should trigger a retry.
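
To get a feel for what a given :retry_policy implies, the nominal waits can be computed by hand. This is only a sketch: nominal_retry_delays is a hypothetical helper, and it assumes each delay is the previous one multiplied by :multiplier and capped at :max_delay, with no jitter. The delays actually used by the underlying call layer may differ.

```ruby
# Illustrative only: computes the nominal waits implied by a retry policy
# hash, assuming plain exponential backoff capped at :max_delay (no jitter).
def nominal_retry_delays retry_policy, retries
  delay = retry_policy[:initial_delay]
  Array.new(retries) do
    current = delay
    # Grow the delay for the next retry, never exceeding :max_delay.
    delay = [delay * retry_policy[:multiplier], retry_policy[:max_delay]].min
    current
  end
end

retry_policy = {
  initial_delay: 0.25,
  max_delay:     32.0,
  multiplier:    1.3,
  retry_codes:   ["UNAVAILABLE"]
}

# Print the first five nominal delays, rounded for readability.
puts nominal_retry_delays(retry_policy, 5).map { |seconds| seconds.round(3) }.inspect
```

With a multiplier as small as 1.3 the delays grow slowly, so the overall :timeout is likely to be reached long before :max_delay comes into play.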

Returns:

  • (Integer)

    The lower bound number of rows that were modified.



# File 'lib/google/cloud/spanner/client.rb', line 773

def execute_partition_update sql, params: nil, types: nil,
                             exclude_txn_from_change_streams: false,
                             query_options: nil, request_options: nil,
                             call_options: nil
  ensure_service!

  params, types = Convert.to_input_params_and_types params, types
  request_options = Convert.to_request_options request_options,
                                               tag_type: :request_tag
  route_to_leader = LARHeaders.partition_query
  results = nil
  @pool.with_session do |session|
    transaction = pdml_transaction session, exclude_txn_from_change_streams: exclude_txn_from_change_streams
    results = session.execute_query \
      sql, params: params, types: types,
      transaction: transaction,
      query_options: query_options, request_options: request_options,
      call_options: call_options, route_to_leader: route_to_leader
  end
  # Stream all PartialResultSet to get ResultSetStats
  results.rows.to_a
  # Raise an error if there is not a row count returned
  if results.row_count.nil?
    raise Google::Cloud::InvalidArgumentError,
          "Partitioned DML statement is invalid."
  end
  results.row_count
end