Migrating to google-cloud-dataproc 1.0

The 1.0 release of the google-cloud-dataproc client is a significant upgrade based on a next-gen code generator, and includes substantial interface changes. Existing code written for earlier versions of this library will likely require updates to use this version. This document describes the changes that have been made, and what you need to do to update your usage.

To summarize:

  • The library has been broken out into multiple libraries. The new gems google-cloud-dataproc-v1 and google-cloud-dataproc-v1beta2 contain the actual client classes for versions V2 and V1beta2 of the Dataproc service, and the gem google-cloud-dataproc now simply provides a convenience wrapper. See Library Structure for more info.
  • The library uses a new configuration mechanism giving you closer control over endpoint address, network timeouts, and retry. See Client Configuration for more info. Furthermore, when creating a client object, you can customize its configuration in a block rather than passing arguments to the constructor. See Creating Clients for more info.
  • Previously, positional arguments were used to indicate required arguments. Now, all method arguments are keyword arguments, with documentation that specifies whether they are required or optional. Additionally, you can pass a proto request object instead of separate arguments. See Passing Arguments for more info.
  • Previously, some client classes included class methods for constructing resource paths. These paths are now instance methods on the client objects, and are also available in a separate paths module. See Resource Path Helpers for more info.
  • Previously, clients reported RPC errors by raising instances of Google::Gax::GaxError and its subclasses. Now, RPC exceptions are of type Google::Cloud::Error and its subclasses. See Handling Errors for more info.
  • Some classes have moved into different namespaces. See Class Namespaces for more info.

Library Structure

Older 0.x releases of the google-cloud-dataproc gem were all-in-one gems that included potentially multiple clients for multiple versions of the Dataproc service. Factory methods such as Google::Cloud::Dataproc::ClusterController.new would return you instances of client classes such as Google::Cloud::Dataproc::V1::ClusterController or Google::Cloud::Dataproc::V1beta2::ClusterController, depending on which version of the API requested. These classes were all defined in the same gem.

With the 1.0 release, the google-cloud-dataproc gem still provides factory methods for obtaining clients. (The method signatures will have changed. See Creating Clients for details.) However, the actual client classes have been moved into separate gems, one per service version. The Google::Cloud::Dataproc::V1::ClusterController::Client class, along with its helpers and data types, is now part of the google-cloud-dataproc-v1 gem. Similarly, the Google::Cloud::Dataproc::V1beta2::ClusterController::Client class is part of the google-cloud-dataproc-v1beta2 gem.

For normal usage, you can continue to install the google-cloud-dataproc gem (which will bring in the versioned client gems as dependencies) and continue to use factory methods to create clients. However, you may alternatively choose to install only one of the versioned gems. For example, if you know you will only use V1 of the service, you can install google-cloud-dataproc-v1 by itself, and construct instances of the Google::Cloud::Dataproc::V1::ClusterController::Client client class directly.

Client Configuration

In older releases, if you wanted to customize performance parameters or low-level behavior of the client (such as credentials, timeouts, or instrumentation), you would pass a variety of keyword arguments to the client constructor. It was also extremely difficult to customize the default settings.

With the 1.0 release, a configuration interface provides control over these parameters, including defaults for all instances of a client, and settings for each specific client instance. For example, to set default credentials and timeout for all Dataproc V1 ClusterController clients:

Google::Cloud::Dataproc::V1::ClusterController::Client.configure do |config|
  config.credentials = "/path/to/credentials.json"
  config.timeout = 10.0
end

Individual RPCs can also be configured independently. For example, to set the timeout for the create_cluster call:

Google::Cloud::Dataproc::V1::ClusterController::Client.configure do |config|
  config.rpcs.create_cluster.timeout = 20.0
end

Defaults for certain configurations can be set for all Dataproc versions and services globally:

Google::Cloud::Dataproc.configure do |config|
  config.credentials = "/path/to/credentials.json"
  config.timeout = 10.0
end

Finally, you can override the configuration for each client instance. See the next section on Creating Clients for details.

Creating Clients

In older releases, to create a client object, you would use the new method of modules under Google::Cloud::Dataproc. For example, you might call Google::Cloud::Dataproc::ClusterController.new. Keyword arguments were available to select a service version and to configure parameters such as credentials and timeouts.

With the 1.0 release, use named class methods of Google::Cloud::Dataproc to create a client object. For example, Google::Cloud::Dataproc.cluster_controller. You may select a service version using the :version keyword argument. However, other configuration parameters should be set in a configuration block when you create the client.

Old:

client = Google::Cloud::Dataproc::ClusterController.new credentials: "/path/to/credentials.json"

New:

client = Google::Cloud::Dataproc.cluster_controller do |config|
  config.credentials = "/path/to/credentials.json"
end

The configuration block is optional. If you do not provide it, or you do not set some configuration parameters, then the default configuration is used. See Client Configuration.

Passing Arguments

In older releases, required arguments would be passed as positional method arguments, while most optional arguments would be passed as keyword arguments.

With the 1.0 release, all RPC arguments are passed as keyword arguments, regardless of whether they are required or optional. For example:

Old:

client = Google::Cloud::Dataproc::ClusterController.new

project_id = "my-project"
region = "us-central1"
cluster_name = "my_cluster"

# Arguments are positional
response = client.get_cluster project_id, region, cluster_name

New:

client = Google::Cloud::Dataproc.cluster_controller

project_id = "my-project"
region = "us-central1"
cluster_name = "my_cluster"

# All arguments are keyword arguments
response = client.get_cluster project_id: project_id, region: region,
                              cluster_name: cluster_name

In the 1.0 release, it is also possible to pass a request object, either as a hash or as a protocol buffer.

New:

client = Google::Cloud::Dataproc.cluster_controller

request = Google::Cloud::Dataproc::V1::GetClusterRequest.new(
  project_id: "my-project",
  region: "us-central1",
  cluster_name: "my_cluster"
)

# Pass a request object as a positional argument:
response = client.get_cluster request

Finally, in older releases, to provide call options, you would pass a Google::Gax::CallOptions object with the :options keyword argument. In the 1.0 release, pass call options using a second set of keyword arguments.

Old:

client = Google::Cloud::Dataproc::ClusterController.new

project_id = "my-project"
region = "us-central1"
cluster_name = "my_cluster"

options = Google::Gax::CallOptions.new timeout: 10.0

response = client.get_cluster project_id, region, cluster_name, options: options

New:

client = Google::Cloud::Dataproc.cluster_controller

project_id = "my-project"
region = "us-central1"
cluster_name = "my_cluster"

# Use a hash to wrap the normal call arguments (or pass a request object), and
# then add further keyword arguments for the call options.
response = client.get_feed(
    { project_id: project_id, region: region, cluster_name: cluster_name },
    timeout: 10.0)

Resource Path Helpers

The client library includes helper methods for generating the resource path strings passed to many calls. These helpers have changed in two ways:

  • In older releases, they are class methods on the client class. In the 1.0 release, they are instance methods on the client. They are also available on a separate paths module that you can include elsewhere for convenience.
  • In older releases, arguments to a resource path helper are passed as positional arguments. In the 1.0 release, they are passed as named keyword arguments. Some helpers also support different sets of arguments, each set corresponding to a different type of path.

Following is an example involving using a resource path helper.

Old:

client = Google::Cloud::Dataproc::WorkflowTemplateService.new

# Call the helper on the client class
name = Google::Cloud::Dataproc::V1::WorkflowTemplateServiceClient.
         workflow_template_path("my-project", "us-central1", "my-template")

response = client.get_workflow_template name

New:

client = Google::Cloud::Dataproc.workflow_template_service

# Call the helper on the client instance, and use keyword arguments
name = client.workflow_template_path project: "my-project",
                                     region: "us-central1",
                                     workflow_template: "my-template"

response = client.get_workflow_template name: name

Because helpers take keyword arguments, some can now generate several different variations on the path that were not available under earlier versions of the library. For example, workflow_template_path can generate paths with either a region or location as the parent resource.

New:

client = Google::Cloud::Dataproc.workflow_template_service

# Create paths with different parent resource types
name1 = client.workflow_template_path project: "my-project",
                                      region: "us-central1",
                                      workflow_template: "my-template"
# => "projects/my-project/regions/us-central1/workflowTemplates/my-template"
name2 = client.workflow_template_path project: "my-project",
                                      location: "my-location",
                                      workflow_template: "my-template"
# => "projects/my-project/locations/my-location/workflowTemplates/my-template"

Finally, in the 1.0 client, you can also use the paths module as a convenience module.

New:

# Bring the path helper methods into the current class
include Google::Cloud::Dataproc::V1::WorkflowTemplateService::Paths

def foo
  client = Google::Cloud::Dataproc.workflow_template_service

  # Call the included helper method
  name = workflow_template_path project: "my-project",
                                location: "my-location",
                                workflow_template: "my-template"

  response = client.get_workflow_template name: name

  # Do something with response...
end

Handling Errors

The client reports standard gRPC error codes by raising exceptions. In older releases, these exceptions were located in the Google::Gax namespace and were subclasses of the Google::Gax::GaxError base exception class, defined in the google-gax gem. However, these classes were different from the standard exceptions (subclasses of Google::Cloud::Error) thrown by other client libraries such as google-cloud-storage.

The 1.0 client library now uses the Google::Cloud::Error exception hierarchy, for consistency across all the Google Cloud client libraries. In general, these exceptions have the same name as their counterparts from older releases, but are located in the Google::Cloud namespace rather than the Google::Gax namespace.

Old:

client = Google::Cloud::Dataproc::ClusterController.new

project_id = "my-project"
region = "us-central1"
cluster_name = "my_cluster"

begin
  response = client.get_cluster project_id, region, cluster_name
rescue Google::Gax::Error => e
  # Handle exceptions that subclass Google::Gax::Error
end

New:

client = Google::Cloud::Dataproc.cluster_controller

project_id = "my-project"
region = "us-central1"
cluster_name = "my_cluster"

begin
  response = client.get_cluster project_id: project_id, region: region,
                                cluster_name: cluster_name
rescue Google::Cloud::Error => e
  # Handle exceptions that subclass Google::Cloud::Error
end

Class Namespaces

In older releases, the client object was of classes with names like: Google::Cloud::Dataproc::V1::ClusterControllerClient. In the 1.0 release, the client object is of a different class: Google::Cloud::Dataproc::V1::ClusterController::Client. Note that most users will use the factory methods such as Google::Cloud::Dataproc.cluster_controller to create instances of the client object, so you may not need to reference the actual class directly. See Creating Clients.

In older releases, the credentials object was of class Google::Cloud::Dataproc::V1::Credentials. In the 1.0 release, each service has its own credentials class, e.g. Google::Cloud::Dataproc::V1::ClusterController::Credentials. Again, most users will not need to reference this class directly. See Client Configuration.