Class: Kafka::Client

Inherits: Object

Defined in: lib/kafka/client.rb

Instance Method Summary

#async_producer(delivery_interval: 0, delivery_threshold: 0, max_queue_size: 1000, **options) ⇒ AsyncProducer
Creates a new AsyncProducer instance.
#close ⇒ nil
Closes all connections to the Kafka brokers and frees up used resources.
#consumer(group_id:, session_timeout: 30, offset_commit_interval: 10, offset_commit_threshold: 0, heartbeat_interval: 10, offset_retention_time: nil) ⇒ Consumer
Creates a new Kafka consumer.
#deliver_message(value, key: nil, topic:, partition: nil, partition_key: nil) ⇒ nil
Delivers a single message to the Kafka cluster.
#each_message(topic:, start_from_beginning: true, max_wait_time: 5, min_bytes: 1, max_bytes: 1048576, &block) ⇒ nil
Enumerates all messages in a topic.
#fetch_messages(topic:, partition:, offset: :latest, max_wait_time: 5, min_bytes: 1, max_bytes: 1048576) ⇒ Array<Kafka::FetchedMessage>
Fetches a batch of messages from a single partition.
#initialize(seed_brokers:, client_id: "ruby-kafka", logger: nil, connect_timeout: nil, socket_timeout: nil, ssl_ca_cert_file_path: nil, ssl_ca_cert: nil, ssl_client_cert: nil, ssl_client_cert_key: nil, sasl_gssapi_principal: nil, sasl_gssapi_keytab: nil, sasl_plain_authzid: '', sasl_plain_username: nil, sasl_plain_password: nil) ⇒ Client constructor
Initializes a new Kafka client.
#last_offset_for(topic, partition) ⇒ Integer
Retrieves the offset of the last message in a partition.
#last_offsets_for(*topics) ⇒ Hash<String, Hash<Integer, Integer>>
Retrieves the offset of the last message in each partition of the specified topics.
#partitions_for(topic) ⇒ Integer
Counts the number of partitions in a topic.
#producer(compression_codec: nil, compression_threshold: 1, ack_timeout: 5, required_acks: :all, max_retries: 2, retry_backoff: 1, max_buffer_size: 1000, max_buffer_bytesize: 10_000_000) ⇒ Kafka::Producer
Initializes a new Kafka producer.
#topics ⇒ Array<String>
Lists all topics in the cluster.

Constructor Details

#initialize(seed_brokers:, client_id: "ruby-kafka", logger: nil, connect_timeout: nil, socket_timeout: nil, ssl_ca_cert_file_path: nil, ssl_ca_cert: nil, ssl_client_cert: nil, ssl_client_cert_key: nil, sasl_gssapi_principal: nil, sasl_gssapi_keytab: nil, sasl_plain_authzid: '', sasl_plain_username: nil, sasl_plain_password: nil) ⇒ `Client`

Initializes a new Kafka client.

Parameters:

seed_brokers (Array<String>, String) —
the list of brokers used to initialize the client. Either an Array of connections, or a comma-separated string of connections. Each connection can be either a string of the form "host:port" or a full URI with a scheme. If there's a scheme, it's ignored and only the host and port are used.
client_id (String) (defaults to: "ruby-kafka") —
the identifier for this application.
logger (Logger) (defaults to: nil) —
the logger that should be used by the client.
connect_timeout (Integer, nil) (defaults to: nil) —
the timeout setting for connecting to brokers. See BrokerPool#initialize.
socket_timeout (Integer, nil) (defaults to: nil) —
the timeout setting for socket connections. See BrokerPool#initialize.
ssl_ca_cert (String, Array<String>, nil) (defaults to: nil) —
a PEM encoded CA cert, or an Array of PEM encoded CA certs, to use with an SSL connection.
ssl_ca_cert_file_path (String, nil) (defaults to: nil) —
a path on the filesystem to a PEM encoded CA cert to use with an SSL connection.
ssl_client_cert (String, nil) (defaults to: nil) —
a PEM encoded client cert to use with an SSL connection. Must be used in combination with ssl_client_cert_key.
ssl_client_cert_key (String, nil) (defaults to: nil) —
a PEM encoded client cert key to use with an SSL connection. Must be used in combination with ssl_client_cert.
sasl_gssapi_principal (String, nil) (defaults to: nil) —
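a KRB5 (Kerberos) principal, used for SASL/GSSAPI authentication.
sasl_gssapi_keytab (String, nil) (defaults to: nil) —
the path to a KRB5 keytab file, used for SASL/GSSAPI authentication.
sasl_plain_authzid (String) (defaults to: '') —
the authorization identity to use for SASL/PLAIN authentication.
sasl_plain_username (String, nil) (defaults to: nil) —
the username to use for SASL/PLAIN authentication.
sasl_plain_password (String, nil) (defaults to: nil) —
the password to use for SASL/PLAIN authentication.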

# File 'lib/kafka/client.rb', line 51

def initialize(seed_brokers:, client_id: "ruby-kafka", logger: nil, connect_timeout: nil, socket_timeout: nil,
               ssl_ca_cert_file_path: nil, ssl_ca_cert: nil, ssl_client_cert: nil, ssl_client_cert_key: nil,
               sasl_gssapi_principal: nil, sasl_gssapi_keytab: nil,
               sasl_plain_authzid: '', sasl_plain_username: nil, sasl_plain_password: nil)
  @logger = logger || Logger.new(nil)
  @instrumenter = Instrumenter.new(client_id: client_id)
  @seed_brokers = normalize_seed_brokers(seed_brokers)

  ssl_context = build_ssl_context(ssl_ca_cert_file_path, ssl_ca_cert, ssl_client_cert, ssl_client_cert_key)

  @connection_builder = ConnectionBuilder.new(
    client_id: client_id,
    connect_timeout: connect_timeout,
    socket_timeout: socket_timeout,
    ssl_context: ssl_context,
    logger: @logger,
    instrumenter: @instrumenter,
    sasl_gssapi_principal: sasl_gssapi_principal,
    sasl_gssapi_keytab: sasl_gssapi_keytab,
    sasl_plain_authzid: sasl_plain_authzid,
    sasl_plain_username: sasl_plain_username,
    sasl_plain_password: sasl_plain_password
  )

  @cluster = initialize_cluster
end
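
The two seed_brokers formats described above can be illustrated with a standalone sketch. parse_seed_brokers is a hypothetical helper, not the client's private normalize_seed_brokers method, and the placeholder "kafka://" scheme it adds is an assumption made only so that Ruby's URI library can parse bare "host:port" entries.

```ruby
require "uri"

# Hypothetical helper (not ruby-kafka's normalize_seed_brokers): accepts an
# Array of connection strings or one comma-separated String and returns
# [host, port] pairs. Any URI scheme is parsed but otherwise ignored.
def parse_seed_brokers(seed_brokers)
  entries = seed_brokers.is_a?(String) ? seed_brokers.split(",") : Array(seed_brokers)

  entries.map do |entry|
    entry = entry.strip
    # A bare "host:port" would be read by URI as scheme "host", so add a
    # placeholder scheme before parsing.
    entry = "kafka://#{entry}" unless entry.include?("://")
    uri = URI.parse(entry)
    [uri.host, uri.port]
  end
end

p parse_seed_brokers("kafka1:9092, kafka2:9093")
p parse_seed_brokers(["kafka+ssl://kafka3:9093"])
```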

Instance Method Details

#async_producer(delivery_interval: 0, delivery_threshold: 0, max_queue_size: 1000, **options) ⇒ `AsyncProducer`

Creates a new AsyncProducer instance.

All parameters accepted by #producer can be passed. In addition, the following parameters are specific to async producers.

Parameters:

max_queue_size (Integer) (defaults to: 1000) —
the maximum number of messages allowed in the queue.
delivery_threshold (Integer) (defaults to: 0) —
if greater than zero, the number of buffered messages that will automatically trigger a delivery.
delivery_interval (Integer) (defaults to: 0) —
if greater than zero, the number of seconds between automatic message deliveries.

Returns:

(AsyncProducer)
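
The interaction of delivery_threshold and delivery_interval can be modelled as a single predicate. DeliveryPolicy is a hypothetical class, not part of ruby-kafka; it assumes, as the parameter descriptions say, that a value of zero disables the corresponding trigger, and it ignores threading and the max_queue_size limit entirely.

```ruby
# Hypothetical model (not AsyncProducer's real implementation) of the two
# automatic delivery triggers. A value of zero disables a trigger; with both
# disabled, this model never requests an automatic delivery.
class DeliveryPolicy
  def initialize(delivery_interval: 0, delivery_threshold: 0)
    @delivery_interval = delivery_interval
    @delivery_threshold = delivery_threshold
  end

  # buffered: number of messages currently buffered.
  # seconds_since_last_delivery: time elapsed since the previous delivery.
  def deliver_now?(buffered:, seconds_since_last_delivery:)
    threshold_reached = @delivery_threshold > 0 && buffered >= @delivery_threshold
    interval_elapsed = @delivery_interval > 0 && seconds_since_last_delivery >= @delivery_interval
    threshold_reached || interval_elapsed
  end
end
```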

#close ⇒ `nil`

Closes all connections to the Kafka brokers and frees up used resources.

Returns:

(nil)



# File 'lib/kafka/client.rb', line 455

def close
  @cluster.disconnect
end

#consumer(group_id:, session_timeout: 30, offset_commit_interval: 10, offset_commit_threshold: 0, heartbeat_interval: 10, offset_retention_time: nil) ⇒ `Consumer`

Creates a new Kafka consumer.

Parameters:

group_id (String) —
the id of the group that the consumer should join.
session_timeout (Integer) (defaults to: 30) —
the number of seconds after which, if a client hasn't contacted the Kafka cluster, it will be kicked out of the group.
offset_commit_interval (Integer) (defaults to: 10) —
the interval between offset commits, in seconds.
offset_commit_threshold (Integer) (defaults to: 0) —
the number of messages that can be processed before their offsets are committed. If zero, offset commits are not triggered by message processing.
heartbeat_interval (Integer) (defaults to: 10) —
the interval between heartbeats, in seconds; must be less than session_timeout.
offset_retention_time (Integer, nil) (defaults to: nil) —
the time period that committed offsets will be retained, in seconds. If nil, the broker's default retention setting is used.

Returns:

(Consumer)

# File 'lib/kafka/client.rb', line 237

def consumer(group_id:, session_timeout: 30, offset_commit_interval: 10, offset_commit_threshold: 0, heartbeat_interval: 10, offset_retention_time: nil)
  cluster = initialize_cluster

  instrumenter = DecoratingInstrumenter.new(@instrumenter, {
    group_id: group_id,
  })

  # The Kafka protocol expects the retention time to be in ms.
  retention_time = (offset_retention_time && offset_retention_time * 1_000) || -1

  group = ConsumerGroup.new(
    cluster: cluster,
    logger: @logger,
    group_id: group_id,
    session_timeout: session_timeout,
    retention_time: retention_time
  )

  offset_manager = OffsetManager.new(
    cluster: cluster,
    group: group,
    logger: @logger,
    commit_interval: offset_commit_interval,
    commit_threshold: offset_commit_threshold,
    offset_retention_time: offset_retention_time
  )

  heartbeat = Heartbeat.new(
    group: group,
    interval: heartbeat_interval,
  )

  Consumer.new(
    cluster: cluster,
    logger: @logger,
    instrumenter: instrumenter,
    group: group,
    offset_manager: offset_manager,
    session_timeout: session_timeout,
    heartbeat: heartbeat,
  )
end
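
Two of the consumer settings carry rules that are easy to get wrong: heartbeat_interval must be smaller than session_timeout, and offset_retention_time is given in seconds but sent to the broker in milliseconds, with -1 standing for the broker default. The hypothetical consumer_settings helper below checks the first rule and repeats the conversion from the method body above; it is an illustration, not a ruby-kafka API.

```ruby
# Hypothetical helper, not part of ruby-kafka. It enforces the documented
# rule that heartbeat_interval must be less than session_timeout, and converts
# offset_retention_time the same way #consumer does: seconds become
# milliseconds, and nil becomes -1 (use the broker's default).
def consumer_settings(session_timeout: 30, heartbeat_interval: 10, offset_retention_time: nil)
  unless heartbeat_interval < session_timeout
    raise ArgumentError,
          "heartbeat_interval (#{heartbeat_interval}s) must be less than " \
          "session_timeout (#{session_timeout}s)"
  end

  retention_time_ms = offset_retention_time ? offset_retention_time * 1_000 : -1

  {
    session_timeout: session_timeout,
    heartbeat_interval: heartbeat_interval,
    retention_time_ms: retention_time_ms,
  }
end
```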

#deliver_message(value, key: nil, topic:, partition: nil, partition_key: nil) ⇒ `nil`

Delivers a single message to the Kafka cluster.

Note: Only use this API for low-throughput scenarios. If you want to deliver many messages at a high rate, or if you want to configure the way messages are sent, use the #producer or #async_producer APIs instead.

Parameters:

value (String, nil) —
the message value.
key (String, nil) (defaults to: nil) —
the message key.
topic (String) —
the topic that the message should be written to.
partition (Integer, nil) (defaults to: nil) —
the partition that the message should be written to, or nil if either partition_key is passed or the partition should be chosen at random.
partition_key (String, nil) (defaults to: nil) —
a value used to deterministically choose a partition to write to.

Returns:

(nil)

Raises:

(Kafka::DeliveryFailed) —
if the message could not be delivered.

# File 'lib/kafka/client.rb', line 93

def deliver_message(value, key: nil, topic:, partition: nil, partition_key: nil)
  create_time = Time.now

  message = PendingMessage.new(
    value,
    key,
    topic,
    partition,
    partition_key,
    create_time,
    key.to_s.bytesize + value.to_s.bytesize
  )

  if partition.nil?
    partition_count = @cluster.partitions_for(topic).count
    partition = Partitioner.partition_for_key(partition_count, message)
  end

  buffer = MessageBuffer.new

  buffer.write(
    value: message.value,
    key: message.key,
    topic: message.topic,
    partition: partition,
    create_time: message.create_time,
  )

  @cluster.add_target_topics([topic])

  compressor = Compressor.new(
    instrumenter: @instrumenter,
  )

  operation = ProduceOperation.new(
    cluster: @cluster,
    buffer: buffer,
    required_acks: 1,
    ack_timeout: 10,
    compressor: compressor,
    logger: @logger,
    instrumenter: @instrumenter,
  )

  operation.execute

  unless buffer.empty?
    raise DeliveryFailed
  end
end
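
The partition_key parameter only promises a deterministic choice of partition. One common way to get that property is to hash the key and take the result modulo the partition count, sketched below with Ruby's Zlib.crc32. choose_partition is hypothetical; the CRC32 hash, the precedence of partition_key over key, and the random fallback are assumptions rather than a description of Kafka::Partitioner.

```ruby
require "zlib"

# Hypothetical sketch of deterministic partition selection. It assumes a
# CRC32-modulo scheme and that partition_key takes precedence over key;
# Kafka::Partitioner may differ in detail.
def choose_partition(partition_count, key: nil, partition_key: nil)
  raise ArgumentError, "partition_count must be positive" unless partition_count > 0

  selector = partition_key || key
  # Without any key there is nothing to hash, so pick a partition at random.
  return rand(partition_count) if selector.nil?

  Zlib.crc32(selector) % partition_count
end
```

With a modulo scheme like this, adding partitions to a topic changes the partition that an existing key maps to, so per-key ordering only holds while the partition count stays the same.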

#each_message(topic:, start_from_beginning: true, max_wait_time: 5, min_bytes: 1, max_bytes: 1048576, &block) ⇒ `nil`

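Enumerates all messages in a topic, passing each one to the block. The method polls the topic in an endless loop rather than returning once it reaches the end of the topic.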

Parameters:

topic (String) —
the topic to consume messages from.
start_from_beginning (Boolean) (defaults to: true) —
whether to start from the beginning of the topic or only receive messages produced after consumption starts. Offsets are tracked in memory for the duration of the call only; nothing is checkpointed, so every new call starts again from this position.
max_wait_time (Integer) (defaults to: 5) —
the maximum amount of time to wait before the server responds, in seconds.
min_bytes (Integer) (defaults to: 1) —
the minimum number of bytes to wait for. If set to zero, the broker will respond immediately, but the response may be empty. The default is 1 byte, which means that the broker will respond as soon as a message is written to the partition.
max_bytes (Integer) (defaults to: 1048576) —
the maximum number of bytes to include in the response message set. Default is 1 MB. You need to set this higher if you expect messages to be larger than this.

Returns:

(nil)

# File 'lib/kafka/client.rb', line 377

def each_message(topic:, start_from_beginning: true, max_wait_time: 5, min_bytes: 1, max_bytes: 1048576, &block)
  default_offset ||= start_from_beginning ? :earliest : :latest
  offsets = Hash.new { default_offset }

  loop do
    operation = FetchOperation.new(
      cluster: @cluster,
      logger: @logger,
      min_bytes: min_bytes,
      max_wait_time: max_wait_time,
    )

    @cluster.partitions_for(topic).map(&:partition_id).each do |partition|
      partition_offset = offsets[partition]
      operation.fetch_from_partition(topic, partition, offset: partition_offset, max_bytes: max_bytes)
    end

    batches = operation.execute

    batches.each do |batch|
      batch.messages.each(&block)
      offsets[batch.partition] = batch.last_offset + 1
    end
  end
end
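
Because each_message keeps its offsets in a local Hash, its resume behaviour can be modelled in a few lines of plain Ruby. OffsetTracker and Batch are hypothetical names used only for this sketch; the logic mirrors the default-offset Hash and the "last offset + 1" update in the method body above.

```ruby
# Minimal model of the in-memory offset bookkeeping in #each_message.
# Batch is a stand-in Struct, not Kafka's fetched batch class.
Batch = Struct.new(:partition, :last_offset)

class OffsetTracker
  def initialize(start_from_beginning: true)
    default_offset = start_from_beginning ? :earliest : :latest
    # Unknown partitions fall back to the default without being stored.
    @offsets = Hash.new { default_offset }
  end

  # Offset to request in the next fetch for this partition.
  def next_offset(partition)
    @offsets[partition]
  end

  # After a batch is processed, resume one past its last message.
  def record(batch)
    @offsets[batch.partition] = batch.last_offset + 1
  end
end
```

Since the Hash lives only as long as one call, a restarted process begins again at :earliest or :latest; #consumer, which commits offsets, is the better fit when progress must survive restarts.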

#fetch_messages(topic:, partition:, offset: :latest, max_wait_time: 5, min_bytes: 1, max_bytes: 1048576) ⇒ `Array<Kafka::FetchedMessage>`

Note:

This API is still alpha level. Don't try to use it in production.

Fetches a batch of messages from a single partition. Note that it's possible to get back empty batches.

The starting point for the fetch can be configured with the :offset argument. If you pass a number, the fetch will start at that offset. However, there are two special Symbol values that can be passed instead:

:earliest — the first offset in the partition.
:latest — the next offset that will be written to, effectively making the call block until there is a new message in the partition.

The Kafka protocol specifies the numeric values of these two options: -2 and -1, respectively. You can also pass in these numbers directly.
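
The mapping between the symbolic and numeric offsets described above fits in a small lookup table. protocol_offset is a hypothetical helper written for illustration; it is not part of the client's API.

```ruby
# Hypothetical helper: translate the symbolic offsets accepted by
# #fetch_messages into the numeric sentinels the Kafka protocol defines.
PROTOCOL_OFFSETS = { earliest: -2, latest: -1 }.freeze

def protocol_offset(offset)
  case offset
  when Symbol
    PROTOCOL_OFFSETS.fetch(offset) { raise ArgumentError, "unknown offset #{offset.inspect}" }
  when Integer
    offset # explicit offsets, including -2 and -1, pass through unchanged
  else
    raise ArgumentError, "offset must be an Integer or a Symbol"
  end
end
```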

Example

When enumerating the messages in a partition, you typically fetch batches sequentially.

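require "kafka"

# Broker addresses are placeholders; use your own cluster's.
kafka = Kafka::Client.new(seed_brokers: ["kafka1:9092"])

offset = :earliest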

loop do
  messages = kafka.fetch_messages(
    topic: "my-topic",
    partition: 42,
    offset: offset,
  )

  messages.each do |message|
    puts message.offset, message.key, message.value

    # Set the next offset that should be read to be the subsequent
    # offset.
    offset = message.offset + 1
  end
end

See a working example in examples/simple-consumer.rb.

Parameters:

topic (String) —
the topic that messages should be fetched from.
partition (Integer) —
the partition that messages should be fetched from.
offset (Integer, Symbol) (defaults to: :latest) —
the offset to start reading from. Default is the latest offset.
max_wait_time (Integer) (defaults to: 5) —
the maximum amount of time to wait before the server responds, in seconds.
min_bytes (Integer) (defaults to: 1) —
the minimum number of bytes to wait for. If set to zero, the broker will respond immediately, but the response may be empty. The default is 1 byte, which means that the broker will respond as soon as a message is written to the partition.
max_bytes (Integer) (defaults to: 1048576) —
the maximum number of bytes to include in the response message set. Default is 1 MB. You need to set this higher if you expect messages to be larger than this.

Returns:

(Array<Kafka::FetchedMessage>) —
the messages returned from the broker.

# File 'lib/kafka/client.rb', line 341

def fetch_messages(topic:, partition:, offset: :latest, max_wait_time: 5, min_bytes: 1, max_bytes: 1048576)
  operation = FetchOperation.new(
    cluster: @cluster,
    logger: @logger,
    min_bytes: min_bytes,
    max_wait_time: max_wait_time,
  )

  operation.fetch_from_partition(topic, partition, offset: offset, max_bytes: max_bytes)

  operation.execute.flat_map {|batch| batch.messages }
end

#last_offset_for(topic, partition) ⇒ `Integer`

Retrieves the offset of the last message in a partition. If there are no messages in the partition, -1 is returned.

Parameters:

topic (String) —
the name of the topic.
partition (Integer) —
the partition ID within the topic.

Returns:

(Integer) —
the offset of the last message in the partition, or -1 if there are no messages in the partition.

# File 'lib/kafka/client.rb', line 426

def last_offset_for(topic, partition)
  # The offset resolution API will return the offset of the "next" message to
  # be written when resolving the "latest" offset, so we subtract one.
  @cluster.resolve_offset(topic, partition, :latest) - 1
end
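
last_offset_for and fetch_messages can be combined to read a partition up to a fixed point instead of forever. read_until_last_offset is a hypothetical helper; it only assumes that client responds to those two methods with the signatures documented here and that each fetched message responds to #offset, so an in-memory stand-in can replace Kafka::Client when experimenting.

```ruby
# Hypothetical helper, not part of ruby-kafka. `client` only needs
# #last_offset_for and #fetch_messages with the signatures documented on
# Kafka::Client; each fetched message must respond to #offset.
def read_until_last_offset(client, topic, partition)
  last = client.last_offset_for(topic, partition)
  return [] if last < 0 # -1 means the partition has no messages

  collected = []
  offset = :earliest

  # Stop once the next offset to request lies past the snapshot taken above.
  until offset.is_a?(Integer) && offset > last
    messages = client.fetch_messages(topic: topic, partition: partition, offset: offset)

    messages.each do |message|
      # Skip anything written after the snapshot was taken.
      collected << message unless message.offset > last
    end

    # Empty batches are possible; in that case the same offset is requested again.
    offset = messages.last.offset + 1 unless messages.empty?
  end

  collected
end
```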

#last_offsets_for(*topics) ⇒ `Hash<String, Hash<Integer, Integer>>`

Retrieves the offset of the last message in each partition of the specified topics.

Examples:

last_offsets_for('topic-1', 'topic-2') # =>
# {
#   'topic-1' => { 0 => 100, 1 => 100 },
#   'topic-2' => { 0 => 100, 1 => 100 }
# }

Parameters:

topics (Array<String>) —
topic names.

Returns:

(Hash<String, Hash<Integer, Integer>>)

# File 'lib/kafka/client.rb', line 443

def last_offsets_for(*topics)
  @cluster.add_target_topics(topics)
  topics.map {|topic|
    partition_ids = @cluster.partitions_for(topic).collect(&:partition_id)
    partition_offsets = @cluster.resolve_offsets(topic, partition_ids, :latest)
    [topic, partition_offsets.collect { |k, v| [k, v - 1] }.to_h]
  }.to_h
end
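
A common use of last_offsets_for is estimating how far an application is behind. partition_lag is a hypothetical helper that compares the returned Hash with a second Hash of the same shape holding the offset of the last message the application has processed. Treating unprocessed partitions as processed up to -1 assumes offsets start at 0, so the figure is an upper bound if retention has already removed the oldest messages.

```ruby
# Hypothetical helper, not part of ruby-kafka. last_offsets has the shape
# returned by #last_offsets_for; processed_offsets has the same shape but
# holds the offset of the last message the application has handled. A
# partition with no processed entry is treated as processed up to -1.
def partition_lag(last_offsets, processed_offsets)
  last_offsets.map do |topic, partitions|
    processed = processed_offsets.fetch(topic, {})
    lags = partitions.map do |partition, last_offset|
      lag = last_offset - processed.fetch(partition, -1)
      [partition, [lag, 0].max] # never report negative lag
    end
    [topic, lags.to_h]
  end.to_h
end
```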

#partitions_for(topic) ⇒ `Integer`

Counts the number of partitions in a topic.

Parameters:

topic (String) —
the name of the topic.

Returns:

(Integer) —
the number of partitions in the topic.



# File 'lib/kafka/client.rb', line 415

def partitions_for(topic)
  @cluster.partitions_for(topic).count
end

#producer(compression_codec: nil, compression_threshold: 1, ack_timeout: 5, required_acks: :all, max_retries: 2, retry_backoff: 1, max_buffer_size: 1000, max_buffer_bytesize: 10_000_000) ⇒ `Kafka::Producer`

Initializes a new Kafka producer.

Parameters:

ack_timeout (Integer) (defaults to: 5) —
the number of seconds a broker can wait for replicas to acknowledge a write before responding with a timeout.
required_acks (Integer, Symbol) (defaults to: :all) —
the number of replicas that must acknowledge a write, or :all if all in-sync replicas must acknowledge.
max_retries (Integer) (defaults to: 2) —
the number of retries that should be attempted before giving up sending messages to the cluster. Does not include the original attempt.
retry_backoff (Integer) (defaults to: 1) —
the number of seconds to wait between retries.
max_buffer_size (Integer) (defaults to: 1000) —
the number of messages allowed in the buffer before new writes will raise BufferOverflow exceptions.
max_buffer_bytesize (Integer) (defaults to: 10_000_000) —
the maximum size of the buffer in bytes. Attempting to produce messages when the buffer reaches this size will raise BufferOverflow.
compression_codec (Symbol, nil) (defaults to: nil) —
the name of the compression codec to use, or nil if no compression should be performed. Valid codecs: :snappy and :gzip.
compression_threshold (Integer) (defaults to: 1) —
the number of messages that need to be in a message set before it is compressed. Note that message sets are per-partition rather than per-topic or per-producer.

Returns:

(Kafka::Producer) —
the Kafka producer.

# File 'lib/kafka/client.rb', line 174

def producer(compression_codec: nil, compression_threshold: 1, ack_timeout: 5, required_acks: :all, max_retries: 2, retry_backoff: 1, max_buffer_size: 1000, max_buffer_bytesize: 10_000_000)
  compressor = Compressor.new(
    codec_name: compression_codec,
    threshold: compression_threshold,
    instrumenter: @instrumenter,
  )

  Producer.new(
    cluster: initialize_cluster,
    logger: @logger,
    instrumenter: @instrumenter,
    compressor: compressor,
    ack_timeout: ack_timeout,
    required_acks: required_acks,
    max_retries: max_retries,
    retry_backoff: retry_backoff,
    max_buffer_size: max_buffer_size,
    max_buffer_bytesize: max_buffer_bytesize,
  )
end
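
max_buffer_size and max_buffer_bytesize describe two independent limits that make writes fail with BufferOverflow. The sketch below models both with a hypothetical BoundedBuffer class and its own BufferOverflow error; it is not the producer's real buffer, and its byte accounting (key bytes plus value bytes, as #deliver_message computes) is an approximation.

```ruby
# Hypothetical stand-in for the library's BufferOverflow error.
class BufferOverflow < StandardError; end

# Sketch of the two producer buffer limits. Message size is approximated as
# key bytes + value bytes, as #deliver_message does when it builds a message.
class BoundedBuffer
  attr_reader :size, :bytesize

  def initialize(max_buffer_size: 1000, max_buffer_bytesize: 10_000_000)
    @max_buffer_size = max_buffer_size
    @max_buffer_bytesize = max_buffer_bytesize
    @messages = []
    @size = 0
    @bytesize = 0
  end

  def write(value, key: nil)
    if @size >= @max_buffer_size
      raise BufferOverflow, "buffer already holds #{@max_buffer_size} messages"
    end
    if @bytesize >= @max_buffer_bytesize
      raise BufferOverflow, "buffer already holds #{@max_buffer_bytesize} bytes"
    end

    @messages << [key, value]
    @size += 1
    @bytesize += key.to_s.bytesize + value.to_s.bytesize
  end
end
```

The byte limit here is checked before a message is added, following the wording "when the buffer reaches this size", so a single large message can still take the buffer past the limit.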

#topics ⇒ `Array<String>`

Lists all topics in the cluster.

Returns:

(Array<String>) —
the list of topic names.

# File 'lib/kafka/client.rb', line 406

def topics
  @cluster.clear_target_topics
  @cluster.topics
end
