Class: Kafka::Client

Inherits:
  • Object
Defined in:
lib/kafka/client.rb

Constructor Details

#initialize(seed_brokers:, client_id: "ruby-kafka", logger: nil, connect_timeout: nil, socket_timeout: nil, ssl_ca_cert: nil, ssl_client_cert: nil, ssl_client_cert_key: nil) ⇒ Client

Initializes a new Kafka client.

Parameters:

  • seed_brokers (Array<String>, String)

    the list of brokers used to initialize the client. Either an Array of connections, or a comma-separated string of connections. Each connection can be either a "host:port" string or a full URI with a scheme. If a scheme is present, it is ignored and only the host and port are used.

  • client_id (String) (defaults to: "ruby-kafka")

    the identifier for this application.

  • logger (Logger) (defaults to: nil)

    the logger that should be used by the client.

  • connect_timeout (Integer, nil) (defaults to: nil)

    the timeout setting for connecting to brokers. See BrokerPool#initialize.

  • socket_timeout (Integer, nil) (defaults to: nil)

    the timeout setting for socket connections. See BrokerPool#initialize.

  • ssl_ca_cert (String, nil) (defaults to: nil)

    a PEM encoded CA cert to use with an SSL connection.

  • ssl_client_cert (String, nil) (defaults to: nil)

    a PEM encoded client cert to use with an SSL connection. Must be used in combination with ssl_client_cert_key.

  • ssl_client_cert_key (String, nil) (defaults to: nil)

    a PEM encoded client cert key to use with an SSL connection. Must be used in combination with ssl_client_cert.
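The accepted seed_brokers formats can be illustrated with a small standalone sketch. It is not the library's own normalize_seed_brokers; the helper name and the placeholder "kafka://" scheme are assumptions made for illustration, and only Ruby's standard URI module is used.

```ruby
require "uri"

# Hypothetical helper (not part of ruby-kafka): turns the documented
# seed_brokers formats into [host, port] pairs.
def normalize_seed_brokers_sketch(seed_brokers)
  # Accept either an Array of connections or a comma-separated String.
  entries = seed_brokers.is_a?(String) ? seed_brokers.split(",") : Array(seed_brokers)

  entries.map do |entry|
    entry = entry.strip
    # Give scheme-less "host:port" entries a placeholder scheme so that
    # URI can extract host and port the same way for both formats.
    entry = "kafka://#{entry}" unless entry.include?("://")
    uri = URI.parse(entry)
    [uri.host, uri.port] # the scheme is deliberately discarded
  end
end

normalize_seed_brokers_sketch("kafka1:9092,kafka2:9092")
normalize_seed_brokers_sketch(["ssl://kafka1:9093"])
```

Both calls yield host/port pairs only, which mirrors the statement above that a scheme, if given, is ignored.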



# File 'lib/kafka/client.rb', line 44

def initialize(seed_brokers:, client_id: "ruby-kafka", logger: nil, connect_timeout: nil, socket_timeout: nil, ssl_ca_cert: nil, ssl_client_cert: nil, ssl_client_cert_key: nil)
  @logger = logger || Logger.new(nil)
  @instrumenter = Instrumenter.new(client_id: client_id)
  @seed_brokers = normalize_seed_brokers(seed_brokers)

  ssl_context = build_ssl_context(ssl_ca_cert, ssl_client_cert, ssl_client_cert_key)

  @connection_builder = ConnectionBuilder.new(
    client_id: client_id,
    connect_timeout: connect_timeout,
    socket_timeout: socket_timeout,
    ssl_context: ssl_context,
    logger: @logger,
    instrumenter: @instrumenter,
  )

  @cluster = initialize_cluster
end

Instance Method Details

#async_producer(delivery_interval: 0, delivery_threshold: 0, max_queue_size: 1000, **options) ⇒ AsyncProducer

Creates a new AsyncProducer instance.

All parameters allowed by #producer can be passed. In addition to this, a few extra parameters can be passed when creating an async producer.

Parameters:

  • max_queue_size (Integer) (defaults to: 1000)

    the maximum number of messages allowed in the queue.

  • delivery_threshold (Integer) (defaults to: 0)

    if greater than zero, the number of buffered messages that will automatically trigger a delivery.

  • delivery_interval (Integer) (defaults to: 0)

    if greater than zero, the number of seconds between automatic message deliveries.
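As a rough model of the two automatic triggers described above, the predicate below decides whether a delivery would be due. It is a sketch under stated assumptions: the method name and its arguments are hypothetical, and the real AsyncProducer works with background threads rather than a pure function.

```ruby
# Hypothetical predicate modelling the documented triggers. A value of
# zero disables the corresponding trigger, as described above.
def delivery_due?(buffered_messages:, seconds_since_delivery:,
                  delivery_threshold: 0, delivery_interval: 0)
  threshold_hit = delivery_threshold > 0 && buffered_messages >= delivery_threshold
  interval_hit = delivery_interval > 0 && seconds_since_delivery >= delivery_interval
  threshold_hit || interval_hit
end

# With the defaults (both zero), nothing is delivered automatically.
delivery_due?(buffered_messages: 500, seconds_since_delivery: 60)

# With a threshold of 100, the 100th buffered message triggers a delivery.
delivery_due?(buffered_messages: 100, seconds_since_delivery: 1, delivery_threshold: 100)
```

With both settings left at zero, this model never reports a delivery as due, so an explicit delivery call would be needed.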

Returns:

  • (AsyncProducer)



# File 'lib/kafka/client.rb', line 179

def async_producer(delivery_interval: 0, delivery_threshold: 0, max_queue_size: 1000, **options)
  sync_producer = producer(**options)

  AsyncProducer.new(
    sync_producer: sync_producer,
    delivery_interval: delivery_interval,
    delivery_threshold: delivery_threshold,
    max_queue_size: max_queue_size,
    instrumenter: @instrumenter,
  )
end

#close ⇒ nil

Closes all connections to the Kafka brokers and frees up used resources.

Returns:

  • (nil)


# File 'lib/kafka/client.rb', line 397

def close
  @cluster.disconnect
end

#consumer(group_id:, session_timeout: 30, offset_commit_interval: 10, offset_commit_threshold: 0, heartbeat_interval: 10) ⇒ Consumer

Creates a new Kafka consumer.

Parameters:

  • group_id (String)

    the id of the group that the consumer should join.

  • session_timeout (Integer) (defaults to: 30)

    the number of seconds after which, if a client hasn't contacted the Kafka cluster, it will be kicked out of the group.

  • offset_commit_interval (Integer) (defaults to: 10)

    the interval between offset commits, in seconds.

  • offset_commit_threshold (Integer) (defaults to: 0)

    the number of messages that can be processed before their offsets are committed. If zero, offset commits are not triggered by message processing.

  • heartbeat_interval (Integer) (defaults to: 10)

    the interval between heartbeats; must be less than the session window.
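The constraint between heartbeat_interval and session_timeout can be checked before creating a consumer. The guard below is a hypothetical helper; nothing in this page indicates that the library performs such a check itself.

```ruby
# Hypothetical guard (not part of ruby-kafka): a heartbeat must be sent
# before the session window expires, otherwise the consumer would be
# kicked out of the group between heartbeats.
def validate_consumer_timing!(session_timeout: 30, heartbeat_interval: 10)
  unless heartbeat_interval < session_timeout
    raise ArgumentError,
          "heartbeat_interval (#{heartbeat_interval}s) must be less than " \
          "session_timeout (#{session_timeout}s)"
  end
  true
end

validate_consumer_timing!(session_timeout: 30, heartbeat_interval: 10)
```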

Returns:

  • (Consumer)



# File 'lib/kafka/client.rb', line 204

def consumer(group_id:, session_timeout: 30, offset_commit_interval: 10, offset_commit_threshold: 0, heartbeat_interval: 10)
  cluster = initialize_cluster

  instrumenter = DecoratingInstrumenter.new(@instrumenter, {
    group_id: group_id,
  })

  group = ConsumerGroup.new(
    cluster: cluster,
    logger: @logger,
    group_id: group_id,
    session_timeout: session_timeout,
  )

  offset_manager = OffsetManager.new(
    cluster: cluster,
    group: group,
    logger: @logger,
    commit_interval: offset_commit_interval,
    commit_threshold: offset_commit_threshold,
  )

  heartbeat = Heartbeat.new(
    group: group,
    interval: heartbeat_interval,
  )

  Consumer.new(
    cluster: cluster,
    logger: @logger,
    instrumenter: instrumenter,
    group: group,
    offset_manager: offset_manager,
    session_timeout: session_timeout,
    heartbeat: heartbeat,
  )
end

#deliver_message(value, key: nil, topic:, partition: nil, partition_key: nil) ⇒ Object



# File 'lib/kafka/client.rb', line 63

def deliver_message(value, key: nil, topic:, partition: nil, partition_key: nil)
  create_time = Time.now

  message = PendingMessage.new(
    value,
    key,
    topic,
    partition,
    partition_key,
    create_time,
    key.to_s.bytesize + value.to_s.bytesize
  )

  if partition.nil?
    partition_count = @cluster.partitions_for(topic).count
    partition = Partitioner.partition_for_key(partition_count, message)
  end

  buffer = MessageBuffer.new

  buffer.write(
    value: message.value,
    key: message.key,
    topic: message.topic,
    partition: partition,
    create_time: message.create_time,
  )

  @cluster.add_target_topics([topic])

  compressor = Compressor.new(
    instrumenter: @instrumenter,
  )

  operation = ProduceOperation.new(
    cluster: @cluster,
    buffer: buffer,
    required_acks: 1,
    ack_timeout: 10,
    compressor: compressor,
    logger: @logger,
    instrumenter: @instrumenter,
  )

  operation.execute

  unless buffer.empty?
    raise DeliveryFailed
  end
end
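When no partition is given, the method above delegates the choice to Partitioner.partition_for_key, whose implementation is not shown on this page. The sketch below illustrates one common approach and rests on assumptions: that partition_key takes precedence over key, that the key is hashed with CRC32, and that a random partition is used when no key is available. The helper name is hypothetical.

```ruby
require "zlib"

# Hypothetical partitioner sketch; the hashing scheme is an assumption,
# not a statement about ruby-kafka's Partitioner.
def partition_for(partition_count, key: nil, partition_key: nil)
  raise ArgumentError, "topic has no partitions" if partition_count.zero?

  chosen_key = partition_key || key
  # Without any key there is nothing to hash, so pick a partition at random.
  return rand(partition_count) if chosen_key.nil?

  # Hashing keeps all messages with the same key on the same partition.
  Zlib.crc32(chosen_key.to_s) % partition_count
end

partition_for(4, key: "user-1")
partition_for(4, key: "user-1", partition_key: "tenant-7")
```

Under these assumptions, messages sharing a key always land on the same partition as long as the partition count does not change.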

#each_message(topic:, start_from_beginning: true, max_wait_time: 5, min_bytes: 1, max_bytes: 1048576, &block) ⇒ nil

Enumerate all messages in a topic.

Parameters:

  • topic (String)

    the topic to consume messages from.

  • start_from_beginning (Boolean) (defaults to: true)

    whether to start from the beginning of the topic or just subscribe to new messages being produced. This only applies when first consuming a topic partition – once the consumer has checkpointed its progress, it will always resume from the last checkpoint.

  • max_wait_time (Integer) (defaults to: 5)

    the maximum amount of time to wait before the server responds, in seconds.

  • min_bytes (Integer) (defaults to: 1)

    the minimum number of bytes to wait for. If set to zero, the broker will respond immediately, but the response may be empty. The default is 1 byte, which means that the broker will respond as soon as a message is written to the partition.

  • max_bytes (Integer) (defaults to: 1048576)

    the maximum number of bytes to include in the response message set. Default is 1 MB. You need to set this higher if you expect messages to be larger than this.

Returns:

  • (nil)
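The per-partition offset bookkeeping behind this method can be modelled without a broker. In the sketch below each partition is a plain Array whose index plays the role of the offset; fetch_batch and consume_rounds are hypothetical names, and a fixed number of rounds replaces the endless loop so that the example terminates.

```ruby
# Hypothetical in-memory fetch: returns [offset, value] pairs starting at
# the requested offset. :earliest and :latest are resolved against the log.
def fetch_batch(log, offset, max_messages)
  start = case offset
          when :earliest then 0
          when :latest then log.size
          else offset
          end
  (start...log.size).first(max_messages).map { |o| [o, log[o]] }
end

def consume_rounds(partitions, rounds:, max_messages: 2, start_from_beginning: true)
  default_offset = start_from_beginning ? :earliest : :latest
  # Partitions that have not been read yet fall back to the default offset.
  offsets = Hash.new { default_offset }
  seen = []

  rounds.times do
    partitions.each do |partition_id, log|
      batch = fetch_batch(log, offsets[partition_id], max_messages)
      next if batch.empty? # nothing new, keep the current position

      seen.concat(batch.map { |_offset, value| value })
      # Resume after the last message that was handed to the caller.
      offsets[partition_id] = batch.last.first + 1
    end
  end

  seen
end

consume_rounds({ 0 => %w[a b c], 1 => %w[x] }, rounds: 2)
```

Note that Hash.new with a block returns the default without storing it, so a partition only gets an explicit entry once a batch has actually been consumed from it.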


# File 'lib/kafka/client.rb', line 339

def each_message(topic:, start_from_beginning: true, max_wait_time: 5, min_bytes: 1, max_bytes: 1048576, &block)
  default_offset ||= start_from_beginning ? :earliest : :latest
  offsets = Hash.new { default_offset }

  loop do
    operation = FetchOperation.new(
      cluster: @cluster,
      logger: @logger,
      min_bytes: min_bytes,
      max_wait_time: max_wait_time,
    )

    @cluster.partitions_for(topic).map(&:partition_id).each do |partition|
      partition_offset = offsets[partition]
      operation.fetch_from_partition(topic, partition, offset: partition_offset, max_bytes: max_bytes)
    end

    batches = operation.execute

    batches.each do |batch|
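      batch.messages.each(&block)
      # Resume after the last message so it is not fetched again on the next pass.
      offsets[batch.partition] = batch.last_offset + 1 unless batch.messages.empty?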
    end
  end
end

#fetch_messages(topic:, partition:, offset: :latest, max_wait_time: 5, min_bytes: 1, max_bytes: 1048576) ⇒ Array<Kafka::FetchedMessage>

Note:

This API is still alpha level. Don't try to use it in production.

Fetches a batch of messages from a single partition. Note that it's possible to get back empty batches.

The starting point for the fetch can be configured with the :offset argument. If you pass a number, the fetch will start at that offset. However, there are two special Symbol values that can be passed instead:

  • :earliest — the first offset in the partition.
  • :latest — the next offset that will be written to, effectively making the call block until there is a new message in the partition.

The Kafka protocol specifies the numeric values of these two options: -2 and -1, respectively. You can also pass in these numbers directly.
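The mapping between the two Symbol values and their protocol numbers can be written down as a small helper. The method name is hypothetical; only the values -2 and -1 come from the text above.

```ruby
# Hypothetical helper: translates the :offset argument into the numeric
# value sent over the wire. Integers are passed through unchanged.
def resolve_offset_argument(offset)
  case offset
  when Integer then offset
  when :earliest then -2
  when :latest then -1
  else raise ArgumentError, "unknown offset: #{offset.inspect}"
  end
end

resolve_offset_argument(:earliest)
resolve_offset_argument(:latest)
resolve_offset_argument(42)
```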

Example

When enumerating the messages in a partition, you typically fetch batches sequentially.

offset = :earliest

loop do
  messages = kafka.fetch_messages(
    topic: "my-topic",
    partition: 42,
    offset: offset,
  )

  messages.each do |message|
    puts message.offset, message.key, message.value

    # Set the next offset that should be read to be the subsequent
    # offset.
    offset = message.offset + 1
  end
end

See a working example in examples/simple-consumer.rb.

Parameters:

  • topic (String)

    the topic that messages should be fetched from.

  • partition (Integer)

    the partition that messages should be fetched from.

  • offset (Integer, Symbol) (defaults to: :latest)

    the offset to start reading from. Default is the latest offset.

  • max_wait_time (Integer) (defaults to: 5)

    the maximum amount of time to wait before the server responds, in seconds.

  • min_bytes (Integer) (defaults to: 1)

    the minimum number of bytes to wait for. If set to zero, the broker will respond immediately, but the response may be empty. The default is 1 byte, which means that the broker will respond as soon as a message is written to the partition.

  • max_bytes (Integer) (defaults to: 1048576)

    the maximum number of bytes to include in the response message set. Default is 1 MB. You need to set this higher if you expect messages to be larger than this.

Returns:

  • (Array<Kafka::FetchedMessage>)



# File 'lib/kafka/client.rb', line 303

def fetch_messages(topic:, partition:, offset: :latest, max_wait_time: 5, min_bytes: 1, max_bytes: 1048576)
  operation = FetchOperation.new(
    cluster: @cluster,
    logger: @logger,
    min_bytes: min_bytes,
    max_wait_time: max_wait_time,
  )

  operation.fetch_from_partition(topic, partition, offset: offset, max_bytes: max_bytes)

  operation.execute.flat_map {|batch| batch.messages }
end

#last_offset_for(topic, partition) ⇒ Integer

Retrieves the offset of the last message in a partition. If there are no messages in the partition, -1 is returned.

Parameters:

  • topic (String)
  • partition (Integer)

Returns:

  • (Integer)

    the offset of the last message in the partition, or -1 if there are no messages in the partition.
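The -1 result for an empty partition follows from simple arithmetic, which the sketch below reproduces with an Array standing in for a partition. It assumes offsets start at 0 and that nothing has been removed from the log; both helper names are hypothetical.

```ruby
# Hypothetical stand-in for resolving the "latest" offset: with offsets
# 0..n-1 already written, the next message would get offset n.
def next_offset_for(messages)
  messages.size
end

# The last written offset is one less than the next offset to be written.
def last_offset_sketch(messages)
  next_offset_for(messages) - 1
end

last_offset_sketch(%w[a b c]) # three messages at offsets 0, 1, 2
last_offset_sketch([])        # empty partition: 0 - 1
```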



# File 'lib/kafka/client.rb', line 388

def last_offset_for(topic, partition)
  # The offset resolution API will return the offset of the "next" message to
  # be written when resolving the "latest" offset, so we subtract one.
  @cluster.resolve_offset(topic, partition, :latest) - 1
end

#partitions_for(topic) ⇒ Integer

Counts the number of partitions in a topic.

Parameters:

  • topic (String)

Returns:

  • (Integer)

    the number of partitions in the topic.



# File 'lib/kafka/client.rb', line 377

def partitions_for(topic)
  @cluster.partitions_for(topic).count
end

#producer(compression_codec: nil, compression_threshold: 1, ack_timeout: 5, required_acks: :all, max_retries: 2, retry_backoff: 1, max_buffer_size: 1000, max_buffer_bytesize: 10_000_000) ⇒ Kafka::Producer

Initializes a new Kafka producer.

Parameters:

  • ack_timeout (Integer) (defaults to: 5)

    The number of seconds a broker can wait for replicas to acknowledge a write before responding with a timeout.

  • required_acks (Integer, Symbol) (defaults to: :all)

    The number of replicas that must acknowledge a write, or :all if all in-sync replicas must acknowledge.

  • max_retries (Integer) (defaults to: 2)

    the number of retries that should be attempted before giving up sending messages to the cluster. Does not include the original attempt.

  • retry_backoff (Integer) (defaults to: 1)

    the number of seconds to wait between retries.

  • max_buffer_size (Integer) (defaults to: 1000)

    the number of messages allowed in the buffer before new writes will raise BufferOverflow exceptions.

  • max_buffer_bytesize (Integer) (defaults to: 10_000_000)

    the maximum size of the buffer in bytes. Attempting to produce messages when the buffer reaches this size will result in BufferOverflow being raised.

  • compression_codec (Symbol, nil) (defaults to: nil)

    the name of the compression codec to use, or nil if no compression should be performed. Valid codecs: :snappy and :gzip.

  • compression_threshold (Integer) (defaults to: 1)

    the number of messages that need to be in a message set before it is compressed. Note that message sets are per-partition rather than per-topic or per-producer.
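Because max_retries does not count the original attempt, the default of 2 allows up to three delivery attempts in total. The loop below is a minimal sketch of that accounting and not the producer's actual retry logic; the method name and the injectable sleeper are assumptions that keep the example fast to run.

```ruby
# Hypothetical retry loop: yields the attempt number to the block and
# retries on failure, waiting retry_backoff seconds between attempts.
def deliver_with_retries(max_retries: 2, retry_backoff: 1,
                         sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    # Give up once the original attempt plus max_retries retries have failed.
    raise if attempts > max_retries

    sleeper.call(retry_backoff)
    retry
  end
end

# Succeeds on the third attempt, i.e. after two retries.
deliver_with_retries(retry_backoff: 0, sleeper: ->(_seconds) {}) do |attempt|
  raise "broker unavailable" if attempt < 3
  :delivered
end
```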

Returns:

  • (Kafka::Producer)



# File 'lib/kafka/client.rb', line 144

def producer(compression_codec: nil, compression_threshold: 1, ack_timeout: 5, required_acks: :all, max_retries: 2, retry_backoff: 1, max_buffer_size: 1000, max_buffer_bytesize: 10_000_000)
  compressor = Compressor.new(
    codec_name: compression_codec,
    threshold: compression_threshold,
    instrumenter: @instrumenter,
  )

  Producer.new(
    cluster: initialize_cluster,
    logger: @logger,
    instrumenter: @instrumenter,
    compressor: compressor,
    ack_timeout: ack_timeout,
    required_acks: required_acks,
    max_retries: max_retries,
    retry_backoff: retry_backoff,
    max_buffer_size: max_buffer_size,
    max_buffer_bytesize: max_buffer_bytesize,
  )
end

#topics ⇒ Array<String>

Lists all topics in the cluster.

Returns:

  • (Array<String>)

    the list of topic names.



# File 'lib/kafka/client.rb', line 368

def topics
  @cluster.clear_target_topics
  @cluster.topics
end