Class: Karafka::Connection::RawMessagesBuffer

Inherits:

Object


Defined in: lib/karafka/connection/raw_messages_buffer.rb

Overview

Note:

This buffer is NOT threadsafe.

Note:

We store data here in groups per topic partition to handle the revocation case, where we may need to remove messages from a single topic partition.

Buffer for raw librdkafka messages.

When a message is added to this buffer, it is appended to an array holding the other messages from the same topic and partition.

Instance Attribute Summary

#size ⇒ Object readonly

Returns the number of messages currently held in the buffer.

Instance Method Summary

#<<(message) ⇒ Array<Rdkafka::Consumer::Message>

Adds a message to the buffer.

#clear ⇒ Object

Removes all the data from the buffer.

#delete(topic, partition) ⇒ Object

Removes the given topic and partition data from the buffer. This is used when a partition is revoked.

#each {|topic, partition, messages| ... } ⇒ Object

Iterates over the messages of every topic and partition.

#initialize ⇒ Karafka::Connection::RawMessagesBuffer constructor

Returns a new buffer instance.

#uniq! ⇒ Object

Removes duplicated messages within each partition. This should be used only when a rebalance occurs, as we may then receive data we already have, because consumption resumes from the last offset.

Constructor Details

#initialize ⇒ `Karafka::Connection::RawMessagesBuffer`

Returns a new buffer instance.

# File 'lib/karafka/connection/raw_messages_buffer.rb', line 18

def initialize
  @size = 0
  @groups = Hash.new do |topic_groups, topic|
    topic_groups[topic] = Hash.new do |partition_groups, partition|
      partition_groups[partition] = []
    end
  end
end
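The nested default blocks are what make grouping by topic and partition work without any explicit setup. The following is a minimal stand-alone sketch of the same Hash construction on a local variable; the topic name `'events'` and the symbol payloads are placeholders, not real Kafka messages.

```ruby
# Same nested default-block construction as in the constructor above,
# built on a local variable so it can be run without Karafka.
groups = Hash.new do |topic_groups, topic|
  topic_groups[topic] = Hash.new do |partition_groups, partition|
    partition_groups[partition] = []
  end
end

# Reading a missing topic/partition creates the inner Hash and the Array on demand.
groups['events'][0] << :first
groups['events'][0] << :second
groups['events'][1] << :third

# key? does not run the default block, so checking for a topic never creates it.
other_present = groups.key?('other')
```

Note that a plain `groups['other']` lookup would create an empty entry as a side effect, which is why existence checks on such a structure are better done with `key?`.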

Instance Attribute Details

#size ⇒ `Object` (readonly)

Returns the number of messages currently held in the buffer.



# File 'lib/karafka/connection/raw_messages_buffer.rb', line 15

def size
  @size
end

Instance Method Details

#<<(message) ⇒ `Array<Rdkafka::Consumer::Message>`

Adds a message to the buffer.

Parameters:

message (Rdkafka::Consumer::Message): raw rdkafka message

Returns:

(Array<Rdkafka::Consumer::Message>): the sub-buffer array for the message's topic and partition

# File 'lib/karafka/connection/raw_messages_buffer.rb', line 31

def <<(message)
  @size += 1
  @groups[message.topic][message.partition] << message
end
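As a hedged illustration of the return value, the sketch below copies only the constructor, `#size` and `#<<` shown on this page into a stand-in class. `SketchBuffer` and `SketchMessage` are hypothetical names introduced for this example; the real buffer receives `Rdkafka::Consumer::Message` objects.

```ruby
# Hypothetical stand-in for Rdkafka::Consumer::Message with only the fields used here.
SketchMessage = Struct.new(:topic, :partition, :offset)

# Illustration-only copy of the documented constructor, #size and #<<.
class SketchBuffer
  attr_reader :size

  def initialize
    @size = 0
    @groups = Hash.new do |topic_groups, topic|
      topic_groups[topic] = Hash.new do |partition_groups, partition|
        partition_groups[partition] = []
      end
    end
  end

  def <<(message)
    @size += 1
    @groups[message.topic][message.partition] << message
  end
end

buffer = SketchBuffer.new

# The return value is the sub-buffer array for the topic and partition, not the buffer.
returned = buffer << SketchMessage.new('events', 0, 10)
buffer << SketchMessage.new('events', 0, 11)

# Both messages landed in the same array, which `returned` still references.
offsets = returned.map(&:offset)
```

Because `<<` returns the sub-buffer array rather than the buffer, chaining (`buffer << a << b`) would append `b` directly to that array and skip the size increment, so the sketch adds each message with its own call.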

#clear ⇒ `Object`

Note:

We do not clear the whole groups hash; instead we clear the per-topic partition hashes, which saves some object allocations. We cannot clear the underlying arrays, as they may be in use by other threads for data processing: clearing them could wipe the raw messages array of a job that is still in the jobs queue.

Removes all the data from the buffer.

# File 'lib/karafka/connection/raw_messages_buffer.rb', line 86

def clear
  @size = 0
  @groups.each_value(&:clear)
end
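The effect described in the note can be seen on a plain Hash. This is a sketch only: `in_flight` is a hypothetical name for an array that was handed to a job before the clear, and the symbols stand in for messages.

```ruby
# Plain-Hash stand-in for the internal groups structure.
groups = { 'events' => { 0 => [:a, :b] } }

# Reference that a queued job might still hold (hypothetical).
in_flight = groups['events'][0]

# Same call as in #clear: empties each per-topic partition hash.
groups.each_value(&:clear)

# The topic key survives with an empty partition hash, and the array
# that was handed out earlier keeps its contents.
topic_kept = groups.key?('events')
```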

#delete(topic, partition) ⇒ `Object`

Removes the given topic and partition data from the buffer. This is used when a partition is revoked.

Parameters:

topic (String): topic we're interested in
partition (Integer): partition whose data we want to remove

# File 'lib/karafka/connection/raw_messages_buffer.rb', line 53

def delete(topic, partition)
  return unless @groups.key?(topic)
  return unless @groups.fetch(topic).key?(partition)

  topic_data = @groups.fetch(topic)
  topic_data.delete(partition)

  recount!

  # If there are no more partitions to handle in a given topic, remove it completely
  @groups.delete(topic) if topic_data.empty?
end
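A rough walk-through of the same steps on a plain Hash follows. `recount!` is not shown on this page, so the `recount` lambda below is an assumption that it sums the sizes of all partition arrays; the topic names and payloads are placeholders.

```ruby
groups = {
  'events' => { 0 => [:a, :b], 1 => [:c] },
  'logs' => { 0 => [:d] }
}

# Assumed equivalent of recount!: total number of buffered messages.
recount = lambda do |all_groups|
  all_groups.each_value.sum { |partitions| partitions.each_value.sum(&:size) }
end

# Mirrors the documented #delete steps and returns the recomputed size,
# or nil when the topic or partition is not present.
delete = lambda do |topic, partition|
  return nil unless groups.key?(topic)
  return nil unless groups.fetch(topic).key?(partition)

  topic_data = groups.fetch(topic)
  topic_data.delete(partition)

  # Drop the topic entirely once its last partition is gone.
  groups.delete(topic) if topic_data.empty?

  recount.call(groups)
end

size_after_first = delete.call('events', 1)
size_after_second = delete.call('logs', 0)
missing = delete.call('unknown', 0)
```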

#each {|topic, partition, messages| ... } ⇒ `Object`

Iterates over the messages of every topic and partition.

Yield Parameters:

topic (String): topic name
partition (Integer): partition number
messages (Array<Rdkafka::Consumer::Message>): messages aggregated for the given topic partition

# File 'lib/karafka/connection/raw_messages_buffer.rb', line 41

def each
  @groups.each do |topic, partitions|
    partitions.each do |partition, messages|
      yield(topic, partition, messages)
    end
  end
end
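To show what the block receives, here is a small sketch that runs the same nested iteration over a plain Hash and builds a per-partition summary. The integers stand in for messages and the topic names are placeholders.

```ruby
# Plain-Hash stand-in for the internal groups structure.
groups = {
  'events' => { 0 => [10, 11], 1 => [5] },
  'logs' => { 0 => [1] }
}

summary = []

# Same traversal as #each: one step per topic partition, with all of
# that partition's messages passed together.
groups.each do |topic, partitions|
  partitions.each do |partition, messages|
    summary << "#{topic}/#{partition}: #{messages.size}"
  end
end
```

Since Ruby hashes preserve insertion order, groups are visited in the order in which their first message arrived.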

#uniq! ⇒ `Object`

Removes duplicated messages within each partition. This should be used only when a rebalance occurs, as we may then receive data we already have, because consumption resumes from the last offset. In such cases we want to keep duplicates to a minimum.

# File 'lib/karafka/connection/raw_messages_buffer.rb', line 70

def uniq!
  @groups.each_value do |partitions|
    partitions.each_value do |messages|
      messages.uniq!(&:offset)
    end
  end

  recount!
end
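The de-duplication itself relies on `Array#uniq!` with a block, which compares elements by the block's result. A minimal sketch, with a hypothetical Struct standing in for `Rdkafka::Consumer::Message`:

```ruby
# Hypothetical stand-in for a raw message with only the fields used here.
message_class = Struct.new(:topic, :partition, :offset)

messages = [
  message_class.new('events', 0, 10),
  message_class.new('events', 0, 11),
  message_class.new('events', 0, 10) # same offset delivered again
]

# Keeps the first message seen for each offset and mutates the array in place.
first_result = messages.uniq!(&:offset)
remaining_offsets = messages.map(&:offset)

# A second pass finds nothing to remove, in which case uniq! returns nil.
second_result = messages.uniq!(&:offset)
```

Because the arrays are modified in place, the size has to be recomputed afterwards, which is what the `recount!` call in the method above is for.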
