Class: Bloombroom::ContinuousBloomFilter

Inherits:
Object
  • Object
show all
Defined in:
lib/bloombroom/filter/continuous_bloom_filter.rb

Overview

ContinuousBloomFilter is a bloom filter for unbounded stream of keys where keys are expired over a given period of time. The expected capacity of the bloom filter for the desired validity period must be known or estimated. For a given capacity and error rate, BloomHelper.find_m_k can be used to compute optimal m & k values.

4 bits per key (instead of 1 bit in a normal bloom filter) are used for keeping track of the keys ttl. the internal timer resolution is set to half of the ttl (resolution divisor of 2). using 4 bits gives us 15 usable time slots (slot 0 is for the unset state). basically the internal time bookeeping is similar to a ring buffer where the first timer tick will be time slot=1, slot=2, .. slot=15, slot=1 and so on. The total time of our internal clock will thus be 15 * (ttl / 2). We keep track of ttl by writing the current time slot in the key k buckets when first inserted in the filter. when doing a key lookup if any of the bucket contain the 0 value the key is not found. if the interval betweem the current time slot and any of the k buckets value is greater than 2 (resolution divisor) we know this key is expired and we reset the expired buckets to 0.

Constant Summary collapse

RESOLUTION_DIVISOR =
2
BITS_PER_BUCKET =
4

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(m, k, ttl) ⇒ ContinuousBloomFilter



30
31
32
33
34
35
36
37
38
39
40
41
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 30

def initialize(m, k, ttl)
  @m = m
  @k = k
  @ttl = ttl
  @buckets = BitBucketField.new(BITS_PER_BUCKET, m)

  # time management
  @increment_period = @ttl / RESOLUTION_DIVISOR
  @current_slot = 1
  @max_slot = (2 ** BITS_PER_BUCKET) - 1 # ex. with 4 bits -> we want range 1..15
  @lock = Mutex.new
end

Instance Attribute Details

#bucketsObject (readonly)

Returns the value of attribute buckets.



22
23
24
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 22

def buckets
  @buckets
end

#kObject (readonly)

Returns the value of attribute k.



22
23
24
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 22

def k
  @k
end

#mObject (readonly)

Returns the value of attribute m.



22
23
24
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 22

def m
  @m
end

#ttlObject (readonly)

Returns the value of attribute ttl.



22
23
24
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 22

def ttl
  @ttl
end

Instance Method Details

#add(key) ⇒ ContinuousBloomFilter Also known as: <<



45
46
47
48
49
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 45

def add(key)
  current_slot = @lock.synchronize{@current_slot}
  BloomHelper.multi_hash(key, @k).each{|position| @buckets[position % @m] = current_slot}
  self
end

#inc_time_slotObject

advance internal time slot. this is exposed primarily for spec’ing purposes. normally this is automatically called by the internal timer thread but if not using the internal timer thread it can be called explicitly when doing your own time management.



80
81
82
83
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 80

def inc_time_slot
  # ex. with 4 bits -> we want range 1..15, 
  @lock.synchronize{@current_slot = (@current_slot % @max_slot) + 1}
end

#include?(key) ⇒ Boolean Also known as: []



54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 54

def include?(key)
  current_slot = @lock.synchronize{@current_slot}
  expired = false

  BloomHelper.multi_hash(key, @k).each do |position| 
    start_slot = @buckets[position % @m]
    if start_slot == 0
      expired = true
    elsif elapsed(start_slot, current_slot) > RESOLUTION_DIVISOR
      expired = true
      @buckets[position % @m] = 0
    end
  end
  !expired
end

#start_timerObject

start the internal timer thread for managing ttls. must be explicitely called



72
73
74
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 72

def start_timer
  @timer ||= detach_timer
end