Class: BinCircle

Inherits:
Object show all
Defined in:
lib/tupelo/util/bin-circle.rb

Overview

An implementation of consistent hashing (en.wikipedia.org/wiki/Consistent_hashing) using a circular structure of bins.

Synopsis:

circle = BinCircle.new reps: 20

100.times do |id|
  circle.add_bin id
end

circle.find_bin "foo"

The #find_bin method determines which bin a given object is associated with. For example: use it to look up which distributed cache the object is stored in.

BinCircle Does not store or cache any data. The state of a BinCircle instance depends entirely on two things: the replications number (either passed to BinCircle#new or set per bin in #add_bin), and the current set of bins, which is managed with #add_bin and #delete_bin. Note: like rbtree, not thread-safe (for #delete and #show_bins).

Constant Summary collapse

KEY_BITS =
30
KEY_MAX =
2**KEY_BITS - 1
DEFAULT_REPS =
10

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(reps: DEFAULT_REPS) ⇒ BinCircle

Returns a new instance of BinCircle.



35
36
37
38
# File 'lib/tupelo/util/bin-circle.rb', line 35

def initialize reps: DEFAULT_REPS
  @tree = MultiRBTree.new # hashed_bin_id => bin_id
  @default_reps = reps
end

Instance Attribute Details

#default_repsObject

Returns the value of attribute default_reps.



33
34
35
# File 'lib/tupelo/util/bin-circle.rb', line 33

def default_reps
  @default_reps
end

Instance Method Details

#add_bin(id, reps: default_reps) ⇒ Object

id should be an identifier for your bin, typically a number or string. Uses id.to_s, so id should not contain a hash (which would not determine a unique string).



43
44
45
46
47
48
49
50
# File 'lib/tupelo/util/bin-circle.rb', line 43

def add_bin id, reps: default_reps
  rep_id = "#{id} 0000"
  reps.times do |i|
    key = key_for_string(rep_id)
    @tree[key] = id
    rep_id.succ!
  end
end

#bin_sizeObject



85
86
87
88
89
90
91
92
# File 'lib/tupelo/util/bin-circle.rb', line 85

def bin_size
  Hash.new(0).tap do |bs|
    bs[@tree.first[1]] = 2**30 - @tree.last[0] + @tree.first[0]
    @tree.each_cons(2) do |(prev, _), (key, id)|
      bs[id] += key - prev
    end
  end
end

#binsObject

Returns the set of bin ids now in the circle.



53
54
55
# File 'lib/tupelo/util/bin-circle.rb', line 53

def bins
  Set.new @tree.values
end

#delete_bin(id) ⇒ Object

Delete the bin specified by id (same as argument to #add_bin).



69
70
71
# File 'lib/tupelo/util/bin-circle.rb', line 69

def delete_bin id
  @tree.delete_if {|k,v| v == id}
end

#find_bin(obj) ⇒ Object

id should be an identifier for your object, typically a number or string. Uses id.to_s, so id should not contain a hash (which would not determine a unique string). It is not necessary for id to be unique across objects.



76
77
78
# File 'lib/tupelo/util/bin-circle.rb', line 76

def find_bin obj
  find_bin_by_key(key_for_string(obj.to_s))
end

#find_bin_by_key(key) ⇒ Object



80
81
82
83
# File 'lib/tupelo/util/bin-circle.rb', line 80

def find_bin_by_key key
  _, id = @tree.lower_bound(key) || @tree.first
  id
end

#inspectObject



57
58
59
# File 'lib/tupelo/util/bin-circle.rb', line 57

def inspect
  "#<#{self.class} bins=#{bins.to_a}>"
end

#key_for_string(str) ⇒ Object

This hashing fn is applied to both bin ids and object identifiers (in principle, these could be two different function with the same range of outputs.



64
65
66
# File 'lib/tupelo/util/bin-circle.rb', line 64

def key_for_string str
  Digest::MD5.hexdigest(str).to_i(16) & KEY_MAX
end

#show_binsObject

To help in tuning the replication counts, you can look at the distribution of bin sizes, plus mean and variance.



96
97
98
99
100
101
102
103
104
105
# File 'lib/tupelo/util/bin-circle.rb', line 96

def show_bins
  bs = bin_size
  p bs

  mean = bs.inject(0.0) {|sum, (id,n)| sum + n} / N_BINS
  variance = bs.inject(0.0) {|sum_sqerr, (id,n)|
    sum_sqerr + (n - mean)**2} / (N_BINS-1)
  printf "mean : %14d\n" % mean
  printf "stdev: %14d\n" % Math.sqrt(variance)
end