Class: BinCircle
Overview
An implementation of consistent hashing (en.wikipedia.org/wiki/Consistent_hashing) using a circular structure of bins.
Synopsis:
circle = BinCircle.new reps: 20
100.times do |id|
circle.add_bin id
end
circle.find_bin "foo"
The #find_bin method determines which bin a given object is associated with. For example: use it to look up which distributed cache the object is stored in.
BinCircle Does not store or cache any data. The state of a BinCircle instance depends entirely on two things: the replications number (either passed to BinCircle#new or set per bin in #add_bin), and the current set of bins, which is managed with #add_bin and #delete_bin. Note: like rbtree, not thread-safe (for #delete and #show_bins).
Constant Summary collapse
- KEY_BITS =
30
- KEY_MAX =
2**KEY_BITS - 1
- DEFAULT_REPS =
10
Instance Attribute Summary collapse
-
#default_reps ⇒ Object
Returns the value of attribute default_reps.
Instance Method Summary collapse
-
#add_bin(id, reps: default_reps) ⇒ Object
id
should be an identifier for your bin, typically a number or string. - #bin_size ⇒ Object
-
#bins ⇒ Object
Returns the set of bin ids now in the circle.
-
#delete_bin(id) ⇒ Object
Delete the bin specified by
id
(same as argument to #add_bin). -
#find_bin(obj) ⇒ Object
id
should be an identifier for your object, typically a number or string. - #find_bin_by_key(key) ⇒ Object
-
#initialize(reps: DEFAULT_REPS) ⇒ BinCircle
constructor
A new instance of BinCircle.
- #inspect ⇒ Object
-
#key_for_string(str) ⇒ Object
This hashing fn is applied to both bin ids and object identifiers (in principle, these could be two different function with the same range of outputs..
-
#show_bins ⇒ Object
To help in tuning the replication counts, you can look at the distribution of bin sizes, plus mean and variance.
Constructor Details
#initialize(reps: DEFAULT_REPS) ⇒ BinCircle
Returns a new instance of BinCircle.
35 36 37 38 |
# File 'lib/tupelo/util/bin-circle.rb', line 35 def initialize reps: DEFAULT_REPS @tree = MultiRBTree.new # hashed_bin_id => bin_id @default_reps = reps end |
Instance Attribute Details
#default_reps ⇒ Object
Returns the value of attribute default_reps.
33 34 35 |
# File 'lib/tupelo/util/bin-circle.rb', line 33 def default_reps @default_reps end |
Instance Method Details
#add_bin(id, reps: default_reps) ⇒ Object
id
should be an identifier for your bin, typically a number or string. Uses id.to_s, so id
should not contain a hash (which would not determine a unique string).
43 44 45 46 47 48 49 50 |
# File 'lib/tupelo/util/bin-circle.rb', line 43 def add_bin id, reps: default_reps rep_id = "#{id} 0000" reps.times do |i| key = key_for_string(rep_id) @tree[key] = id rep_id.succ! end end |
#bin_size ⇒ Object
85 86 87 88 89 90 91 92 |
# File 'lib/tupelo/util/bin-circle.rb', line 85 def bin_size Hash.new(0).tap do |bs| bs[@tree.first[1]] = 2**30 - @tree.last[0] + @tree.first[0] @tree.each_cons(2) do |(prev, _), (key, id)| bs[id] += key - prev end end end |
#bins ⇒ Object
Returns the set of bin ids now in the circle.
53 54 55 |
# File 'lib/tupelo/util/bin-circle.rb', line 53 def bins Set.new @tree.values end |
#delete_bin(id) ⇒ Object
Delete the bin specified by id
(same as argument to #add_bin).
69 70 71 |
# File 'lib/tupelo/util/bin-circle.rb', line 69 def delete_bin id @tree.delete_if {|k,v| v == id} end |
#find_bin(obj) ⇒ Object
id
should be an identifier for your object, typically a number or string. Uses id.to_s, so id
should not contain a hash (which would not determine a unique string). It is not necessary for id
to be unique across objects.
76 77 78 |
# File 'lib/tupelo/util/bin-circle.rb', line 76 def find_bin obj find_bin_by_key(key_for_string(obj.to_s)) end |
#find_bin_by_key(key) ⇒ Object
80 81 82 83 |
# File 'lib/tupelo/util/bin-circle.rb', line 80 def find_bin_by_key key _, id = @tree.lower_bound(key) || @tree.first id end |
#inspect ⇒ Object
57 58 59 |
# File 'lib/tupelo/util/bin-circle.rb', line 57 def inspect "#<#{self.class} bins=#{bins.to_a}>" end |
#key_for_string(str) ⇒ Object
This hashing fn is applied to both bin ids and object identifiers (in principle, these could be two different function with the same range of outputs.
64 65 66 |
# File 'lib/tupelo/util/bin-circle.rb', line 64 def key_for_string str Digest::MD5.hexdigest(str).to_i(16) & KEY_MAX end |
#show_bins ⇒ Object
To help in tuning the replication counts, you can look at the distribution of bin sizes, plus mean and variance.
96 97 98 99 100 101 102 103 104 105 |
# File 'lib/tupelo/util/bin-circle.rb', line 96 def show_bins bs = bin_size p bs mean = bs.inject(0.0) {|sum, (id,n)| sum + n} / N_BINS variance = bs.inject(0.0) {|sum_sqerr, (id,n)| sum_sqerr + (n - mean)**2} / (N_BINS-1) printf "mean : %14d\n" % mean printf "stdev: %14d\n" % Math.sqrt(variance) end |