Class: HyperLogLog
- Inherits:
-
Object
- Object
- HyperLogLog
- Defined in:
- lib/hyper_log_log.rb
Instance Method Summary collapse
- #add(counter_name, value) ⇒ Object
-
#count(counter_name) ⇒ Object
Estimate the cardinality of a single set.
-
#initialize(redis, b = 10) ⇒ HyperLogLog
constructor
A new instance of HyperLogLog.
-
#intersection(*counter_names) ⇒ Object
Estimate the cardinality of the intersection of several sets.
- #raw_union(counter_names) ⇒ Object
-
#rho(i) ⇒ Object
rho(i) is the position of the first 1 in the binary representation of i, reading from most significant to least significant bits.
-
#union(*counter_names) ⇒ Object
Estimate the cardinality of the union of several sets.
- #union_helper(counter_names) ⇒ Object
-
#union_store(destination, *counter_names) ⇒ Object
Store the union of several sets in destination so that it can be used as a HyperLogLog counter later.
Constructor Details
#initialize(redis, b = 10) ⇒ HyperLogLog
Returns a new instance of HyperLogLog.
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
# File 'lib/hyper_log_log.rb', line 5 def initialize(redis, b=10) raise "Accuracy not supported. Please choose a value of b between 4 and 16" if b < 4 || b > 16 @redis = redis @bits_in_hash = 32 - b @m = (2 ** b).to_i if @m == 16 @alpha = 0.673 elsif @m == 32 @alpha = 0.697 elsif @m == 64 @alpha = 0.709 else @alpha = 0.7213/(1 + 1.079/@m) end end |
Instance Method Details
#add(counter_name, value) ⇒ Object
21 22 23 24 25 26 27 28 |
# File 'lib/hyper_log_log.rb', line 21 def add(counter_name, value) hash = MurmurHash3::V32.murmur3_32_str_hash(value) function_name = hash % @m w = hash / @m existing_value = (@redis.hget(counter_name, function_name) || 0).to_i new_value = [existing_value, rho(w)].max @redis.hset(counter_name, function_name, new_value) if new_value > existing_value end |
#count(counter_name) ⇒ Object
Estimate the cardinality of a single set
31 32 33 |
# File 'lib/hyper_log_log.rb', line 31 def count(counter_name) union_helper([counter_name]) end |
#intersection(*counter_names) ⇒ Object
Estimate the cardinality of the intersection of several sets. We do this by using the principle of inclusion and exclusion to represent the size of the intersection as the alternating sum of an exponential number of cardinalities of unions of smaller sets.
52 53 54 55 56 57 58 59 |
# File 'lib/hyper_log_log.rb', line 52 def intersection(*counter_names) icount = (1..counter_names.length).map do |k| counter_names.combination(k).map do |group| ((k % 2 == 0) ? -1 : 1) * union_helper(group) end.inject(0, :+) end.inject(0, :+) [icount, 0].max end |
#raw_union(counter_names) ⇒ Object
78 79 80 81 82 83 |
# File 'lib/hyper_log_log.rb', line 78 def raw_union(counter_names) counter_names.map{ |counter_name| @redis.hgetall(counter_name).map{ |x,y| [x, y.to_i] } } .reduce(:concat) .group_by{ |key, count| key } .map{ |key, counters| [key, counters.map{ |x| x.last }.max] } end |
#rho(i) ⇒ Object
rho(i) is the position of the first 1 in the binary representation of i, reading from most significant to least significant bits. Some examples: rho(1…) = 1, rho(001…) = 3, rho(000…0) = @bits_in_hash + 1
88 89 90 91 92 93 94 |
# File 'lib/hyper_log_log.rb', line 88 def rho(i) if i == 0 @bits_in_hash + 1 else @bits_in_hash - Math.log(i, 2).floor end end |
#union(*counter_names) ⇒ Object
Estimate the cardinality of the union of several sets
36 37 38 |
# File 'lib/hyper_log_log.rb', line 36 def union(*counter_names) union_helper(counter_names) end |
#union_helper(counter_names) ⇒ Object
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 |
# File 'lib/hyper_log_log.rb', line 61 def union_helper(counter_names) all_estimates = raw_union(counter_names).map{ |value, score| 2 ** -score } estimate_sum = all_estimates.reduce(:+) || 0 estimate = @alpha * @m * @m * ((estimate_sum + @m - all_estimates.length) ** -1) if estimate <= 2.5 * @m if all_estimates.length == @m estimate.round else # Correction for small sets (@m * Math.log(Float(@m)/(@m - all_estimates.length))).round end elsif estimate <= 2 ** 32 / 30.0 estimate.round else # Correction for large sets (-2**32 * Math.log(1 - estimate/(2.0**32))).round end end |
#union_store(destination, *counter_names) ⇒ Object
Store the union of several sets in destination so that it can be used as a HyperLogLog counter later.
42 43 44 45 46 |
# File 'lib/hyper_log_log.rb', line 42 def union_store(destination, *counter_names) raw_union(counter_names).each do |key, count| @redis.hset(destination, key, count) end end |