Class: HyperLogLog

Inherits:
Object
  • Object
show all
Defined in:
lib/hyper_log_log.rb

Instance Method Summary collapse

Constructor Details

#initialize(redis, b = 10) ⇒ HyperLogLog

Returns a new instance of HyperLogLog.



5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
# File 'lib/hyper_log_log.rb', line 5

def initialize(redis, b=10)
  raise "Accuracy not supported. Please choose a value of b between 4 and 16" if b < 4 || b > 16
  @redis = redis
  @bits_in_hash = 32 - b
  @m = (2 ** b).to_i
  if @m == 16
    @alpha = 0.673
  elsif @m == 32
    @alpha = 0.697
  elsif @m == 64
    @alpha = 0.709
  else
    @alpha = 0.7213/(1 + 1.079/@m)
  end
end

Instance Method Details

#add(counter_name, value) ⇒ Object



21
22
23
24
25
26
27
28
# File 'lib/hyper_log_log.rb', line 21

def add(counter_name, value)
  hash = MurmurHash3::V32.murmur3_32_str_hash(value)
  function_name = hash % @m
  w = hash / @m
  existing_value = (@redis.hget(counter_name, function_name) || 0).to_i
  new_value = [existing_value, rho(w)].max
  @redis.hset(counter_name, function_name, new_value) if new_value > existing_value
end

#count(counter_name) ⇒ Object

Estimate the cardinality of a single set



31
32
33
# File 'lib/hyper_log_log.rb', line 31

def count(counter_name)
  union_helper([counter_name])
end

#intersection(*counter_names) ⇒ Object

Estimate the cardinality of the intersection of several sets. We do this by using the principle of inclusion and exclusion to represent the size of the intersection as the alternating sum of an exponential number of cardinalities of unions of smaller sets.



52
53
54
55
56
57
58
59
# File 'lib/hyper_log_log.rb', line 52

def intersection(*counter_names)
  icount = (1..counter_names.length).map do |k|
    counter_names.combination(k).map do |group|
      ((k % 2 == 0) ? -1 : 1) * union_helper(group)
    end.inject(0, :+)
  end.inject(0, :+)
  [icount, 0].max
end

#raw_union(counter_names) ⇒ Object



78
79
80
81
82
83
# File 'lib/hyper_log_log.rb', line 78

def raw_union(counter_names)
  counter_names.map{ |counter_name| @redis.hgetall(counter_name).map{ |x,y| [x, y.to_i] } }
               .reduce(:concat)
               .group_by{ |key, count| key }
               .map{ |key, counters| [key, counters.map{ |x| x.last }.max] }
end

#rho(i) ⇒ Object

rho(i) is the position of the first 1 in the binary representation of i, reading from most significant to least significant bits. Some examples: rho(1…) = 1, rho(001…) = 3, rho(000…0) = @bits_in_hash + 1



88
89
90
91
92
93
94
# File 'lib/hyper_log_log.rb', line 88

def rho(i)
  if i == 0
    @bits_in_hash + 1 
  else
    @bits_in_hash - Math.log(i, 2).floor
  end
end

#union(*counter_names) ⇒ Object

Estimate the cardinality of the union of several sets



36
37
38
# File 'lib/hyper_log_log.rb', line 36

def union(*counter_names)
  union_helper(counter_names)
end

#union_helper(counter_names) ⇒ Object



61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
# File 'lib/hyper_log_log.rb', line 61

def union_helper(counter_names)
  all_estimates = raw_union(counter_names).map{ |value, score| 2 ** -score }
  estimate_sum = all_estimates.reduce(:+) || 0
  estimate = @alpha * @m * @m * ((estimate_sum + @m - all_estimates.length) ** -1)
  if estimate <= 2.5 * @m
    if all_estimates.length == @m
      estimate.round
    else # Correction for small sets
      (@m * Math.log(Float(@m)/(@m - all_estimates.length))).round
    end
  elsif estimate <= 2 ** 32 / 30.0
    estimate.round
  else # Correction for large sets
    (-2**32 * Math.log(1 - estimate/(2.0**32))).round
  end
end

#union_store(destination, *counter_names) ⇒ Object

Store the union of several sets in destination so that it can be used as a HyperLogLog counter later.



42
43
44
45
46
# File 'lib/hyper_log_log.rb', line 42

def union_store(destination, *counter_names)
  raw_union(counter_names).each do |key, count|
    @redis.hset(destination, key, count)
  end
end