Class: HyperLogLog::TimeSeriesCounter

Inherits:
Object
  • Object
show all
Includes:
Algorithm
Defined in:
lib/time_series_counter.rb

Instance Method Summary collapse

Methods included from Algorithm

#initialize, #intersection

Instance Method Details

#add(counter_name, value, time = nil) ⇒ Object

This is an implementation of HyperLogLog that allows for querying counts within time ranges of the form (t, current_time] with second-level granularity. The standard implementation of HyperLogLog stores the max number of leading zeros seen in the image of each of 2 ** b hash functions. These counts can naturally be stored in a string of length 2 ** b by allocating one byte per leading zero count.

To provide counts within a time range, we alter the standard implementation to store a mapping of pairs of the form (hash function, leading zero count) -> timestamp, where the mapping (h,z) -> t represents the fact that we observed z leading zeros in the image of hash function h most recently at time t. This mapping is stored in a string by packing 4-byte words (timestamps, represented in seconds since the epoch) into a matrix indexed by hash function and zero count stored in row-major order. Since the max zero count for a counter with parameter b is (32-b), this representation takes up at most 4 * (32-b) * (2 ** b) bytes (and usually much less, since we don’t allocate space for rows corresponding to higher leading zero counts until they’re actaully observed.)

To convert this representation to a HyperLogLog counter for the time range (t, current_time], we simply filter out all timestamps less than t in the matrix and then find, for each hash function, the maximum z for which that hash function has a non-zero timestamp.



29
30
31
32
33
34
35
36
37
38
39
# File 'lib/time_series_counter.rb', line 29

def add(counter_name, value, time=nil)
  hash, function_name, new_value = hash_info(value)
  index = 4 * (function_name + (new_value.to_i * @m))
  if time.nil?
    @redis.setrange(counter_name, index, [Time.now.to_i].pack('N'))
  else
    existing_time = @redis.getrange(counter_name, index, index + 3)
    existing_val = existing_time.empty? ? 0 : existing_time.unpack('N').first
    @redis.setrange(counter_name, index, [time.to_i].pack('N')) if time.to_i > existing_val 
  end
end

#count(counter_name, time = 0) ⇒ Object

Estimate the cardinality of a single set



42
43
44
# File 'lib/time_series_counter.rb', line 42

def count(counter_name, time=0)
  union_helper([counter_name], time)
end

#union(counter_names, time = 0) ⇒ Object

Estimate the cardinality of the union of several sets



47
48
49
# File 'lib/time_series_counter.rb', line 47

def union(counter_names, time=0)
  union_helper(counter_names, time)
end

#union_store(destination, counter_names, time = 0) ⇒ Object

Store the union of several sets in destination so that it can be used as a HyperLogLog counter later.



53
54
55
56
57
# File 'lib/time_series_counter.rb', line 53

def union_store(destination, counter_names, time=0)
  raw_counters = @redis.mget(*counter_names).compact.map{ |c| c.unpack('N*').map{ |x| x > time ? x : 0 } }
  combined_counters = jagged_transpose(raw_counters).map{ |x| x.max.to_i }
  @redis.set(destination, combined_counters.pack('N*'))
end