Class: Frequent::Algorithm
- Inherits:
-
Object
- Object
- Frequent::Algorithm
- Defined in:
- lib/frequent/algorithm.rb
Overview
Frequent::Algorithm is the Ruby implementation of the
Demaine et al. FREQUENT algorithm for calculating
top-k items in a stream.
The aims of this algorithm are:
- use limited memory
- require constant processing time per item
- require a single-pass only
Instance Attribute Summary collapse
-
#b ⇒ Integer
readonly
The number of items in a basic window.
-
#delta ⇒ Integer
readonly
Minimum threshold for membership in top-k items.
-
#k ⇒ Integer
readonly
The number of top item categories to track.
-
#n ⇒ Integer
readonly
The number of items in the main window.
-
#queue ⇒ Array<Hash<Object,Integer>>
readonly
Global queue for basic window summaries.
-
#statistics ⇒ Hash<Object,Integer>
readonly
Global mapping of items and counts.
Instance Method Summary collapse
-
#initialize(n, b, k = 1) ⇒ Algorithm
constructor
Initializes this top-k frequency-calculating instance.
-
#process(elements) ⇒ Object
Processes a single basic window of b items, by first adding a summary of this basic window in the internal global queue; and then updating the global statistics accordingly.
-
#version ⇒ String
Returns the version for this gem.
Constructor Details
#initialize(n, b, k = 1) ⇒ Algorithm
Initializes this top-k frequency-calculating instance.
38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 |
# File 'lib/frequent/algorithm.rb', line 38 def initialize(n, b, k=1) if n <= 0 raise ArgumentError.new('n must be greater than 0') end if b <= 0 raise ArgumentError.new('b must be greater than 0') end if k <= 0 raise ArgumentError.new('k must be greater than 0') end if n/b < 1 raise ArgumentError.new('n/b must be greater than 1') end @n = n @b = b @k = k @queue = [] @statistics = {} @delta = 0 end |
Instance Attribute Details
#b ⇒ Integer (readonly)
Returns the number of items in a basic window.
19 20 21 |
# File 'lib/frequent/algorithm.rb', line 19 def b @b end |
#delta ⇒ Integer (readonly)
Returns minimum threshold for membership in top-k items.
27 28 29 |
# File 'lib/frequent/algorithm.rb', line 27 def delta @delta end |
#k ⇒ Integer (readonly)
Returns the number of top item categories to track.
21 22 23 |
# File 'lib/frequent/algorithm.rb', line 21 def k @k end |
#n ⇒ Integer (readonly)
Returns the number of items in the main window.
17 18 19 |
# File 'lib/frequent/algorithm.rb', line 17 def n @n end |
#queue ⇒ Array<Hash<Object,Integer>> (readonly)
Returns global queue for basic window summaries.
23 24 25 |
# File 'lib/frequent/algorithm.rb', line 23 def queue @queue end |
#statistics ⇒ Hash<Object,Integer> (readonly)
Returns global mapping of items and counts.
25 26 27 |
# File 'lib/frequent/algorithm.rb', line 25 def statistics @statistics end |
Instance Method Details
#process(elements) ⇒ Object
Processes a single basic window of b items, by first adding a summary of this basic window in the internal global queue; and then updating the global statistics accordingly.
65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 |
# File 'lib/frequent/algorithm.rb', line 65 def process(elements) # Do we need this? return if elements.length != @b # Step 1 summary = {} elements.each do |e| if summary.key? e summary[e] += 1 else summary[e] = 1 end end # index of the k-th item kth_index = find_kth_largest(summary) # Step 2 & 3 # summary is [[item,count],[item,count],[item,count]....] # sorted by descending order of the item count summary = summary.sort { |a,b| b[1]<=>a[1] }[0..kth_index] @queue << summary # Step 4 summary.each do |t| if @statistics.key? t[0] @statistics[t[0]] += t[1] else @statistics[t[0]] = t[1] end end # Step 5 @delta += summary[kth_index][1] # Step 6 if should_pop_oldest_summary # a summary_p = @queue.shift @delta -= summary_p[find_kth_largest(summary_p)][1] # b summary_p.each { |t| @statistics[t[0]] -= t[1] } @statistics.delete_if { |k,v| v <= 0 } #c @statistics.select { |k,v| v > @delta } else {} end end |