Class: Frequent::Algorithm
- Inherits:
-
Object
- Object
- Frequent::Algorithm
- Defined in:
- lib/frequent/algorithm.rb
Overview
Frequent::Algorithm is the Ruby implementation of the
Demaine et al. FREQUENT algorithm for calculating
top-k items in a stream.
The aims of this algorithm are:
- use limited memory
- require constant processing time per item
- require a single-pass only
Instance Attribute Summary collapse
-
#b ⇒ Integer
readonly
The number of items in a basic window.
-
#delta ⇒ Integer
readonly
Minimum threshold for membership in top-k items.
-
#k ⇒ Integer
readonly
The number of top item categories to track.
-
#n ⇒ Integer
readonly
The number of items in the main window.
-
#queue ⇒ Array<Hash<Object,Integer>>
readonly
Global queue for basic window summaries.
-
#statistics ⇒ Hash<Object,Integer>
readonly
Global mapping of items and counts.
-
#topk ⇒ Hash<Object,Integer>
readonly
Latest top k elements and their counts.
-
#window ⇒ Array[Object]
readonly
The window of elements of size b.
Instance Method Summary collapse
-
#initialize(n, b, k = 1) ⇒ Algorithm
constructor
Initializes this top-k frequency-calculating instance.
-
#process(element) ⇒ Object
Processes a single basic window of b items, by first adding a summary of this basic window in the internal global queue; and then updating the global statistics accordingly.
- #quickselect(aset, k) ⇒ Object
-
#report ⇒ Hash<Object,Integer>
Return the latest Tok K elements.
-
#version ⇒ String
Returns the version for this gem.
Constructor Details
#initialize(n, b, k = 1) ⇒ Algorithm
Initializes this top-k frequency-calculating instance.
47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 |
# File 'lib/frequent/algorithm.rb', line 47 def initialize(n, b, k=1) @lock = Mutex.new if n <= 0 raise ArgumentError.new('n must be greater than 0') end if b <= 0 raise ArgumentError.new('b must be greater than 0') end if k <= 0 raise ArgumentError.new('k must be greater than 0') end if n/b < 1 raise ArgumentError.new('n/b must be greater than 1') end @n = n @b = b @k = k @queue = [] @statistics = {} @delta = 0 @topk = {} @window = [] end |
Instance Attribute Details
#b ⇒ Integer (readonly)
Returns the number of items in a basic window.
24 25 26 |
# File 'lib/frequent/algorithm.rb', line 24 def b @b end |
#delta ⇒ Integer (readonly)
Returns minimum threshold for membership in top-k items.
32 33 34 |
# File 'lib/frequent/algorithm.rb', line 32 def delta @delta end |
#k ⇒ Integer (readonly)
Returns the number of top item categories to track.
26 27 28 |
# File 'lib/frequent/algorithm.rb', line 26 def k @k end |
#n ⇒ Integer (readonly)
Returns the number of items in the main window.
22 23 24 |
# File 'lib/frequent/algorithm.rb', line 22 def n @n end |
#queue ⇒ Array<Hash<Object,Integer>> (readonly)
Returns global queue for basic window summaries.
28 29 30 |
# File 'lib/frequent/algorithm.rb', line 28 def queue @queue end |
#statistics ⇒ Hash<Object,Integer> (readonly)
Returns global mapping of items and counts.
30 31 32 |
# File 'lib/frequent/algorithm.rb', line 30 def statistics @statistics end |
#topk ⇒ Hash<Object,Integer> (readonly)
Returns latest top k elements and their counts.
34 35 36 |
# File 'lib/frequent/algorithm.rb', line 34 def topk @topk end |
#window ⇒ Array[Object] (readonly)
Returns the window of elements of size b.
36 37 38 |
# File 'lib/frequent/algorithm.rb', line 36 def window @window end |
Instance Method Details
#process(element) ⇒ Object
Processes a single basic window of b items, by first adding a summary of this basic window in the internal global queue; and then updating the global statistics accordingly.
78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 |
# File 'lib/frequent/algorithm.rb', line 78 def process(element) @lock.synchronize do @window << element if @window.length == @b # Step 1 summary = {} @window.each do |e| if summary.key? e summary[e] += 1 else summary[e] = 1 end end @window.clear #current window cleared # Step 2 @queue << summary # Step 3 # Done, implicitly # Step 4 summary.each do |k,v| if @statistics.key? k @statistics[k] += v else @statistics[k] = v end end # Step 5 @delta += kth_largest(summary.values, @k) # Step 6 - sizeOf(Q) > N/b if @queue.length > @n/@b # a summary_p = @queue.shift @delta -= kth_largest(summary_p.values, @k) # b summary_p.each { |k,v| @statistics[k] -= v } @statistics.delete_if { |k,v| v <= 0 } #c @topk = @statistics.select { |k,v| v > @delta } end end end end |
#quickselect(aset, k) ⇒ Object
158 159 160 161 162 163 164 165 166 167 168 169 170 171 |
# File 'lib/frequent/algorithm.rb', line 158 def quickselect(aset, k) p = rand(aset.size) lower = aset.select { |e| e < aset[p] } upper = aset.select { |e| e > aset[p] } if k <= lower.size quickselect(lower, k) elsif k > aset.size - upper.size quickselect(upper, k - (aset.size - upper.size)) else aset[p] end end |
#report ⇒ Hash<Object,Integer>
Return the latest Tok K elements
132 133 134 |
# File 'lib/frequent/algorithm.rb', line 132 def report @topk end |