Class: Frequent::Algorithm
- Inherits:
-
Object
- Object
- Frequent::Algorithm
- Defined in:
- lib/frequent/algorithm.rb
Overview
Frequent::Algorithm is the Ruby implementation of the
Demaine et al. FREQUENT algorithm for calculating
top-k items in a stream.
The aims of this algorithm are:
- use limited memory
- require constant processing time per item
- require a single-pass only
Instance Attribute Summary collapse
-
#b ⇒ Integer
readonly
The number of items in a basic window.
-
#delta ⇒ Integer
readonly
Minimum threshold for membership in top-k items.
-
#k ⇒ Integer
readonly
The number of top item categories to track.
-
#n ⇒ Integer
readonly
The number of items in the main window.
-
#queue ⇒ Array<Hash<Object,Integer>>
readonly
Global queue for basic window summaries.
-
#statistics ⇒ Hash<Object,Integer>
readonly
Global mapping of items and counts.
Instance Method Summary collapse
-
#initialize(n, b, k = 1) ⇒ Algorithm
constructor
Initializes this top-k frequency-calculating instance.
-
#process(elements) ⇒ Object
Processes a single basic window of b items, by first adding a summary of this basic window in the internal global queue; and then updating the global statistics accordingly.
- #quickselect(aset, k) ⇒ Object
-
#version ⇒ String
Returns the version for this gem.
Constructor Details
#initialize(n, b, k = 1) ⇒ Algorithm
Initializes this top-k frequency-calculating instance.
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 |
# File 'lib/frequent/algorithm.rb', line 41 def initialize(n, b, k=1) if n <= 0 raise ArgumentError.new('n must be greater than 0') end if b <= 0 raise ArgumentError.new('b must be greater than 0') end if k <= 0 raise ArgumentError.new('k must be greater than 0') end if n/b < 1 raise ArgumentError.new('n/b must be greater than 1') end @n = n @b = b @k = k @queue = [] @statistics = {} @delta = 0 end |
Instance Attribute Details
#b ⇒ Integer (readonly)
Returns the number of items in a basic window.
22 23 24 |
# File 'lib/frequent/algorithm.rb', line 22 def b @b end |
#delta ⇒ Integer (readonly)
Returns minimum threshold for membership in top-k items.
30 31 32 |
# File 'lib/frequent/algorithm.rb', line 30 def delta @delta end |
#k ⇒ Integer (readonly)
Returns the number of top item categories to track.
24 25 26 |
# File 'lib/frequent/algorithm.rb', line 24 def k @k end |
#n ⇒ Integer (readonly)
Returns the number of items in the main window.
20 21 22 |
# File 'lib/frequent/algorithm.rb', line 20 def n @n end |
#queue ⇒ Array<Hash<Object,Integer>> (readonly)
Returns global queue for basic window summaries.
26 27 28 |
# File 'lib/frequent/algorithm.rb', line 26 def queue @queue end |
#statistics ⇒ Hash<Object,Integer> (readonly)
Returns global mapping of items and counts.
28 29 30 |
# File 'lib/frequent/algorithm.rb', line 28 def statistics @statistics end |
Instance Method Details
#process(elements) ⇒ Object
Processes a single basic window of b items, by first adding a summary of this basic window in the internal global queue; and then updating the global statistics accordingly.
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 |
# File 'lib/frequent/algorithm.rb', line 68 def process(elements) # Do we need this? return if elements.length != @b # Step 1 summary = {} elements.each do |e| if summary.key? e summary[e] += 1 else summary[e] = 1 end end # Step 2 @queue << summary # Step 3 # Done, implicitly # Step 4 summary.each do |k,v| if @statistics.key? k @statistics[k] += v else @statistics[k] = v end end # Step 5 @delta += kth_largest(summary.values, @k) # Step 6 - sizeOf(Q) > N/b if @queue.length > @n/@b # a summary_p = @queue.shift @delta -= kth_largest(summary_p.values, @k) # b summary_p.each { |k,v| @statistics[k] -= v } @statistics.delete_if { |k,v| v <= 0 } #c @statistics.select { |k,v| v > @delta } else {} end end |
#quickselect(aset, k) ⇒ Object
139 140 141 142 143 144 145 146 147 148 149 150 151 152 |
# File 'lib/frequent/algorithm.rb', line 139 def quickselect(aset, k) p = rand(aset.size) lower = aset.select { |e| e < aset[p] } upper = aset.select { |e| e > aset[p] } if k <= lower.size quickselect(lower, k) elsif k > aset.size - upper.size quickselect(upper, k - (aset.size - upper.size)) else aset[p] end end |
#version ⇒ String
Returns the version for this gem.
120 121 122 |
# File 'lib/frequent/algorithm.rb', line 120 def version Frequent::VERSION end |