Class: Frequent::Algorithm

Inherits:
Object
  • Object
show all
Defined in:
lib/frequent/algorithm.rb

Overview

Frequent::Algorithm is the Ruby implementation of the Demaine et al. FREQUENT algorithm for calculating top-k items in a stream.

The aims of this algorithm are:

  • use limited memory
  • require constant processing time per item
  • require a single-pass only

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(n, b, k = 1) ⇒ Algorithm

Initializes this top-k frequency-calculating instance.

Parameters:

  • n (Integer)

    number of items in the main window

  • b (Integer)

    number of items in a basic window

  • k (Integer) (defaults to: 1)

    number of top item categories to track

Raises:

  • (ArgumentError)

    if n is not greater than 0

  • (ArgumentError)

    if b is not greater than 0

  • (ArgumentError)

    if k is not greater than 0

  • (ArgumentError)

    if n/b is not greater than 1



47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
# File 'lib/frequent/algorithm.rb', line 47

def initialize(n, b, k=1)
  @lock = Mutex.new

  if n <= 0
    raise ArgumentError.new('n must be greater than 0')
  end
  if b <= 0
    raise ArgumentError.new('b must be greater than 0')
  end
  if k <= 0
    raise ArgumentError.new('k must be greater than 0')
  end
  if n/b < 1
    raise ArgumentError.new('n/b must be greater than 1')
  end
  @n = n
  @b = b
  @k = k

  @queue = [] 
  @statistics = {}
  @delta = 0
  @topk = {}
  @window = []
end

Instance Attribute Details

#bInteger (readonly)

Returns the number of items in a basic window.

Returns:

  • (Integer)

    the number of items in a basic window



24
25
26
# File 'lib/frequent/algorithm.rb', line 24

def b
  @b
end

#deltaInteger (readonly)

Returns minimum threshold for membership in top-k items.

Returns:

  • (Integer)

    minimum threshold for membership in top-k items



32
33
34
# File 'lib/frequent/algorithm.rb', line 32

def delta
  @delta
end

#kInteger (readonly)

Returns the number of top item categories to track.

Returns:

  • (Integer)

    the number of top item categories to track



26
27
28
# File 'lib/frequent/algorithm.rb', line 26

def k
  @k
end

#nInteger (readonly)

Returns the number of items in the main window.

Returns:

  • (Integer)

    the number of items in the main window



22
23
24
# File 'lib/frequent/algorithm.rb', line 22

def n
  @n
end

#queueArray<Hash<Object,Integer>> (readonly)

Returns global queue for basic window summaries.

Returns:

  • (Array<Hash<Object,Integer>>)

    global queue for basic window summaries



28
29
30
# File 'lib/frequent/algorithm.rb', line 28

def queue
  @queue
end

#statisticsHash<Object,Integer> (readonly)

Returns global mapping of items and counts.

Returns:

  • (Hash<Object,Integer>)

    global mapping of items and counts



30
31
32
# File 'lib/frequent/algorithm.rb', line 30

def statistics
  @statistics
end

#topkHash<Object,Integer> (readonly)

Returns latest top k elements and their counts.

Returns:

  • (Hash<Object,Integer>)

    latest top k elements and their counts



34
35
36
# File 'lib/frequent/algorithm.rb', line 34

def topk
  @topk
end

#windowArray[Object] (readonly)

Returns the window of elements of size b.

Returns:

  • (Array[Object])

    the window of elements of size b



36
37
38
# File 'lib/frequent/algorithm.rb', line 36

def window
  @window
end

Instance Method Details

#process(element) ⇒ Object

Processes a single basic window of b items, by first adding a summary of this basic window in the internal global queue; and then updating the global statistics accordingly.

Parameters:

  • an (Object)

    object from a data stream



78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
# File 'lib/frequent/algorithm.rb', line 78

def process(element)
  @lock.synchronize do
    @window << element
    if @window.length == @b 

      # Step 1
      summary = {}
      @window.each do |e|
        if summary.key? e
          summary[e] += 1
        else
          summary[e] = 1
        end
      end
      @window.clear   #current window cleared

      # Step 2
      @queue << summary

      # Step 3
      # Done, implicitly

      # Step 4
      summary.each do |k,v|
        if @statistics.key? k
          @statistics[k] += v
        else
          @statistics[k] = v
        end
      end

      # Step 5
      @delta += kth_largest(summary.values, @k)

      # Step 6 - sizeOf(Q) > N/b
      if @queue.length > @n/@b
        # a
        summary_p = @queue.shift
        @delta -= kth_largest(summary_p.values, @k)

        # b
        summary_p.each { |k,v| @statistics[k] -= v }
        @statistics.delete_if { |k,v| v <= 0 }

        #c
        @topk = @statistics.select { |k,v| v > @delta }
      end
    end
  end
end

#quickselect(aset, k) ⇒ Object



158
159
160
161
162
163
164
165
166
167
168
169
170
171
# File 'lib/frequent/algorithm.rb', line 158

def quickselect(aset, k)
  p = rand(aset.size)

  lower = aset.select { |e| e < aset[p] }
  upper = aset.select { |e| e > aset[p] }

  if k <= lower.size
    quickselect(lower, k)
  elsif k > aset.size - upper.size
    quickselect(upper, k - (aset.size - upper.size))
  else
    aset[p]
  end
end

#reportHash<Object,Integer>

Return the latest Tok K elements

Returns:

  • (Hash<Object,Integer>)

    a hash which contains the current top K elements and their counts



132
133
134
# File 'lib/frequent/algorithm.rb', line 132

def report
  @topk
end

#versionString

Returns the version for this gem.

Returns:

  • (String)

    the version for this gem.



139
140
141
# File 'lib/frequent/algorithm.rb', line 139

def version
  Frequent::VERSION
end