Class: Frequent::Algorithm

Inherits:
Object
  • Object
show all
Defined in:
lib/frequent/algorithm.rb

Overview

Frequent::Algorithm is the Ruby implementation of the Demaine et al. FREQUENT algorithm for calculating top-k items in a stream.

The aims of this algorithm are:

  • use limited memory
  • require constant processing time per item
  • require a single-pass only

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(n, b, k = 1) ⇒ Algorithm

Initializes this top-k frequency-calculating instance.

Parameters:

  • n (Integer)

    number of items in the main window

  • b (Integer)

    number of items in a basic window

  • k (Integer) (defaults to: 1)

    number of top item categories to track

Raises:

  • (ArgumentError)

    if n is not greater than 0

  • (ArgumentError)

    if b is not greater than 0

  • (ArgumentError)

    if k is not greater than 0

  • (ArgumentError)

    if n/b is not greater than 1



41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
# File 'lib/frequent/algorithm.rb', line 41

def initialize(n, b, k=1)
  if n <= 0
    raise ArgumentError.new('n must be greater than 0')
  end
  if b <= 0
    raise ArgumentError.new('b must be greater than 0')
  end
  if k <= 0
    raise ArgumentError.new('k must be greater than 0')
  end
  if n/b < 1
    raise ArgumentError.new('n/b must be greater than 1')
  end
  @n = n
  @b = b
  @k = k

  @queue = [] 
  @statistics = {}
  @delta = 0
end

Instance Attribute Details

#bInteger (readonly)

Returns the number of items in a basic window.

Returns:

  • (Integer)

    the number of items in a basic window



22
23
24
# File 'lib/frequent/algorithm.rb', line 22

def b
  @b
end

#deltaInteger (readonly)

Returns minimum threshold for membership in top-k items.

Returns:

  • (Integer)

    minimum threshold for membership in top-k items



30
31
32
# File 'lib/frequent/algorithm.rb', line 30

def delta
  @delta
end

#kInteger (readonly)

Returns the number of top item categories to track.

Returns:

  • (Integer)

    the number of top item categories to track



24
25
26
# File 'lib/frequent/algorithm.rb', line 24

def k
  @k
end

#nInteger (readonly)

Returns the number of items in the main window.

Returns:

  • (Integer)

    the number of items in the main window



20
21
22
# File 'lib/frequent/algorithm.rb', line 20

def n
  @n
end

#queueArray<Hash<Object,Integer>> (readonly)

Returns global queue for basic window summaries.

Returns:

  • (Array<Hash<Object,Integer>>)

    global queue for basic window summaries



26
27
28
# File 'lib/frequent/algorithm.rb', line 26

def queue
  @queue
end

#statisticsHash<Object,Integer> (readonly)

Returns global mapping of items and counts.

Returns:

  • (Hash<Object,Integer>)

    global mapping of items and counts



28
29
30
# File 'lib/frequent/algorithm.rb', line 28

def statistics
  @statistics
end

Instance Method Details

#process(elements) ⇒ Object

Processes a single basic window of b items, by first adding a summary of this basic window in the internal global queue; and then updating the global statistics accordingly.

Parameters:

  • an (Array)

    array of objects representing a basic window



68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
# File 'lib/frequent/algorithm.rb', line 68

def process(elements)
  # Do we need this?

  return if elements.length != @b

  # Step 1

  summary = {}
  elements.each do |e|
    if summary.key? e
      summary[e] += 1
    else
      summary[e] = 1
    end
  end

  # Step 2

  @queue << summary

  # Step 3

  # Done, implicitly


  # Step 4

  summary.each do |k,v|
    if @statistics.key? k
      @statistics[k] += v
    else
      @statistics[k] = v
    end
  end

  # Step 5

  @delta += kth_largest(summary.values, @k)

  # Step 6 - sizeOf(Q) > N/b

  if @queue.length > @n/@b
    # a

    summary_p = @queue.shift
    @delta -= kth_largest(summary_p.values, @k)

    # b

    summary_p.each { |k,v| @statistics[k] -= v }
    @statistics.delete_if { |k,v| v <= 0 }

    #c

    @statistics.select { |k,v| v > @delta }
  else
    {}
  end
end

#quickselect(aset, k) ⇒ Object



139
140
141
142
143
144
145
146
147
148
149
150
151
152
# File 'lib/frequent/algorithm.rb', line 139

def quickselect(aset, k)
  p = rand(aset.size)

  lower = aset.select { |e| e < aset[p] }
  upper = aset.select { |e| e > aset[p] }

  if k <= lower.size
    quickselect(lower, k)
  elsif k > aset.size - upper.size
    quickselect(upper, k - (aset.size - upper.size))
  else
    aset[p]
  end
end

#versionString

Returns the version for this gem.

Returns:

  • (String)

    the version for this gem.



120
121
122
# File 'lib/frequent/algorithm.rb', line 120

def version
  Frequent::VERSION
end