Class: Frequent::Algorithm

Inherits:
Object
  • Object
show all
Defined in:
lib/frequent/algorithm.rb

Overview

Frequent::Algorithm is the Ruby implementation of the Demaine et al. FREQUENT algorithm for calculating top-k items in a stream.

The aims of this algorithm are:

  • use limited memory
  • require constant processing time per item
  • require a single-pass only

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(n, b, k = 1) ⇒ Algorithm

Initializes this top-k frequency-calculating instance.

Parameters:

  • n (Integer)

    number of items in the main window

  • b (Integer)

    number of items in a basic window

  • k (Integer) (defaults to: 1)

    number of top item categories to track

Raises:

  • (ArgumentError)

    if n is not greater than 0

  • (ArgumentError)

    if b is not greater than 0

  • (ArgumentError)

    if k is not greater than 0

  • (ArgumentError)

    if n/b is not greater than 1



38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
# File 'lib/frequent/algorithm.rb', line 38

def initialize(n, b, k=1)
  if n <= 0
    raise ArgumentError.new('n must be greater than 0')
  end
  if b <= 0
    raise ArgumentError.new('b must be greater than 0')
  end
  if k <= 0
    raise ArgumentError.new('k must be greater than 0')
  end
  if n/b < 1
    raise ArgumentError.new('n/b must be greater than 1')
  end
  @n = n
  @b = b
  @k = k

  @queue = [] 
  @statistics = {}
  @delta = 0
end

Instance Attribute Details

#bInteger (readonly)

Returns the number of items in a basic window.

Returns:

  • (Integer)

    the number of items in a basic window



19
20
21
# File 'lib/frequent/algorithm.rb', line 19

def b
  @b
end

#deltaInteger (readonly)

Returns minimum threshold for membership in top-k items.

Returns:

  • (Integer)

    minimum threshold for membership in top-k items



27
28
29
# File 'lib/frequent/algorithm.rb', line 27

def delta
  @delta
end

#kInteger (readonly)

Returns the number of top item categories to track.

Returns:

  • (Integer)

    the number of top item categories to track



21
22
23
# File 'lib/frequent/algorithm.rb', line 21

def k
  @k
end

#nInteger (readonly)

Returns the number of items in the main window.

Returns:

  • (Integer)

    the number of items in the main window



17
18
19
# File 'lib/frequent/algorithm.rb', line 17

def n
  @n
end

#queueArray<Hash<Object,Integer>> (readonly)

Returns global queue for basic window summaries.

Returns:

  • (Array<Hash<Object,Integer>>)

    global queue for basic window summaries



23
24
25
# File 'lib/frequent/algorithm.rb', line 23

def queue
  @queue
end

#statisticsHash<Object,Integer> (readonly)

Returns global mapping of items and counts.

Returns:

  • (Hash<Object,Integer>)

    global mapping of items and counts



25
26
27
# File 'lib/frequent/algorithm.rb', line 25

def statistics
  @statistics
end

Instance Method Details

#process(elements) ⇒ Object

Processes a single basic window of b items, by first adding a summary of this basic window in the internal global queue; and then updating the global statistics accordingly.

Parameters:

  • an (Array)

    array of objects representing a basic window



65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
# File 'lib/frequent/algorithm.rb', line 65

def process(elements)
  # Do we need this?
  return if elements.length != @b

  # Step 1
  summary = {}
  elements.each do |e|
    if summary.key? e
      summary[e] += 1
    else
      summary[e] = 1
    end
  end

  # index of the k-th item
  kth_index = find_kth_largest(summary)

  # Step 2 & 3
  # summary is [[item,count],[item,count],[item,count]....]
  # sorted by descending order of the item count
  summary = summary.sort { |a,b| b[1]<=>a[1] }[0..kth_index]
  @queue << summary

  # Step 4
  summary.each do |t|
    if @statistics.key? t[0]
      @statistics[t[0]] += t[1]
    else
      @statistics[t[0]] = t[1]
    end
  end

  # Step 5
  @delta += summary[kth_index][1]
  
  # Step 6
  if should_pop_oldest_summary
    # a
    summary_p = @queue.shift
    @delta -= summary_p[find_kth_largest(summary_p)][1]

    # b
    summary_p.each { |t| @statistics[t[0]] -= t[1] }
    @statistics.delete_if { |k,v| v <= 0 }

    #c
    @statistics.select { |k,v| v > @delta }
  else
    {}
  end
end

#versionString

Returns the version for this gem.

Returns:

  • (String)

    the version for this gem.



120
121
122
# File 'lib/frequent/algorithm.rb', line 120

def version
  Frequent::VERSION
end