Class: Hetchy::Reservoir

Inherits:
Object
Defined in:
lib/hetchy/reservoir.rb

Constructor Details

#initialize(opts = {}) ⇒ Reservoir

Create a reservoir.

Parameters:

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :size (Integer)

    Maximum number of samples the reservoir retains. Defaults to 1000.



# File 'lib/hetchy/reservoir.rb', line 11

def initialize(opts={})
  @size = opts.fetch(:size, 1000)
  @lock = Mutex.new
  initialize_pool
end
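
The constructor reads :size with Hash#fetch, which falls back to the default only when the key is absent. The sketch below isolates that behavior; reservoir_size and DEFAULT_SIZE are hypothetical names used for illustration and are not part of Hetchy.

```ruby
# Mirrors the literal default used in the constructor above.
DEFAULT_SIZE = 1000

# Hypothetical helper that reads :size the same way the constructor does.
def reservoir_size(opts = {})
  opts.fetch(:size, DEFAULT_SIZE)
end

p reservoir_size()           # key absent, so the default applies
p reservoir_size(size: 50)   # an explicit value wins
p reservoir_size(size: nil)  # key present, so nil is returned, not the default
```

One consequence is that passing size: nil does not select the default, so callers that build the options hash dynamically may want to omit the key rather than set it to nil.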

Instance Attribute Details

#count ⇒ Object (readonly)

number of samples processed



# File 'lib/hetchy/reservoir.rb', line 4

def count
  @count
end

#pool ⇒ Object (readonly)

The samples currently held in the reservoir.



# File 'lib/hetchy/reservoir.rb', line 4

def pool
  @pool
end

#size ⇒ Object (readonly)

Maximum number of samples the reservoir retains, as set by the :size option.



# File 'lib/hetchy/reservoir.rb', line 4

def size
  @size
end

Instance Method Details

#<<(values) ⇒ Object

Add one or more values to the reservoir.

Examples:

reservoir << 1234
reservoir << [2345,7891,2131]


# File 'lib/hetchy/reservoir.rb', line 22

def <<(values)
  Array(values).each do |value|
    @lock.synchronize do
      # Sampling strategy: Vitter's Algorithm R.
      if count < size
        @pool[count] = value
      else
        index = rand(count+1)
        if index < @size
          @pool[index] = value
        end
      end
      @count += 1
    end
  end
end
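
The sampling step above can be illustrated in isolation. The following is a minimal single-threaded sketch of Algorithm R, not Hetchy's implementation: TinyReservoir and its rng: parameter are hypothetical, and locking is omitted to keep the example short.

```ruby
# Stand-alone sketch of Algorithm R. TinyReservoir is a hypothetical name,
# not part of Hetchy, and it is not thread-safe.
class TinyReservoir
  attr_reader :count, :pool

  def initialize(size, rng: Random.new)
    @size  = size
    @rng   = rng
    @pool  = []
    @count = 0
  end

  def <<(value)
    if @count < @size
      # Fill phase: the first `size` values are always kept.
      @pool[@count] = value
    else
      # Replacement phase: this is value number count + 1, so it is kept
      # with probability size / (count + 1), evicting a random slot.
      index = @rng.rand(@count + 1)
      @pool[index] = value if index < @size
    end
    @count += 1
    self # a choice made in this sketch so that calls can be chained
  end
end

sampler = TinyReservoir.new(5, rng: Random.new(42))
(1..1_000).each { |n| sampler << n }

puts sampler.count      # 1000 values seen
puts sampler.pool.size  # only 5 retained
```

Memory stays bounded by the reservoir size regardless of how many values are pushed, while every value seen so far has the same chance of being in the pool.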

#clear ⇒ Object

Empty and reset the reservoir.



# File 'lib/hetchy/reservoir.rb', line 40

def clear
  initialize_pool
end

#percentile(perc) ⇒ Object

Calculate a percentile based on the current state of the reservoir.

Each call takes a fresh #snapshot. To calculate several percentiles, call #snapshot once and compute them all from the resulting Dataset, which is faster and keeps the results consistent with one another.



# File 'lib/hetchy/reservoir.rb', line 50

def percentile(perc)
  snapshot.percentile(perc)
end
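
With a real reservoir, the reuse pattern is to keep the result of reservoir.snapshot and call percentile on it repeatedly. Dataset's internals are not shown on this page, so the sketch below uses a hypothetical stand-in, SortedSample, to show why reuse can pay off: the data is sorted once and each query is then a lookup. The nearest-rank method and the 0..100 scale for perc are assumptions of this sketch, not documented behavior of Dataset.

```ruby
# Hypothetical stand-in for Dataset: sorts once, then answers many queries.
class SortedSample
  def initialize(values)
    @sorted = values.sort
  end

  # Nearest-rank percentile. perc is assumed to lie in 0..100 here; the
  # scale Hetchy's Dataset expects is not shown on this page.
  def percentile(perc)
    return nil if @sorted.empty?

    rank = (perc * @sorted.size / 100.0).ceil
    @sorted[[rank, 1].max - 1]
  end
end

sample = SortedSample.new([15, 20, 35, 40, 50])

# One sorted copy serves every query.
[30, 50, 90, 100].each do |perc|
  puts "p#{perc}: #{sample.percentile(perc)}"
end
```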

#snapshot ⇒ Object

Capture the reservoir's current contents as a Dataset for analysis. Because sampling may continue concurrently, the pool is copied while holding the lock, so later writes do not change the data being analyzed.



# File 'lib/hetchy/reservoir.rb', line 58

def snapshot
  data = nil
  @lock.synchronize { data = @pool.dup }
  Dataset.new(data.compact)
end
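
The copy-then-compact pattern used above can be tried on its own. In this sketch, copy_under_lock is a hypothetical helper, and the pool is assumed to be a fixed-length array whose unfilled slots are nil; initialize_pool is not shown on this page, so that layout is a guess suggested by the call to compact.

```ruby
# Hypothetical helper mirroring the body of #snapshot: duplicate the array
# while holding the lock, then drop nil slots outside the lock.
def copy_under_lock(pool, lock)
  copy = lock.synchronize { pool.dup }
  copy.compact
end

lock = Mutex.new
pool = Array.new(4) # assumed layout: unfilled slots are nil
pool[0] = 10
pool[1] = 20

frozen_view = copy_under_lock(pool, lock)

# A later write changes the live pool but not the copy taken earlier.
lock.synchronize { pool[2] = 30 }

p frozen_view   # the two values present when the copy was taken
p pool.compact  # the live pool now also holds the later write
```

Only the shallow copy happens inside the lock, which keeps the critical section short. The copy shares element objects with the pool, which is harmless for immutable values such as numbers.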