Class: Quantile::Estimator

Inherits:
Object
  • Object
show all
Defined in:
lib/quantile/estimator.rb

Overview

Note:

Estimator is not concurrency safe.

Estimate quantile values efficiently where both the rank and the inaccuracy allowance are known a priori. This is accomplished via Graham Cormode and S. Muthukrishnan’s Effective Computation of Biased Quantiles over Data Streams in ICDE’05.

Defined Under Namespace

Classes: Sample

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(*invariants) ⇒ Estimator

Create a streaming quantile estimator.

Parameters:

  • invariants (Quantile)

    The quantile estimation targets that are provided a priori.



34
35
36
37
38
39
40
41
42
43
44
# File 'lib/quantile/estimator.rb', line 34

def initialize(*invariants)
  if invariants.empty?
    invariants = [Quantile.new(0.5, 0.05), Quantile.new(0.90, 0.01), Quantile.new(0.99, 0.001)]
  end

  @invariants = invariants
  @buffer = []
  @head = nil

  @observations, @sum = 0, 0
end

Instance Attribute Details

#invariantsObject

Get the quantile targets.



49
50
51
# File 'lib/quantile/estimator.rb', line 49

def invariants
  @invariants
end

Instance Method Details

#observationsObject

Get the number of observed values.



54
55
56
57
# File 'lib/quantile/estimator.rb', line 54

def observations
  flush
  @observations
end

#observe(value) ⇒ Object

Observe a sample value with this Quantile::Estimator.

Parameters:

  • value (Numeric)

    The value to catalog for later analysis.



64
65
66
67
68
69
70
71
# File 'lib/quantile/estimator.rb', line 64

def observe(value)
  @buffer << value
  if @buffer.size == BUFFER_SIZE
    flush
  end
  @observations += 1
  @sum += value
end

#query(rank) ⇒ Numeric?

Get a quantile value for a given rank.

Parameters:

  • rank (Float)

    The target quantile to retrieve. It must be one of the invariants provided in the constructor.

Returns:

  • (Numeric, nil)

    The quantile value for the rank or nil if no observations are present.



89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
# File 'lib/quantile/estimator.rb', line 89

def query(rank)
  flush

  current = @head
  return unless current

  mid_rank = (rank * @observations).floor
  max_rank = mid_rank + (invariant(mid_rank, @observations) / 2).floor

  rank = 0.0
  while current.successor
    rank += current.rank
    if rank + current.successor.rank + current.successor.delta > max_rank
      return current.value
    end

    current = current.successor
  end

  return current.value
end

#sumObject

Returns the sum of all observed values.



76
77
78
# File 'lib/quantile/estimator.rb', line 76

def sum
  @sum
end