Class: Airbrake::TDigest Private

Inherits:
Object
  • Object
show all
Defined in:
lib/airbrake-ruby/tdigest.rb

Overview

This class is part of a private API. You should avoid using this class if possible, as it may be removed or be changed in the future.

Ruby implementation of Ted Dunning’s t-digest data structure.

This implementation is imported from github.com/castle/tdigest with custom modifications. Huge thanks to Castle for the implementation :beer:

The difference is that we pack with Big Endian (unlike Native Endian in Castle’s version). Our backend does not permit little endian.

rubocop:disable Metrics/ClassLength

Defined Under Namespace

Classes: Centroid

Constant Summary collapse

VERBOSE_ENCODING =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Since:

  • v3.2.0

1
SMALL_ENCODING =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Since:

  • v3.2.0

2

Instance Attribute Summary collapse

Class Method Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(delta = 0.01, k = 25, cx = 1.1) ⇒ TDigest

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Returns a new instance of TDigest.

Since:

  • v3.2.0



43
44
45
46
47
48
49
50
# File 'lib/airbrake-ruby/tdigest.rb', line 43

def initialize(delta = 0.01, k = 25, cx = 1.1)
  @delta = delta
  @k = k
  @cx = cx
  @centroids = RBTree.new
  @size = 0
  @last_cumulate = 0
end

Instance Attribute Details

#centroidsObject

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Since:

  • v3.2.0



40
41
42
# File 'lib/airbrake-ruby/tdigest.rb', line 40

def centroids
  @centroids
end

#sizeObject (readonly)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Since:

  • v3.2.0



41
42
43
# File 'lib/airbrake-ruby/tdigest.rb', line 41

def size
  @size
end

Class Method Details

.from_bytes(bytes) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

rubocop:disable Metrics/PerceivedComplexity, Metrics/MethodLength rubocop:disable Metrics/CyclomaticComplexity, Metrics/AbcSize

Since:

  • v3.2.0



249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
# File 'lib/airbrake-ruby/tdigest.rb', line 249

def self.from_bytes(bytes)
  format, compression, size = bytes.unpack('NGN')
  tdigest = new(1 / compression)

  start_idx = 16 # after header
  case format
  when VERBOSE_ENCODING
    array = bytes[start_idx..-1].unpack("G#{size}N#{size}")
    means, counts = array.each_slice(size).to_a if array.any?
  when SMALL_ENCODING
    means = bytes[start_idx..(start_idx + (4 * size))].unpack("g#{size}")
    # Decode delta encoding of means
    x = 0
    means.map! do |m|
      m += x
      x = m
      m
    end
    counts_bytes = bytes[(start_idx + (4 * size))..-1].unpack('C*')
    counts = []
    # Decode variable length integer bytes
    size.times do
      v = counts_bytes.shift
      z = 0x7f & v
      shift = 7
      while (v & 0x80) != 0
        raise 'Shift too large in decode' if shift > 28

        v = counts_bytes.shift || 0
        z += (v & 0x7f) << shift
        shift += 7
      end
      counts << z
    end
    # This shouldn't happen
    raise 'Mismatch' unless counts.size == means.size
  else
    raise 'Unknown compression format'
  end

  means.zip(counts).each { |val| tdigest.push(val[0], val[1]) } if means && counts

  tdigest
end

.from_json(array) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

rubocop:enable Metrics/PerceivedComplexity, Metrics/MethodLength rubocop:enable Metrics/CyclomaticComplexity, Metrics/AbcSize

Since:

  • v3.2.0



296
297
298
299
300
301
# File 'lib/airbrake-ruby/tdigest.rb', line 296

def self.from_json(array)
  tdigest = new
  # Handle both string and symbol keys
  array.each { |a| tdigest.push(a['m'] || a[:m], a['n'] || a[:n]) }
  tdigest
end

Instance Method Details

#+(other) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Since:

  • v3.2.0



52
53
54
55
56
57
58
# File 'lib/airbrake-ruby/tdigest.rb', line 52

def +(other)
  # Uses delta, k and cx from the caller
  t = self.class.new(@delta, @k, @cx)
  data = centroids.values + other.centroids.values
  t.push_centroid(data.delete_at(rand(data.length))) while data.any?
  t
end

#as_bytesObject

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Since:

  • v3.2.0



60
61
62
63
64
65
66
67
# File 'lib/airbrake-ruby/tdigest.rb', line 60

def as_bytes
  # compression as defined by Java implementation
  size = @centroids.size
  output = [VERBOSE_ENCODING, compression, size]
  output += @centroids.each_value.map(&:mean)
  output += @centroids.each_value.map(&:n)
  output.pack("NGNG#{size}N#{size}")
end

#as_json(_ = nil) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

rubocop:enable Metrics/AbcSize

Since:

  • v3.2.0



99
100
101
# File 'lib/airbrake-ruby/tdigest.rb', line 99

def as_json(_ = nil)
  @centroids.each_value.map(&:as_json)
end

#as_small_bytesObject

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

rubocop:disable Metrics/AbcSize

Since:

  • v3.2.0



70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
# File 'lib/airbrake-ruby/tdigest.rb', line 70

def as_small_bytes
  size = @centroids.size
  output = [self.class::SMALL_ENCODING, compression, size]
  x = 0
  # delta encoding allows saving 4-bytes floats
  mean_arr = @centroids.each_value.map do |c|
    val = c.mean - x
    x = c.mean
    val
  end
  output += mean_arr
  # Variable length encoding of numbers
  c_arr = @centroids.each_value.with_object([]) do |c, arr|
    k = 0
    n = c.n
    while n < 0 || n > 0x7f
      b = 0x80 | (0x7f & n)
      arr << b
      n = n >> 7
      k += 1
      raise 'Unreasonable large number' if k > 6
    end
    arr << n
  end
  output += c_arr
  output.pack("NGNg#{size}C#{size}")
end

#bound_mean(x) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Since:

  • v3.2.0



103
104
105
106
107
# File 'lib/airbrake-ruby/tdigest.rb', line 103

def bound_mean(x)
  upper = @centroids.upper_bound(x)
  lower = @centroids.lower_bound(x)
  [lower[1], upper[1]]
end

#bound_mean_cumn(cumn) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Since:

  • v3.2.0



109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
# File 'lib/airbrake-ruby/tdigest.rb', line 109

def bound_mean_cumn(cumn)
  last_c = nil
  bounds = []
  @centroids.each_value do |v|
    if v.mean_cumn == cumn
      bounds << v
      break
    elsif v.mean_cumn > cumn
      bounds << last_c
      bounds << v
      break
    else
      last_c = v
    end
  end
  # If still no results, pick lagging value if any
  bounds << last_c if bounds.empty? && !last_c.nil?

  bounds
end

#compress!Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Since:

  • v3.2.0



130
131
132
133
134
135
136
# File 'lib/airbrake-ruby/tdigest.rb', line 130

def compress!
  points = to_a
  reset!
  push_centroid(points.shuffle)
  _cumulate(exact: true, force: true)
  nil
end

#compressionObject

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Since:

  • v3.2.0



138
139
140
# File 'lib/airbrake-ruby/tdigest.rb', line 138

def compression
  1 / @delta
end

#find_nearest(x) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Since:

  • v3.2.0



142
143
144
145
146
147
148
149
150
151
152
153
154
155
# File 'lib/airbrake-ruby/tdigest.rb', line 142

def find_nearest(x)
  return if size == 0

  upper_key, upper = @centroids.upper_bound(x)
  lower_key, lower = @centroids.lower_bound(x)
  return lower unless upper_key
  return upper unless lower_key

  if (lower_key - x).abs < (upper_key - x).abs
    lower
  else
    upper
  end
end

#merge!(other) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Since:

  • v3.2.0



157
158
159
160
# File 'lib/airbrake-ruby/tdigest.rb', line 157

def merge!(other)
  push_centroid(other.centroids.values.shuffle)
  self
end

#p_rank(x) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

rubocop:disable Metrics/PerceivedComplexity, Metrics/AbcSize rubocop:disable Metrics/CyclomaticComplexity

Since:

  • v3.2.0



164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
# File 'lib/airbrake-ruby/tdigest.rb', line 164

def p_rank(x)
  is_array = x.is_a? Array
  x = [x] unless is_array

  min = @centroids.first
  max = @centroids.last

  x.map! do |item|
    if size == 0
      nil
    elsif item < min[1].mean
      0.0
    elsif item > max[1].mean
      1.0
    else
      _cumulate(exact: true)
      bound = bound_mean(item)
      lower, upper = bound
      mean_cumn = lower.mean_cumn
      if lower != upper
        mean_cumn += (item - lower.mean) * (upper.mean_cumn - lower.mean_cumn) \
          / (upper.mean - lower.mean)
      end
      mean_cumn / @size
    end
  end
  is_array ? x : x.first
end

#percentile(p) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

rubocop:disable Metrics/PerceivedComplexity, Metrics/CyclomaticComplexity rubocop:disable Metrics/AbcSize

Since:

  • v3.2.0



197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
# File 'lib/airbrake-ruby/tdigest.rb', line 197

def percentile(p)
  is_array = p.is_a? Array
  p = [p] unless is_array
  p.map! do |item|
    unless (0..1).cover?(item)
      raise ArgumentError, "p should be in [0,1], got #{item}"
    end

    if size == 0
      nil
    else
      _cumulate(exact: true)
      h = @size * item
      lower, upper = bound_mean_cumn(h)
      if lower.nil? && upper.nil?
        nil
      elsif upper == lower || lower.nil? || upper.nil?
        (lower || upper).mean
      elsif h == lower.mean_cumn
        lower.mean
      else
        upper.mean
      end
    end
  end
  is_array ? p : p.first
end

#push(x, n = 1) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

rubocop:enable Metrics/PerceivedComplexity, Metrics/CyclomaticComplexity rubocop:enable Metrics/AbcSize

Since:

  • v3.2.0



227
228
229
230
# File 'lib/airbrake-ruby/tdigest.rb', line 227

def push(x, n = 1)
  x = [x] unless x.is_a? Array
  x.each { |value| _digest(value, n) }
end

#push_centroid(c) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Since:

  • v3.2.0



232
233
234
235
# File 'lib/airbrake-ruby/tdigest.rb', line 232

def push_centroid(c)
  c = [c] unless c.is_a? Array
  c.each { |centroid| _digest(centroid.mean, centroid.n) }
end

#reset!Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Since:

  • v3.2.0



237
238
239
240
241
# File 'lib/airbrake-ruby/tdigest.rb', line 237

def reset!
  @centroids.clear
  @size = 0
  @last_cumulate = 0
end

#to_aObject

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Since:

  • v3.2.0



243
244
245
# File 'lib/airbrake-ruby/tdigest.rb', line 243

def to_a
  @centroids.each_value.to_a
end