Class: Bullshit::Analysis

Inherits:
Object
Defined in:
lib/bullshit.rb

Overview

This class is used to analyse the time measurements and compute their statistics.


Constructor Details

#initialize(measurements) ⇒ Analysis

Returns a new instance of Analysis for the given measurements. The measurements array is frozen in place.


# File 'lib/bullshit.rb', line 1047

def initialize(measurements)
  @measurements = measurements
  @measurements.freeze
end
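Because the constructor freezes the array it receives rather than a copy, the caller's array becomes immutable as well. A minimal sketch of that effect, using a hypothetical stand-in class (MiniAnalysis is not part of the library and mirrors only the constructor shown above):

```ruby
# Hypothetical stand-in that mirrors only the constructor shown above.
class MiniAnalysis
  attr_reader :measurements

  def initialize(measurements)
    @measurements = measurements
    @measurements.freeze # freezes the caller's array object, not a copy
  end
end

times = [0.12, 0.15, 0.11]
MiniAnalysis.new(times)
puts times.frozen? # the caller's array is now frozen

# Passing a copy keeps the original array mutable.
raw = [0.12, 0.15, 0.11]
MiniAnalysis.new(raw.dup)
raw << 0.13
puts raw.size
```

Passing a dup is one way to keep collecting measurements after an analysis has been created.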

Instance Attribute Details

#measurements ⇒ Object (readonly)

Returns the array of measurements.


# File 'lib/bullshit.rb', line 1053

def measurements
  @measurements
end

Instance Method Details

#arithmetic_mean ⇒ Object Also known as: mean

Returns the arithmetic mean of the measurements.


# File 'lib/bullshit.rb', line 1104

def arithmetic_mean
  @arithmetic_mean ||= sum / size
end

#autocorrelation ⇒ Object

Returns the array of autocorrelation values c_k / c_0 (of length size - 1).


# File 'lib/bullshit.rb', line 1276

def autocorrelation
  c = autovariance
  Array.new(c.size) { |k| c[k] / c[0] }
end

#autovariance ⇒ Object

Returns the array of autocovariances c_k for the lags k = 0 to size - 2 (of length size - 1).


# File 'lib/bullshit.rb', line 1264

def autovariance
  Array.new(size - 1) do |k|
    s = 0.0
    0.upto(size - k - 1) do |i|
      s += (@measurements[i] - arithmetic_mean) * (@measurements[i + k] - arithmetic_mean)
    end
    s / size
  end
end
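The relation between the autocovariances c_k and the autocorrelation values c_k / c_0 can be tried out on a plain array. This is a standalone sketch, not the library code; the function names autocovariances and autocorrelations are chosen here for illustration:

```ruby
# Autocovariance c_k for k = 0 .. n - 2, following the loop shown above.
def autocovariances(xs)
  n = xs.size
  mean = xs.sum / n.to_f
  Array.new(n - 1) do |k|
    s = 0.0
    0.upto(n - k - 1) { |i| s += (xs[i] - mean) * (xs[i + k] - mean) }
    s / n
  end
end

# Autocorrelation r_k = c_k / c_0, so r_0 is 1.0 for non-constant data.
def autocorrelations(xs)
  c = autocovariances(xs)
  c.map { |c_k| c_k / c[0] }
end

r = autocorrelations([1.0, 2.0, 3.0, 4.0, 5.0])
puts r.size
puts r[0]
```

For constant data c_0 is 0.0, so the division yields NaN; callers may want to check for that case.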

#common_standard_deviation(other) ⇒ Object

Returns an estimate of the common (pooled) standard deviation of the measurements of this instance and other.


# File 'lib/bullshit.rb', line 1206

def common_standard_deviation(other)
  Math.sqrt(common_variance(other))
end

#common_variance(other) ⇒ Object

Returns an estimate of the common (pooled) variance of the measurements of this instance and other.


# File 'lib/bullshit.rb', line 1212

def common_variance(other)
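  # Parenthesize the weighted sum so that all of it is divided by the
  # combined degrees of freedom.
  ((size - 1) * sample_variance + (other.size - 1) * other.sample_variance) /
    (size + other.size - 2)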
end
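A pooled variance estimate weights each sample variance by its degrees of freedom and divides the whole weighted sum by the combined degrees of freedom. The following standalone sketch works on plain arrays; the helper names are illustrative and not part of the library:

```ruby
# Unbiased sample variance (divides by n - 1).
def sample_variance(xs)
  mean = xs.sum / xs.size.to_f
  xs.sum { |x| (x - mean)**2 } / (xs.size - 1.0)
end

# Pooled variance: the whole weighted sum is divided by the combined
# degrees of freedom, hence the parentheses around the numerator.
def pooled_variance(a, b)
  ((a.size - 1) * sample_variance(a) + (b.size - 1) * sample_variance(b)) /
    (a.size + b.size - 2)
end

puts pooled_variance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0])
```

Without the parentheses, Ruby's operator precedence would divide only the second term.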

#compute_student_df(other) ⇒ Object

Computes the number of degrees of freedom for Student's t-test.


# File 'lib/bullshit.rb', line 1218

def compute_student_df(other)
  size + other.size - 2
end

#compute_welch_df(other) ⇒ Object

Uses the Welch-Satterthwaite approximation to compute the degrees of freedom for Welch's t-test.


# File 'lib/bullshit.rb', line 1187

def compute_welch_df(other)
  (sample_variance / size + other.sample_variance / other.size) ** 2 / (
    (sample_variance ** 2 / (size ** 2 * (size - 1))) +
    (other.sample_variance ** 2 / (other.size ** 2 * (other.size - 1))))
end
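The same formula can be written as a standalone function that takes the sample variances and sizes directly. This is a sketch for experimenting; welch_df and its parameter names are assumptions made here, not library API:

```ruby
# Welch-Satterthwaite degrees of freedom from sample variances and sizes.
def welch_df(var_a, n_a, var_b, n_b)
  var_a = var_a.to_f
  var_b = var_b.to_f
  numerator = (var_a / n_a + var_b / n_b)**2
  denominator = var_a**2 / (n_a**2 * (n_a - 1)) +
                var_b**2 / (n_b**2 * (n_b - 1))
  numerator / denominator
end

# Equal variances and equal sizes give 2 * (n - 1) degrees of freedom.
puts welch_df(2.0, 5, 2.0, 5)
```

With unequal variances or sizes the result is generally not an integer and lies between the smaller of the two n - 1 values and n_a + n_b - 2.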

#confidence_interval(alpha = 0.05) ⇒ Object

Returns the confidence interval for the arithmetic mean of the measurements of this Analysis instance as a Range object, using the alpha level alpha (that is, a confidence level of 1 - alpha).


# File 'lib/bullshit.rb', line 1256

def confidence_interval(alpha = 0.05)
  td = TDistribution.new(size - 1)
  t = td.inverse_probability(alpha / 2).abs
  delta = t * sample_standard_deviation / Math.sqrt(size)
  (arithmetic_mean - delta)..(arithmetic_mean + delta)
end
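The interval is the mean plus and minus t * s / sqrt(n). In the sketch below the t critical value is passed in by the caller, which is an assumption of this example; the method above obtains it from TDistribution#inverse_probability instead:

```ruby
# Confidence interval for the mean, given a t critical value.
# t_critical is supplied by the caller here (an assumption of this sketch).
def mean_confidence_interval(xs, t_critical)
  n = xs.size
  mean = xs.sum / n.to_f
  sd = Math.sqrt(xs.sum { |x| (x - mean)**2 } / (n - 1.0))
  delta = t_critical * sd / Math.sqrt(n)
  (mean - delta)..(mean + delta)
end

# 2.776 is the tabulated two-sided 95% t value for 4 degrees of freedom.
ci = mean_confidence_interval([10.0, 12.0, 11.0, 13.0, 9.0], 2.776)
puts ci.cover?(11.0)
```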

#cover?(other, alpha = 0.05) ⇒ Boolean

Returns true if this Analysis instance covers other, that is, if Welch's t-test does not reject the hypothesis that their arithmetic means are equal at the alpha error level.

Returns:

  • (Boolean)

# File 'lib/bullshit.rb', line 1248

def cover?(other, alpha = 0.05)
  t = t_welch(other)
  td = TDistribution.new(compute_welch_df(other))
  t.abs < td.inverse_probability(1 - alpha.abs / 2.0)
end

#detect_autocorrelation(lags = 20, alpha_level = 0.05) ⇒ Object

This method tries to detect autocorrelation with the Ljung-Box statistic. If enough lags can be considered it returns a hash with results, otherwise nil is returned. The keys are

:lags: the number of lags,
:alpha_level: the alpha level for the test,
:q: the value of the ljung_box_statistic,
:p: the probability that ChiSquareDistribution#probability returns for q with lags degrees of freedom; a correlation is reported if p is at least 1 - alpha_level,
:detected: true if a correlation was found.

# File 'lib/bullshit.rb', line 1309

def detect_autocorrelation(lags = 20, alpha_level = 0.05)
  if q = ljung_box_statistic(lags)
    p = ChiSquareDistribution.new(lags).probability(q)
    return {
      :lags         => lags,
      :alpha_level  => alpha_level,
      :q            => q,
      :p            => p,
      :detected     => p >= 1 - alpha_level,
    }
  end
end

#detect_outliers(factor = 3.0, epsilon = 1E-5) ⇒ Object

Returns a result hash with the number of :very_low, :low, :high, and :very_high outliers, as determined by the box plot algorithm, together with the :median, :iqr, and :factor values that were used. If no outliers were found or the iqr is less than epsilon, nil is returned.


# File 'lib/bullshit.rb', line 1326

def detect_outliers(factor = 3.0, epsilon = 1E-5)
  half_factor = factor / 2.0
  quartile1 = percentile(25)
  quartile3 = percentile(75)
  iqr = quartile3 - quartile1
  iqr < epsilon and return
  result = @measurements.inject(Hash.new(0)) do |h, t|
    extreme =
      case t
      when -Infinity..(quartile1 - factor * iqr)
        :very_low
      when (quartile1 - factor * iqr)..(quartile1 - half_factor * iqr)
        :low
      when (quartile3 + half_factor * iqr)..(quartile3 + factor * iqr)
        :high
      when (quartile3 + factor * iqr)..Infinity
        :very_high
      end and h[extreme] += 1
    h
  end
  unless result.empty?
    result[:median] = median
    result[:iqr] = iqr
    result[:factor] = factor
    result
  end
end
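The classification can be illustrated on a single value. The sketch below assumes symmetric fences, with inner fences factor / 2 interquartile ranges outside the quartiles and outer fences factor interquartile ranges outside; the function name classify and the hand-picked quartiles are hypothetical:

```ruby
# Classify one value against box-plot fences built from the quartiles.
# Symmetric fences around q1 and q3 are an assumption of this sketch.
def classify(value, q1, q3, factor = 3.0)
  iqr = q3 - q1
  half = factor / 2.0
  if    value <= q1 - factor * iqr then :very_low
  elsif value <= q1 - half * iqr   then :low
  elsif value >= q3 + factor * iqr then :very_high
  elsif value >= q3 + half * iqr   then :high
  end
end

# Quartiles 9.0 and 12.0 are picked by hand for illustration.
counts = Hash.new(0)
[1.0, 9.0, 10.0, 11.0, 12.0, 20.0, 40.0].each do |t|
  kind = classify(t, 9.0, 12.0)
  counts[kind] += 1 if kind
end
p counts
```

Values inside the inner fences yield nil and are not counted.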

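#durbin_watson_statistic ⇒ Object

Returns the d value of the Durbin-Watson statistic, computed on the residues of the linear regression. A d well below 2 indicates positive autocorrelation, a d well above 2 indicates negative autocorrelation, and a d around 2 indicates no autocorrelation. If there is at most one residue, 2.0 is returned.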


# File 'lib/bullshit.rb', line 1283

def durbin_watson_statistic
  e = linear_regression.residues
  e.size <= 1 and return 2.0
  (1...e.size).inject(0.0) { |s, i| s + (e[i] - e[i - 1]) ** 2 } /
    e.inject(0.0) { |s, x| s + x ** 2 }
end
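The statistic is the sum of squared differences of successive residuals divided by the sum of squared residuals. A standalone sketch that takes the residuals as a plain array (the name durbin_watson is chosen here, and the residuals are made up):

```ruby
# Durbin-Watson d for a sequence of regression residuals.
def durbin_watson(residuals)
  return 2.0 if residuals.size <= 1
  numerator = residuals.each_cons(2).sum { |a, b| (b - a)**2 }
  denominator = residuals.sum { |e| e**2 }
  numerator.to_f / denominator
end

puts durbin_watson([1.0, -1.0, 1.0, -1.0]) # alternating signs: d above 2
puts durbin_watson([1.0, 1.0, 1.0])        # identical residuals: d of 0.0
```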

#geometric_mean ⇒ Object

Returns the geometric mean of the measurements. The measurements are scanned in order: the first measurement equal to 0.0 makes the result 0.0, and the first measurement less than 0.0 makes it NaN.


# File 'lib/bullshit.rb', line 1127

def geometric_mean
  @geometric_mean ||= (
    sum = @measurements.inject(0.0) { |s, t|
      case
      when t > 0
        s + Math.log(t)
      when t == 0
        break :null
      else
        break nil
      end
    }
    case sum
    when :null
      0.0
    when Float
      Math.exp(sum / size)
    else
      0 / 0.0
    end
  )
end
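The computation goes through the mean of the logarithms, which avoids overflowing a long product. A standalone sketch on a plain array (not the library code) that mirrors the three cases of the method above:

```ruby
# Geometric mean via the mean of logarithms. The first non-positive value
# decides between 0.0 (a zero) and NaN (a negative value).
def geometric_mean(xs)
  log_sum = 0.0
  xs.each do |x|
    return 0.0 if x == 0
    return Float::NAN if x < 0
    log_sum += Math.log(x)
  end
  Math.exp(log_sum / xs.size)
end

puts geometric_mean([1.0, 4.0, 16.0])
puts geometric_mean([2.0, -1.0]).nan?
```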

#harmonic_mean ⇒ Object

Returns the harmonic mean of the measurements. If any of the measurements is less than or equal to 0.0, this method returns NaN.


# File 'lib/bullshit.rb', line 1112

def harmonic_mean
  @harmonic_mean ||= (
    sum = @measurements.inject(0.0) { |s, t|
      if t > 0
        s + 1.0 / t
      else
        break nil
      end
    }
    sum ? size / sum : 0 / 0.0
  )
end
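The harmonic mean is the number of values divided by the sum of their reciprocals. A standalone sketch on a plain array, written here for illustration only:

```ruby
# Harmonic mean: n divided by the sum of reciprocals.
# Any value that is not positive makes the result NaN.
def harmonic_mean(xs)
  reciprocal_sum = 0.0
  xs.each do |x|
    return Float::NAN unless x > 0
    reciprocal_sum += 1.0 / x
  end
  xs.size / reciprocal_sum
end

puts harmonic_mean([1.0, 2.0, 4.0])
puts harmonic_mean([1.0, 0.0]).nan?
```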

#histogram(bins) ⇒ Object

Returns a Histogram instance with bins as the number of bins for this analysis' measurements.


# File 'lib/bullshit.rb', line 1362

def histogram(bins)
  Histogram.new(self, bins)
end

#linear_regression ⇒ Object

Returns the LinearRegression object for the equation a * x + b which represents the line computed by the linear regression algorithm.


# File 'lib/bullshit.rb', line 1356

def linear_regression
  @linear_regression ||= LinearRegression.new @measurements
end

#ljung_box_statistic(lags = 20) ⇒ Object

Returns the q value of the Ljung-Box statistic for the given number of lags. A higher value might indicate autocorrelation in the measurements of this Analysis instance. This method returns nil if not enough autocorrelation values are available, that is, if lags is not less than size - 1.


# File 'lib/bullshit.rb', line 1294

def ljung_box_statistic(lags = 20)
  r = autocorrelation
  lags >= r.size and return
  n = size
  n * (n + 2) * (1..lags).inject(0.0) { |s, i| s + r[i] ** 2 / (n - i) }
end
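Written out, the statistic is Q = n (n + 2) times the sum of r_k^2 / (n - k) over k = 1 to lags. The following standalone sketch computes it for a plain array; the function names are illustrative and the autocorrelation helper is repeated so that the example is self-contained:

```ruby
# Autocorrelation values r_0 .. r_(n-2), with r_k = c_k / c_0.
def autocorrelations(xs)
  n = xs.size
  mean = xs.sum / n.to_f
  c = Array.new(n - 1) do |k|
    (0..(n - k - 1)).sum { |i| (xs[i] - mean) * (xs[i + k] - mean) } / n
  end
  c.map { |c_k| c_k / c[0] }
end

# Ljung-Box Q; returns nil if fewer than lags + 1 autocorrelations exist.
def ljung_box_q(xs, lags)
  r = autocorrelations(xs)
  return nil if lags >= r.size
  n = xs.size
  n * (n + 2) * (1..lags).sum { |k| r[k]**2 / (n - k) }
end

alternating = Array.new(20) { |i| i.even? ? 1.0 : -1.0 }
puts ljung_box_q(alternating, 1)
```

A strictly alternating series is strongly correlated at lag 1, so its Q value is large compared to that of unrelated measurements.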

#max ⇒ Object

Returns the maximum of the measurements.


# File 'lib/bullshit.rb', line 1156

def max
  @max ||= @measurements.max
end

#min ⇒ Object

Returns the minimum of the measurements.


# File 'lib/bullshit.rb', line 1151

def min
  @min ||= @measurements.min
end

#percentile(p = 50) ⇒ Object Also known as: median

Returns the p-percentile of the measurements. There are many methods to compute a percentile; this method uses the weighted average at x_(n + 1)p, which allows p to be in 0...100 (excluding 100).


# File 'lib/bullshit.rb', line 1164

def percentile(p = 50)
  (0...100).include?(p) or
    raise ArgumentError, "p = #{p}, but has to be in (0...100)"
  p /= 100.0
  @sorted ||= @measurements.sort
  r = p * (@sorted.size + 1)
  r_i = r.to_i
  r_f = r - r_i
  if r_i >= 1
    result = @sorted[r_i - 1]
    if r_i < @sorted.size
      result += r_f * (@sorted[r_i] - @sorted[r_i - 1])
    end
  else
    result = @sorted[0]
  end
  result
end
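A worked example makes the weighted average concrete: for the four sorted values 1, 2, 3, 4 and p = 50 the rank is 0.5 * (4 + 1) = 2.5, so the result lies halfway between the second and third value. The standalone sketch below follows the same steps on a plain array and is not the library code:

```ruby
# Weighted average at rank r = p * (n + 1), as in the method above.
def percentile(xs, p = 50)
  (0...100).include?(p) or
    raise ArgumentError, "p = #{p}, but has to be in (0...100)"
  sorted = xs.sort
  r = p / 100.0 * (sorted.size + 1)
  rank = r.to_i       # integer part selects the lower neighbour
  fraction = r - rank # fractional part weights the step to the next value
  return sorted[0] if rank < 1
  result = sorted[rank - 1]
  result += fraction * (sorted[rank] - sorted[rank - 1]) if rank < sorted.size
  result
end

puts percentile([4.0, 1.0, 3.0, 2.0], 50)
puts percentile([4.0, 1.0, 3.0, 2.0], 25)
```

Ranks below 1 fall back to the smallest value, and ranks at or beyond the number of values fall back to the largest one.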

#sample_standard_deviation ⇒ Object

Returns the sample standard deviation of the measurements.


# File 'lib/bullshit.rb', line 1088

def sample_standard_deviation
  @sample_standard_deviation ||= Math.sqrt(sample_variance)
end

#sample_standard_deviation_percentage ⇒ Object

Returns the sample standard deviation of the measurements in percentage of the arithmetic mean.


# File 'lib/bullshit.rb', line 1094

def sample_standard_deviation_percentage
  @sample_standard_deviation_percentage ||= 100.0 * sample_standard_deviation / arithmetic_mean
end

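#sample_variance ⇒ Object

Returns the sample variance of the measurements, that is, the sum of squares divided by size - 1. For fewer than two measurements 0.0 is returned.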


# File 'lib/bullshit.rb', line 1066

def sample_variance
  @sample_variance ||= size > 1 ? sum_of_squares / (size - 1.0) : 0.0
end
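The difference between the sample variance and the variance is only the divisor: size - 1 versus size. A standalone sketch on a plain array (the helper functions are written here for illustration):

```ruby
# Sum of the squared deviations from the arithmetic mean.
def sum_of_squares(xs)
  mean = xs.sum / xs.size.to_f
  xs.sum { |x| (x - mean)**2 }
end

# Population variance divides by n.
def variance(xs)
  sum_of_squares(xs) / xs.size
end

# Sample variance divides by n - 1 and is 0.0 for fewer than two values.
def sample_variance(xs)
  xs.size > 1 ? sum_of_squares(xs) / (xs.size - 1.0) : 0.0
end

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
puts variance(data)
puts sample_variance(data)
```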

#size ⇒ Object

Returns the number of measurements on which the analysis is based.


# File 'lib/bullshit.rb', line 1056

def size
  @measurements.size
end

#standard_deviation ⇒ Object

Returns the standard deviation of the measurements, that is, the square root of the variance.


# File 'lib/bullshit.rb', line 1077

def standard_deviation
  @standard_deviation ||= Math.sqrt(variance)
end

#standard_deviation_percentage ⇒ Object

Returns the standard deviation of the measurements in percentage of the arithmetic mean.


# File 'lib/bullshit.rb', line 1083

def standard_deviation_percentage
  @standard_deviation_percentage ||= 100.0 * standard_deviation / arithmetic_mean
end

#suggested_sample_size(other, alpha = 0.05, beta = 0.05) ⇒ Object

Computes a sample size that is more likely to reveal a difference between the mean of this instance's measurements and that of other. alpha and beta are the levels for the type I and type II errors.


# File 'lib/bullshit.rb', line 1235

def suggested_sample_size(other, alpha = 0.05, beta = 0.05)
  alpha, beta = alpha.abs, beta.abs
  signal = arithmetic_mean - other.arithmetic_mean
  df = size + other.size - 2
  pooled_variance_estimate = (sum_of_squares + other.sum_of_squares) / df
  td = TDistribution.new df
  (((td.inverse_probability(alpha) + td.inverse_probability(beta)) *
    Math.sqrt(pooled_variance_estimate)) / signal) ** 2
end

#sum ⇒ Object

Returns the sum of all measurements.


# File 'lib/bullshit.rb', line 1099

def sum
  @sum ||= @measurements.inject(0.0) { |s, t| s + t }
end

#sum_of_squares ⇒ Object

Returns the sum of squares (the sum of the squared deviations) of the measurements.


# File 'lib/bullshit.rb', line 1072

def sum_of_squares
  @sum_of_squares ||= @measurements.inject(0.0) { |s, t| s + (t - arithmetic_mean) ** 2 }
end

#t_student(other) ⇒ Object

Returns the t value of Student's t-test between this Analysis instance and other.


# File 'lib/bullshit.rb', line 1224

def t_student(other)
  signal = arithmetic_mean - other.arithmetic_mean
  # The noise term uses the sizes of both samples.
  noise = common_standard_deviation(other) *
    Math.sqrt(1.0 / size + 1.0 / other.size)
  signal / noise
rescue Errno::EDOM
  0.0
end
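Student's t divides the difference of the means by the pooled standard deviation times sqrt(1/n_a + 1/n_b). A standalone sketch on plain arrays, with helper names chosen here for illustration rather than taken from the library:

```ruby
def mean(xs)
  xs.sum / xs.size.to_f
end

# Unbiased sample variance (divides by n - 1).
def sample_variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / (xs.size - 1.0)
end

# Student's t: difference of means over the pooled standard deviation
# scaled by sqrt(1/n_a + 1/n_b).
def t_student(a, b)
  pooled = ((a.size - 1) * sample_variance(a) + (b.size - 1) * sample_variance(b)) /
           (a.size + b.size - 2)
  noise = Math.sqrt(pooled) * Math.sqrt(1.0 / a.size + 1.0 / b.size)
  (mean(a) - mean(b)) / noise
end

puts t_student([1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0])
```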

#t_welch(other) ⇒ Object

Returns the t value of Welch's t-test between this Analysis instance and other.


# File 'lib/bullshit.rb', line 1195

def t_welch(other)
  signal = arithmetic_mean - other.arithmetic_mean
  noise = Math.sqrt(sample_variance / size +
    other.sample_variance / other.size)
  signal / noise
rescue Errno::EDOM
  0.0
end
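Welch's t does not pool the variances; each sample variance is divided by its own sample size. A standalone sketch on plain arrays (the helper names are illustrative, not library API):

```ruby
def mean(xs)
  xs.sum / xs.size.to_f
end

# Unbiased sample variance (divides by n - 1).
def sample_variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / (xs.size - 1.0)
end

# Welch's t: difference of means over sqrt(s_a^2 / n_a + s_b^2 / n_b).
def t_welch(a, b)
  signal = mean(a) - mean(b)
  noise = Math.sqrt(sample_variance(a) / a.size + sample_variance(b) / b.size)
  signal / noise
end

puts t_welch([1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0])
```

The sign of the result follows the order of the arguments: it is negative when the first sample has the smaller mean.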

#variance ⇒ Object

Returns the variance of the measurements, that is, the sum of squares divided by size.


# File 'lib/bullshit.rb', line 1061

def variance
  @variance ||= sum_of_squares / size
end