Class: Bullshit::Analysis
Inherits: Object
Defined in: lib/bullshit.rb
Overview
This class is used to analyse the time measurements and compute their statistics.
Instance Attribute Summary
-
#measurements ⇒ Object
readonly
Returns the array of measurements.
Instance Method Summary
-
#arithmetic_mean ⇒ Object
(also: #mean)
Returns the arithmetic mean of the measurements.
-
#autocorrelation ⇒ Object
Returns the array of autocorrelation values c_k / c_0 (of length size - 1).
-
#autovariance ⇒ Object
Returns the array of autovariances (of length size - 1).
-
#common_standard_deviation(other) ⇒ Object
Returns an estimation of the common standard deviation of the measurements of this and other.
-
#common_variance(other) ⇒ Object
Returns an estimation of the common variance of the measurements of this and other.
-
#compute_student_df(other) ⇒ Object
Computes the number of degrees of freedom for Student’s t-test.
-
#compute_welch_df(other) ⇒ Object
Use an approximation of the Welch-Satterthwaite equation to compute the degrees of freedom for Welch’s t-test.
-
#confidence_interval(alpha = 0.05) ⇒ Object
Returns the confidence interval for the arithmetic mean of the measurements of this Analysis instance, at alpha level alpha, as a Range object.
-
#cover?(other, alpha = 0.05) ⇒ Boolean
Returns true if this Analysis instance covers other, that is, if their arithmetic means are most likely equal at the alpha error level.
-
#detect_autocorrelation(lags = 20, alpha_level = 0.05) ⇒ Object
This method tries to detect autocorrelation with the Ljung-Box statistic.
-
#detect_outliers(factor = 3.0, epsilon = 1E-5) ⇒ Object
Return a result hash with the number of :very_low, :low, :high, and :very_high outliers, determined by the box plotting algorithm run with :median and :iqr parameters.
-
#durbin_watson_statistic ⇒ Object
Returns the d-value for the Durbin-Watson statistic.
-
#geometric_mean ⇒ Object
Returns the geometric mean of the measurements.
-
#harmonic_mean ⇒ Object
Returns the harmonic mean of the measurements.
-
#histogram(bins) ⇒ Object
Returns a Histogram instance with bins as the number of bins for this analysis’ measurements.
-
#initialize(measurements) ⇒ Analysis
constructor
A new instance of Analysis.
-
#linear_regression ⇒ Object
Returns the LinearRegression object for the equation a * x + b which represents the line computed by the linear regression algorithm.
-
#ljung_box_statistic(lags = 20) ⇒ Object
Returns the q value of the Ljung-Box statistic for the number of lags given by lags.
-
#max ⇒ Object
Returns the maximum of the measurements.
-
#min ⇒ Object
Returns the minimum of the measurements.
-
#percentile(p = 50) ⇒ Object
(also: #median)
Returns the p-percentile of the measurements.
-
#sample_standard_deviation ⇒ Object
Returns the sample standard deviation of the measurements.
-
#sample_standard_deviation_percentage ⇒ Object
Returns the sample standard deviation of the measurements in percentage of the arithmetic mean.
-
#sample_variance ⇒ Object
Returns the sample variance of the measurements.
-
#size ⇒ Object
Returns the number of measurements, on which the analysis is based.
-
#standard_deviation ⇒ Object
Returns the standard deviation of the measurements.
-
#standard_deviation_percentage ⇒ Object
Returns the standard deviation of the measurements in percentage of the arithmetic mean.
-
#suggested_sample_size(other, alpha = 0.05, beta = 0.05) ⇒ Object
Computes a sample size that is more likely to reveal a difference between the mean of this instance’s measurements and that of other.
-
#sum ⇒ Object
Returns the sum of all measurements.
-
#sum_of_squares ⇒ Object
Returns the sum of squares (the sum of the squared deviations) of the measurements.
-
#t_student(other) ⇒ Object
Returns the t value of Student’s t-test between this Analysis instance and other.
-
#t_welch(other) ⇒ Object
Returns the t value of Welch’s t-test between this Analysis instance and other.
-
#variance ⇒ Object
Returns the variance of the measurements.
Constructor Details
#initialize(measurements) ⇒ Analysis
Returns a new instance of Analysis.
# File 'lib/bullshit.rb', line 1047
def initialize(measurements)
  @measurements = measurements
  @measurements.freeze
end
Instance Attribute Details
#measurements ⇒ Object (readonly)
Returns the array of measurements.
# File 'lib/bullshit.rb', line 1053
def measurements
  @measurements
end
Instance Method Details
#arithmetic_mean ⇒ Object Also known as: mean
Returns the arithmetic mean of the measurements.
# File 'lib/bullshit.rb', line 1104
def arithmetic_mean
  @arithmetic_mean ||= sum / size
end
#autocorrelation ⇒ Object
Returns the array of autocorrelation values c_k / c_0 (of length size - 1).
# File 'lib/bullshit.rb', line 1276
def autocorrelation
  c = autovariance
  Array.new(c.size) { |k| c[k] / c[0] }
end
#autovariance ⇒ Object
Returns the array of autovariances (of length size - 1).
# File 'lib/bullshit.rb', line 1264
def autovariance
  Array.new(size - 1) do |k|
    s = 0.0
    0.upto(size - k - 1) do |i|
      s += (@measurements[i] - arithmetic_mean) * (@measurements[i + k] - arithmetic_mean)
    end
    s / size
  end
end
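To make the relationship between the autovariances c_k and the autocorrelation values c_k / c_0 concrete, here is a minimal standalone sketch. It does not load the library; the helper names `autocovariances` and `autocorrelations` are hypothetical and only mirror the arithmetic of the two listings above.

```ruby
# Standalone sketch, not the library's API: hypothetical helpers that
# reproduce the arithmetic of #autovariance and #autocorrelation.
def autocovariances(xs)
  n = xs.size
  mean = xs.sum.to_f / n
  Array.new(n - 1) do |k|
    s = 0.0
    # Sum the products of deviations that are k positions apart.
    0.upto(n - k - 1) { |i| s += (xs[i] - mean) * (xs[i + k] - mean) }
    s / n
  end
end

def autocorrelations(xs)
  c = autocovariances(xs)
  # Normalise by the lag-0 value, so the first entry is always 1.0.
  c.map { |c_k| c_k / c[0] }
end

r = autocorrelations([1.0, 2.0, 3.0, 4.0, 5.0])
puts r.size # → 4
```

For a series of five values the result has four entries (lags 0 to 3), matching the stated length of size - 1.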
#common_standard_deviation(other) ⇒ Object
Returns an estimation of the common standard deviation of the measurements of this and other.
# File 'lib/bullshit.rb', line 1206
def common_standard_deviation(other)
  Math.sqrt(common_variance(other))
end
#common_variance(other) ⇒ Object
Returns an estimation of the common variance of the measurements of this and other.
# File 'lib/bullshit.rb', line 1212
def common_variance(other)
  # Weight each sample variance by its degrees of freedom, then divide
  # the whole sum by the combined degrees of freedom.
  ((size - 1) * sample_variance + (other.size - 1) * other.sample_variance) /
    (size + other.size - 2)
end
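As an illustration of a common (pooled) variance estimate, the following standalone sketch computes the textbook form, in which both weighted sample variances are summed before dividing by the combined degrees of freedom. The helper names are hypothetical and the code does not use the library.

```ruby
# Standalone sketch of the textbook pooled variance; helper names are
# hypothetical and not part of the library.
def sample_variance(xs)
  mean = xs.sum.to_f / xs.size
  xs.sum { |x| (x - mean)**2 } / (xs.size - 1.0)
end

def pooled_variance(a, b)
  # Each sample variance is weighted by its degrees of freedom (n - 1).
  ((a.size - 1) * sample_variance(a) + (b.size - 1) * sample_variance(b)) /
    (a.size + b.size - 2)
end

a = [1.0, 2.0, 3.0]       # sample variance 1.0
b = [2.0, 4.0, 6.0, 8.0]  # sample variance 20/3
puts pooled_variance(a, b).round(4) # → 4.4
```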
#compute_student_df(other) ⇒ Object
Computes the number of degrees of freedom for Student’s t-test.
# File 'lib/bullshit.rb', line 1218
def compute_student_df(other)
  size + other.size - 2
end
#compute_welch_df(other) ⇒ Object
Use an approximation of the Welch-Satterthwaite equation to compute the degrees of freedom for Welch’s t-test.
# File 'lib/bullshit.rb', line 1187
def compute_welch_df(other)
  (sample_variance / size + other.sample_variance / other.size) ** 2 / (
    (sample_variance ** 2 / (size ** 2 * (size - 1))) +
    (other.sample_variance ** 2 / (other.size ** 2 * (other.size - 1))))
end
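The Welch-Satterthwaite approximation can be tried out in isolation. The sketch below is a standalone illustration with hypothetical helper names; it takes two plain arrays rather than Analysis instances and assumes each array has at least two elements.

```ruby
# Standalone sketch of the Welch-Satterthwaite degrees of freedom;
# helper names are hypothetical, inputs are plain arrays.
def sample_variance(xs)
  mean = xs.sum.to_f / xs.size
  xs.sum { |x| (x - mean)**2 } / (xs.size - 1.0)
end

def welch_df(a, b)
  va = sample_variance(a) / a.size  # squared standard error of a's mean
  vb = sample_variance(b) / b.size  # squared standard error of b's mean
  (va + vb)**2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
end

df = welch_df([1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0])
puts df.round(4) # → 4.0755
```

The result is generally not an integer and never exceeds the Student value of size + other.size - 2 (here 5).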
#confidence_interval(alpha = 0.05) ⇒ Object
Returns the confidence interval for the arithmetic mean of the measurements of this Analysis instance, at alpha level alpha, as a Range object.
# File 'lib/bullshit.rb', line 1256
def confidence_interval(alpha = 0.05)
  td = TDistribution.new(size - 1)
  t = td.inverse_probability(alpha / 2).abs
  delta = t * sample_standard_deviation / Math.sqrt(size)
  (arithmetic_mean - delta)..(arithmetic_mean + delta)
end
#cover?(other, alpha = 0.05) ⇒ Boolean
Returns true if this Analysis instance covers other, that is, if their arithmetic means are most likely equal at the alpha error level.
# File 'lib/bullshit.rb', line 1248
def cover?(other, alpha = 0.05)
  t = t_welch(other)
  td = TDistribution.new(compute_welch_df(other))
  t.abs < td.inverse_probability(1 - alpha.abs / 2.0)
end
#detect_autocorrelation(lags = 20, alpha_level = 0.05) ⇒ Object
This method tries to detect autocorrelation with the Ljung-Box statistic. If enough lags can be considered it returns a hash with results, otherwise nil is returned. The keys are
:lags: the number of lags,
:alpha_level: the alpha level for the test,
:q: the value of the ljung_box_statistic,
:p: the chi-square probability computed for q; a correlation is reported when p is at least 1 - alpha_level,
:detected: true if a correlation was found.
# File 'lib/bullshit.rb', line 1309
def detect_autocorrelation(lags = 20, alpha_level = 0.05)
  if q = ljung_box_statistic(lags)
    p = ChiSquareDistribution.new(lags).probability(q)
    return {
      :lags        => lags,
      :alpha_level => alpha_level,
      :q           => q,
      :p           => p,
      :detected    => p >= 1 - alpha_level,
    }
  end
end
#detect_outliers(factor = 3.0, epsilon = 1E-5) ⇒ Object
Return a result hash with the number of :very_low, :low, :high, and :very_high outliers, determined by the box plotting algorithm run with :median and :iqr parameters. If no outliers were found or the iqr is less than epsilon, nil is returned.
# File 'lib/bullshit.rb', line 1326
def detect_outliers(factor = 3.0, epsilon = 1E-5)
  half_factor = factor / 2.0
  quartile1 = percentile(25)
  quartile3 = percentile(75)
  iqr = quartile3 - quartile1
  iqr < epsilon and return
  result = @measurements.inject(Hash.new(0)) do |h, t|
    extreme =
      case t
      when -Infinity..(quartile1 - factor * iqr)
        :very_low
      when (quartile1 - factor * iqr)..(quartile1 - half_factor * iqr)
        :low
      # The upper fences are measured from the third quartile.
      when (quartile3 + half_factor * iqr)..(quartile3 + factor * iqr)
        :high
      when (quartile3 + factor * iqr)..Infinity
        :very_high
      end and h[extreme] += 1
    h
  end
  unless result.empty?
    result[:median] = median
    result[:iqr] = iqr
    result[:factor] = factor
    result
  end
end
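The box plot classification can be sketched without the library. In the following standalone example the quartiles are passed in directly (an assumption made to keep the sketch short), the fences are placed symmetrically around the first and third quartile, and strict comparisons are used instead of the inclusive ranges of the listing above. `classify_outliers` is a hypothetical name.

```ruby
# Standalone sketch of box plot outlier counting; classify_outliers is a
# hypothetical helper and the quartiles q1 and q3 are supplied by the caller.
def classify_outliers(xs, q1, q3, factor = 3.0)
  iqr = q3 - q1
  half = factor / 2.0
  counts = Hash.new(0)
  xs.each do |x|
    label =
      if    x < q1 - factor * iqr then :very_low
      elsif x < q1 - half * iqr   then :low
      elsif x > q3 + factor * iqr then :very_high
      elsif x > q3 + half * iqr   then :high
      end
    counts[label] += 1 if label # values inside the fences are not counted
  end
  counts
end

# With q1 = 10 and q3 = 14 the fences are at -2, 4, 20 and 26.
counts = classify_outliers([-50, 1, 10, 11, 12, 13, 14, 22, 60], 10.0, 14.0)
puts counts.inspect
```

With the default factor of 3.0, values more than 1.5 interquartile ranges outside the quartiles count as :low or :high, and values more than 3 interquartile ranges outside count as :very_low or :very_high.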
#durbin_watson_statistic ⇒ Object
Returns the d-value for the Durbin-Watson statistic. The value of d is well below 2 for positive autocorrelation, well above 2 for negative autocorrelation, and around 2 when there is no autocorrelation.
# File 'lib/bullshit.rb', line 1283
def durbin_watson_statistic
  e = linear_regression.residues
  e.size <= 1 and return 2.0
  (1...e.size).inject(0.0) { |s, i| s + (e[i] - e[i - 1]) ** 2 } /
    e.inject(0.0) { |s, x| s + x ** 2 }
end
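How the d-value reacts to different residual patterns is easiest to see on tiny inputs. The sketch below is a standalone illustration that takes an array of residuals directly instead of computing them from a regression; `durbin_watson` is a hypothetical helper name.

```ruby
# Standalone sketch of the Durbin-Watson d-value on a given residual array;
# durbin_watson is a hypothetical helper, not the library's method.
def durbin_watson(residuals)
  return 2.0 if residuals.size <= 1
  # Squared differences of successive residuals over the squared residuals.
  num = residuals.each_cons(2).sum(0.0) { |a, b| (b - a)**2 }
  den = residuals.sum(0.0) { |e| e**2 }
  num / den
end

puts durbin_watson([1.0, -1.0, 1.0, -1.0]) # → 3.0 (alternating signs)
puts durbin_watson([1.0, 1.0, -1.0, -1.0]) # → 1.0 (runs of equal sign)
```

Residuals that flip sign at every step push d above 2, while runs of residuals with the same sign pull it below 2.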
#geometric_mean ⇒ Object
Returns the geometric mean of the measurements. If any of the measurements is less than 0.0, this method returns NaN.
# File 'lib/bullshit.rb', line 1127
def geometric_mean
  @geometric_mean ||= (
    sum = @measurements.inject(0.0) { |s, t|
      case
      when t > 0
        s + Math.log(t)
      when t == 0
        break :null
      else
        break nil
      end
    }
    case sum
    when :null
      0.0
    when Float
      Math.exp(sum / size)
    else
      0 / 0.0
    end
  )
end
#harmonic_mean ⇒ Object
Returns the harmonic mean of the measurements. If any of the measurements is less than or equal to 0.0, this method returns NaN.
# File 'lib/bullshit.rb', line 1112
def harmonic_mean
  @harmonic_mean ||= (
    sum = @measurements.inject(0.0) { |s, t|
      if t > 0
        s + 1.0 / t
      else
        break nil
      end
    }
    sum ? size / sum : 0 / 0.0
  )
end
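The geometric and harmonic means can be compared on small inputs with a simplified standalone sketch. The function names are hypothetical, and the sketch checks for negative values before zeros, so it is a simplification rather than a line-by-line copy of the two listings above.

```ruby
# Standalone, simplified sketch of the two means; not the library's code.
def geometric_mean(xs)
  return Float::NAN if xs.any? { |x| x < 0 }
  return 0.0 if xs.any?(&:zero?)
  # Averaging the logarithms avoids overflow from a long running product.
  Math.exp(xs.sum { |x| Math.log(x) } / xs.size)
end

def harmonic_mean(xs)
  return Float::NAN if xs.any? { |x| x <= 0 }
  xs.size / xs.sum { |x| 1.0 / x }
end

puts geometric_mean([1.0, 4.0, 16.0]).round(6) # → 4.0
puts harmonic_mean([1.0, 2.0, 4.0]).round(4)   # → 1.7143
```

For positive, non-identical values the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.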
#histogram(bins) ⇒ Object
Returns a Histogram instance with bins as the number of bins for this analysis’ measurements.
# File 'lib/bullshit.rb', line 1362
def histogram(bins)
  Histogram.new(self, bins)
end
#linear_regression ⇒ Object
Returns the LinearRegression object for the equation a * x + b which represents the line computed by the linear regression algorithm.
# File 'lib/bullshit.rb', line 1356
def linear_regression
  @linear_regression ||= LinearRegression.new @measurements
end
#ljung_box_statistic(lags = 20) ⇒ Object
Returns the q value of the Ljung-Box statistic for the number of lags given by lags. A higher value might indicate autocorrelation in the measurements of this Analysis instance. This method returns nil if not enough lags are available, that is, if the autocorrelation array does not hold more than lags values.
# File 'lib/bullshit.rb', line 1294
def ljung_box_statistic(lags = 20)
  r = autocorrelation
  lags >= r.size and return
  n = size
  n * (n + 2) * (1..lags).inject(0.0) { |s, i| s + r[i] ** 2 / (n - i) }
end
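The Q statistic is a weighted sum of squared autocorrelations, which the following standalone sketch spells out. It assumes the autocorrelation values are already available as an array `r` with the lag-0 value at index 0, and that `n` is the number of measurements; `ljung_box_q` is a hypothetical name.

```ruby
# Standalone sketch of the Ljung-Box Q value; ljung_box_q is a hypothetical
# helper. r holds autocorrelations with lag 0 at index 0, n is the sample size.
def ljung_box_q(r, n, lags)
  return nil if lags >= r.size # not enough lags available
  # Each squared autocorrelation is weighted by 1 / (n - k).
  n * (n + 2) * (1..lags).sum(0.0) { |k| r[k]**2 / (n - k) }
end

r = [1.0, 0.4, -0.1, -0.4]
puts ljung_box_q(r, 5, 2).round(4) # → 1.5167
puts ljung_box_q(r, 5, 4).inspect  # → nil
```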
#max ⇒ Object
Returns the maximum of the measurements.
# File 'lib/bullshit.rb', line 1156
def max
  @max ||= @measurements.max
end
#min ⇒ Object
Returns the minimum of the measurements.
# File 'lib/bullshit.rb', line 1151
def min
  @min ||= @measurements.min
end
#percentile(p = 50) ⇒ Object Also known as: median
Returns the p-percentile of the measurements. There are many methods to compute the percentile; this method uses the weighted average at x_(n + 1)p, which allows p to be in 0...100 (excluding 100).
# File 'lib/bullshit.rb', line 1164
def percentile(p = 50)
  (0...100).include?(p) or
    raise ArgumentError, "p = #{p}, but has to be in (0...100)"
  p /= 100.0
  @sorted ||= @measurements.sort
  r = p * (@sorted.size + 1)
  r_i = r.to_i
  r_f = r - r_i
  if r_i >= 1
    result = @sorted[r_i - 1]
    if r_i < @sorted.size
      result += r_f * (@sorted[r_i] - @sorted[r_i - 1])
    end
  else
    result = @sorted[0]
  end
  result
end
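The weighted-average rule is easier to follow with concrete numbers. The standalone sketch below applies the same steps to a plain array; `percentile_of` is a hypothetical name and nothing is cached between calls.

```ruby
# Standalone sketch of the weighted average at rank (n + 1) * p;
# percentile_of is a hypothetical helper working on a plain array.
def percentile_of(xs, p = 50)
  (0...100).include?(p) or
    raise ArgumentError, "p = #{p}, but has to be in (0...100)"
  sorted = xs.sort
  r = p / 100.0 * (sorted.size + 1) # fractional 1-based rank
  i = r.to_i
  return sorted[0] if i < 1         # rank falls before the first value
  result = sorted[i - 1]
  # Interpolate towards the next value unless the rank hits the last one.
  result += (r - i) * (sorted[i] - sorted[i - 1]) if i < sorted.size
  result
end

xs = [4.0, 1.0, 3.0, 2.0]
puts percentile_of(xs, 50) # → 2.5
puts percentile_of(xs, 25) # → 1.25
puts percentile_of(xs, 90) # → 4.0
```

For four values the median has rank 2.5, so the result lies halfway between the second and third sorted value.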
#sample_standard_deviation ⇒ Object
Returns the sample standard deviation of the measurements.
# File 'lib/bullshit.rb', line 1088
def sample_standard_deviation
  @sample_standard_deviation ||= Math.sqrt(sample_variance)
end
#sample_standard_deviation_percentage ⇒ Object
Returns the sample standard deviation of the measurements in percentage of the arithmetic mean.
# File 'lib/bullshit.rb', line 1094
def sample_standard_deviation_percentage
  @sample_standard_deviation_percentage ||= 100.0 * sample_standard_deviation / arithmetic_mean
end
#sample_variance ⇒ Object
Returns the sample variance of the measurements.
# File 'lib/bullshit.rb', line 1066
def sample_variance
  @sample_variance ||= size > 1 ? sum_of_squares / (size - 1.0) : 0.0
end
#size ⇒ Object
Returns the number of measurements, on which the analysis is based.
# File 'lib/bullshit.rb', line 1056
def size
  @measurements.size
end
#standard_deviation ⇒ Object
Returns the standard deviation of the measurements.
# File 'lib/bullshit.rb', line 1077
def standard_deviation
  @sample_deviation ||= Math.sqrt(variance)
end
#standard_deviation_percentage ⇒ Object
Returns the standard deviation of the measurements in percentage of the arithmetic mean.
# File 'lib/bullshit.rb', line 1083
def standard_deviation_percentage
  @standard_deviation_percentage ||= 100.0 * standard_deviation / arithmetic_mean
end
#suggested_sample_size(other, alpha = 0.05, beta = 0.05) ⇒ Object
Computes a sample size that is more likely to reveal a difference between the mean of this instance’s measurements and that of other. alpha and beta are used as the levels for errors of the first and second kind (type I and type II).
# File 'lib/bullshit.rb', line 1235
def suggested_sample_size(other, alpha = 0.05, beta = 0.05)
  alpha, beta = alpha.abs, beta.abs
  signal = arithmetic_mean - other.arithmetic_mean
  df = size + other.size - 2
  pooled_variance_estimate = (sum_of_squares + other.sum_of_squares) / df
  td = TDistribution.new df
  (((td.inverse_probability(alpha) + td.inverse_probability(beta)) *
    Math.sqrt(pooled_variance_estimate)) / signal) ** 2
end
#sum ⇒ Object
Returns the sum of all measurements.
# File 'lib/bullshit.rb', line 1099
def sum
  @sum ||= @measurements.inject(0.0) { |s, t| s + t }
end
#sum_of_squares ⇒ Object
Returns the sum of squares (the sum of the squared deviations) of the measurements.
# File 'lib/bullshit.rb', line 1072
def sum_of_squares
  @sum_of_squares ||= @measurements.inject(0.0) { |s, t| s + (t - arithmetic_mean) ** 2 }
end
#t_student(other) ⇒ Object
Returns the t value of Student’s t-test between this Analysis instance and other.
# File 'lib/bullshit.rb', line 1224
def t_student(other)
  signal = arithmetic_mean - other.arithmetic_mean
  # The noise term combines the reciprocal sizes of both samples.
  noise = common_standard_deviation(other) *
    Math.sqrt(size ** -1 + other.size ** -1)
  signal / noise
rescue Errno::EDOM
  0.0
end
#t_welch(other) ⇒ Object
Returns the t value of Welch’s t-test between this Analysis instance and other.
# File 'lib/bullshit.rb', line 1195
def t_welch(other)
  signal = arithmetic_mean - other.arithmetic_mean
  noise = Math.sqrt(sample_variance / size +
    other.sample_variance / other.size)
  signal / noise
rescue Errno::EDOM
  0.0
end
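The signal-to-noise structure of Welch’s t value can be reproduced on two small samples. The following standalone sketch uses hypothetical helper names and plain arrays, and it omits the error handling of the listing above.

```ruby
# Standalone sketch of Welch's t value; helper names are hypothetical and
# the inputs are plain arrays with at least two elements each.
def mean(xs)
  xs.sum.to_f / xs.size
end

def sample_variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / (xs.size - 1.0)
end

def welch_t(a, b)
  signal = mean(a) - mean(b) # difference of the means
  # Standard error of that difference, without pooling the variances.
  noise = Math.sqrt(sample_variance(a) / a.size + sample_variance(b) / b.size)
  signal / noise
end

t = welch_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0])
puts t.round(4) # → -2.1213
```

The sign of t follows the order of the operands, so swapping the two samples only flips the sign.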
#variance ⇒ Object
Returns the variance of the measurements.
# File 'lib/bullshit.rb', line 1061
def variance
  @variance ||= sum_of_squares / size
end
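The difference between #variance and #sample_variance is only the divisor applied to the sum of squares. A minimal standalone sketch, with hypothetical helper names and a plain array as input, shows both side by side.

```ruby
# Standalone sketch contrasting the two variance estimates; helper names
# are hypothetical and do not use the library.
def sum_of_squares(xs)
  mean = xs.sum.to_f / xs.size
  xs.sum { |x| (x - mean)**2 }
end

def population_variance(xs)
  sum_of_squares(xs) / xs.size # divide by n
end

def sample_variance(xs)
  # Divide by n - 1; a single measurement yields 0.0 as in the listing.
  xs.size > 1 ? sum_of_squares(xs) / (xs.size - 1.0) : 0.0
end

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
puts population_variance(xs)            # → 4.0
puts Math.sqrt(population_variance(xs)) # → 2.0
puts sample_variance(xs).round(4)       # → 4.5714
```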