Class: EvalRuby::Comparison
Inherits: Object
Defined in: lib/eval_ruby/comparison.rb
Overview
Statistical comparison of two evaluation reports using paired t-tests.
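The class expects two Report objects whose per-example results each carry a scores hash keyed by metric name, and the paired t-test compares those per-example scores between the two reports. A minimal sketch of that data shape, using Struct stand-ins for illustration (the real Result and Report classes are defined elsewhere in the gem, and the :accuracy metric name is an assumption):

```ruby
# Struct stand-ins for illustration only; not the gem's real classes.
Result = Struct.new(:scores)
Report = Struct.new(:results)

report_a = Report.new([Result.new({ accuracy: 0.70 }), Result.new({ accuracy: 0.74 })])
report_b = Report.new([Result.new({ accuracy: 0.78 }), Result.new({ accuracy: 0.80 })])

# Per-metric score vectors, extracted the same way the methods below do.
scores_a = report_a.results.filter_map { |r| r.scores[:accuracy] }
scores_b = report_b.results.filter_map { |r| r.scores[:accuracy] }

# Pairwise deltas between the two reports, example by example.
deltas = scores_b.zip(scores_a).map { |b, a| (b - a).round(2) }
```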
Instance Attribute Summary
- #report_a ⇒ Report (readonly)
  Baseline report.
- #report_b ⇒ Report (readonly)
  Comparison report.
Instance Method Summary
- #initialize(report_a, report_b) ⇒ Comparison (constructor)
  A new instance of Comparison.
- #significant_improvements(alpha: 0.05) ⇒ Array<Symbol>
  Metrics where report_b is significantly better than report_a.
- #summary ⇒ String
  Formatted comparison table with deltas and p-values.
Constructor Details
#initialize(report_a, report_b) ⇒ Comparison
Returns a new instance of Comparison.
# File 'lib/eval_ruby/comparison.rb', line 19

def initialize(report_a, report_b)
  @report_a = report_a
  @report_b = report_b
end
Instance Attribute Details
#report_a ⇒ Report (readonly)
Returns baseline report.
# File 'lib/eval_ruby/comparison.rb', line 12

def report_a
  @report_a
end
#report_b ⇒ Report (readonly)
Returns comparison report.
# File 'lib/eval_ruby/comparison.rb', line 15

def report_b
  @report_b
end
Instance Method Details
#significant_improvements(alpha: 0.05) ⇒ Array<Symbol>
Returns metrics where report_b is significantly better than report_a.
# File 'lib/eval_ruby/comparison.rb', line 55

def significant_improvements(alpha: 0.05)
  all_metrics.select do |metric|
    scores_a = @report_a.results.filter_map { |r| r.scores[metric] }
    scores_b = @report_b.results.filter_map { |r| r.scores[metric] }
    next false if scores_a.empty? || scores_b.empty?

    t_result = paired_t_test(scores_a, scores_b)
    mean_b = scores_b.sum / scores_b.size.to_f
    mean_a = scores_a.sum / scores_a.size.to_f
    t_result[:p_value] < alpha && mean_b > mean_a
  end
end
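The private paired_t_test helper is not documented on this page. For orientation, the paired t statistic it presumably computes can be sketched as follows; this is the generic textbook formulation, not the gem's actual implementation, and converting the statistic and degrees of freedom to a p-value additionally requires a Student's t CDF:

```ruby
# Paired t statistic: take per-example differences, then test whether
# their mean is zero. Illustrative sketch only.
def paired_t_statistic(scores_a, scores_b)
  raise ArgumentError, "need equal-length paired samples" unless scores_a.size == scores_b.size

  diffs = scores_b.zip(scores_a).map { |b, a| b - a }
  n = diffs.size.to_f
  mean = diffs.sum / n
  variance = diffs.sum { |d| (d - mean)**2 } / (n - 1)   # sample variance of the differences
  t = mean / Math.sqrt(variance / n)                     # t = mean(d) / (sd(d) / sqrt(n))
  { t: t, df: n - 1 }
end
```

Note that a paired test requires the two reports to score the same examples in the same order; if the result sets differ, the per-metric vectors will not pair up correctly.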
#summary ⇒ String
Returns formatted comparison table with deltas and p-values.
# File 'lib/eval_ruby/comparison.rb', line 25

def summary
  lines = [
    format("%-20s | %-10s | %-10s | %-8s | %s",
           "Metric", "A", "B", "Delta", "p-value"),
    "-" * 70
  ]
  all_metrics.each do |metric|
    stats_a = @report_a.metric_stats[metric]
    stats_b = @report_b.metric_stats[metric]
    next unless stats_a && stats_b

    delta = stats_b[:mean] - stats_a[:mean]
    scores_a = @report_a.results.filter_map { |r| r.scores[metric] }
    scores_b = @report_b.results.filter_map { |r| r.scores[metric] }
    t_result = paired_t_test(scores_a, scores_b)
    sig = significance_marker(t_result[:p_value])
    lines << format(
      "%-20s | %-10.4f | %-10.4f | %+.4f | %.4f %s",
      metric, stats_a[:mean], stats_b[:mean],
      delta, t_result[:p_value], sig
    )
  end
  lines.join("\n")
end
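The significance_marker helper used above is private and not shown on this page. A plausible sketch using the conventional star thresholds (the cutoffs here are an assumption, not necessarily the gem's actual values):

```ruby
# Hypothetical significance_marker: map a p-value to the conventional
# star annotation used in statistical tables.
def significance_marker(p_value)
  if    p_value < 0.001 then "***"
  elsif p_value < 0.01  then "**"
  elsif p_value < 0.05  then "*"
  else  ""
  end
end
```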