Class: EvalRuby::Report
- Inherits:
- Object
- Defined in:
- lib/eval_ruby/report.rb
Overview
Aggregated evaluation report across multiple samples. Provides statistical summaries, filtering, and export functionality.
Instance Attribute Summary
- #duration ⇒ Float? (readonly)
  Total evaluation duration in seconds.
- #results ⇒ Array<Result> (readonly)
  Individual evaluation results.
- #samples ⇒ Array<Hash> (readonly)
  Original sample data.
Instance Method Summary
- #failures(threshold: nil) ⇒ Array<Result>
  Returns results whose overall score is below the threshold.
- #initialize(results:, samples: [], duration: nil) ⇒ Report (constructor)
  A new instance of Report.
- #metric_stats ⇒ Hash{Symbol => Hash}
  Computes per-metric statistics (mean, std, min, max, count).
- #summary ⇒ String
  Human-readable summary with mean and std for each metric.
- #to_csv(path) ⇒ void
  Exports results to CSV.
- #to_json(path) ⇒ void
  Exports results to JSON.
- #worst(n = 5) ⇒ Array<Result>
  Returns the n worst-scoring results.
Constructor Details
#initialize(results:, samples: [], duration: nil) ⇒ Report
Returns a new instance of Report.
# File 'lib/eval_ruby/report.rb', line 27
def initialize(results:, samples: [], duration: nil)
  @results = results
  @samples = samples
  @duration = duration
end
Instance Attribute Details
#duration ⇒ Float? (readonly)
Returns total evaluation duration in seconds.
# File 'lib/eval_ruby/report.rb', line 19
def duration
  @duration
end
#results ⇒ Array<Result> (readonly)
Returns individual evaluation results.
# File 'lib/eval_ruby/report.rb', line 16
def results
  @results
end
#samples ⇒ Array<Hash> (readonly)
Returns original sample data.
# File 'lib/eval_ruby/report.rb', line 22
def samples
  @samples
end
Instance Method Details
#failures(threshold: nil) ⇒ Array<Result>
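Returns results whose overall score is below the threshold. When threshold is nil, EvalRuby.configuration.default_threshold is used. A result with no overall score is treated as 0.0.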
# File 'lib/eval_ruby/report.rb', line 75
def failures(threshold: nil)
  # Fall back to the globally configured threshold when none is given.
  threshold ||= EvalRuby.configuration.default_threshold
  # A nil overall score counts as 0.0.
  @results.select { |r| (r.overall || 0.0) < threshold }
end
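The filtering rule can be tried in isolation. The sketch below is not EvalRuby code: it uses a hypothetical one-field struct in place of Result, and the threshold of 0.7 is an assumed value, since the actual default_threshold is not shown on this page.

```ruby
# Hypothetical stand-in for EvalRuby::Result; only #overall matters here.
stub_result = Struct.new(:overall)

results = [stub_result.new(0.9), stub_result.new(0.4), stub_result.new(nil)]
threshold = 0.7 # assumed value; the real default comes from EvalRuby.configuration

# Same predicate as #failures: a nil overall score is treated as 0.0.
failing = results.select { |r| (r.overall || 0.0) < threshold }

p failing.map(&:overall)
```

The result with no overall score is reported as a failure because it is compared as 0.0.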
#metric_stats ⇒ Hash{Symbol => Hash}
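Computes per-metric statistics (mean, std, min, max, count). The std is the sample standard deviation (n - 1 denominator) and is 0.0 for a metric with a single value. Metrics with no recorded values are omitted, and an empty hash is returned when there are no results.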
# File 'lib/eval_ruby/report.rb', line 47
def metric_stats
  return {} if @results.empty?

  all_metrics = @results.flat_map { |r| r.scores.keys }.uniq
  all_metrics.each_with_object({}) do |metric, hash|
    # Skip results that have no score for this metric.
    values = @results.filter_map { |r| r.scores[metric] }
    next if values.empty?

    mean = values.sum / values.size.to_f
    # Sample variance (n - 1); a single value uses 1.0 so std is 0.0.
    denominator = values.size > 1 ? (values.size - 1).to_f : 1.0
    variance = values.sum { |v| (v - mean)**2 } / denominator
    std = Math.sqrt(variance)
    hash[metric] = {mean: mean, std: std, min: values.min, max: values.max, count: values.size}
  end
end
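A small worked example may make the statistics concrete. The sketch below repeats the same arithmetic for a single metric outside the class. The helper name sample_stats and the score values are illustrative assumptions, not part of EvalRuby.

```ruby
# Hypothetical stand-alone helper mirroring the per-metric computation above.
def sample_stats(values)
  mean = values.sum / values.size.to_f
  # Sample (n - 1) denominator; falls back to 1.0 for a single value.
  denominator = values.size > 1 ? (values.size - 1).to_f : 1.0
  variance = values.sum { |v| (v - mean)**2 } / denominator
  {mean: mean, std: Math.sqrt(variance), min: values.min, max: values.max, count: values.size}
end

stats = sample_stats([0.5, 0.75, 1.0])
single = sample_stats([0.9])

puts stats[:mean]  # 0.75
puts stats[:std]   # 0.25
puts single[:std]  # 0.0
```

For the three values, the squared deviations sum to 0.125, which divided by n - 1 = 2 gives a variance of 0.0625 and a std of 0.25. Dividing by n instead would give a smaller value, so the reported std should be read as a sample estimate.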
#summary ⇒ String
Returns a human-readable summary: one line per metric with its mean and std, followed by a blank line and a line with the total sample count and duration.
# File 'lib/eval_ruby/report.rb', line 34
def summary
  lines = []
  metric_stats.each do |metric, stats|
    lines << format("%-20s %.4f (+/- %.4f)", "#{metric}:", stats[:mean], stats[:std])
  end
  lines << ""
  lines << "Total: #{@results.size} samples | Duration: #{format_duration}"
  lines.join("\n")
end
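To see what the per-metric lines look like, the format string can be applied to a hand-written stats hash. This is only a sketch: the metric names and numbers below are made up for illustration, and the final "Total" line is left out because format_duration is not shown on this page.

```ruby
# Hypothetical statistics in the shape returned by #metric_stats.
stats = {
  faithfulness: {mean: 0.8512, std: 0.0431},
  relevance: {mean: 0.9, std: 0.0}
}

# Same format string as #summary: the label is left-aligned in a
# 20-character column, and both numbers are printed with four decimals.
lines = stats.map do |metric, s|
  format("%-20s %.4f (+/- %.4f)", "#{metric}:", s[:mean], s[:std])
end

puts lines
```

Because of the fixed 20-character label column, the numbers line up as long as metric names stay under 20 characters including the colon.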
#to_csv(path) ⇒ void
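This method returns an undefined value.
Exports results to CSV, one row per result, with a column for each metric plus sample_index and overall. Scores are rounded to four decimal places. No file is written when there are no results.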
# File 'lib/eval_ruby/report.rb', line 84
def to_csv(path)
  return if @results.empty?

  all_metrics = @results.flat_map { |r| r.scores.keys }.uniq
  CSV.open(path, "w") do |csv|
    # Header row: sample index, one column per metric, then the overall score.
    csv << ["sample_index"] + all_metrics.map(&:to_s) + ["overall"]
    @results.each_with_index do |result, i|
      # Missing scores stay nil and are written as empty fields.
      row = [i] + all_metrics.map { |m| result.scores[m]&.round(4) } + [result.overall&.round(4)]
      csv << row
    end
  end
end
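The resulting layout can be previewed without touching the file system. The sketch below builds the same rows with CSV.generate, which returns a string, instead of CSV.open. The two-field struct and the metric names are assumptions standing in for real Result objects.

```ruby
require "csv"

# Hypothetical stand-in for EvalRuby::Result with #scores and #overall.
stub_result = Struct.new(:scores, :overall)
results = [
  stub_result.new({faithfulness: 0.91234, relevance: 0.8}, 0.85617),
  stub_result.new({faithfulness: 0.5}, nil)
]

all_metrics = results.flat_map { |r| r.scores.keys }.uniq

csv_text = CSV.generate do |csv|
  csv << ["sample_index"] + all_metrics.map(&:to_s) + ["overall"]
  results.each_with_index do |result, i|
    # nil scores become empty fields in the output.
    csv << [i] + all_metrics.map { |m| result.scores[m]&.round(4) } + [result.overall&.round(4)]
  end
end

puts csv_text
```

The second row ends in two empty fields, since that result has neither a relevance score nor an overall score.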
#to_json(path) ⇒ void
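This method returns an undefined value.
Exports results to JSON. The file contains a results array (index, scores, overall, and the matching sample for each result) and a summary object holding the output of #metric_stats.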
# File 'lib/eval_ruby/report.rb', line 101
def to_json(path)
  data = @results.each_with_index.map do |result, i|
    {index: i, scores: result.scores, overall: result.overall, sample: @samples[i]}
  end
  File.write(path, JSON.pretty_generate({results: data, summary: metric_stats}))
end
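The shape of the exported document is easiest to see by generating it in memory. The following sketch builds a hash with the same top-level keys and serializes it with JSON.pretty_generate. The sample content, metric name, and numbers are invented for illustration.

```ruby
require "json"

# Hypothetical data in the same shape that #to_json assembles.
data = [
  {index: 0, scores: {faithfulness: 0.9}, overall: 0.9, sample: {question: "What is Ruby?"}}
]
summary = {faithfulness: {mean: 0.9, std: 0.0, min: 0.9, max: 0.9, count: 1}}

json = JSON.pretty_generate({results: data, summary: summary})
puts json

# Reading the document back: symbol keys come back as strings.
parsed = JSON.parse(json)
puts parsed["results"].first["sample"]["question"]
```

Anything that reads the file back should expect string keys such as "results" and "summary" unless it parses with symbolize_names: true.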
#worst(n = 5) ⇒ Array<Result>
Returns the n worst-scoring results, ordered from the lowest overall score upward. A result with no overall score is treated as 0.0.
# File 'lib/eval_ruby/report.rb', line 67
def worst(n = 5)
  @results.sort_by { |r| r.overall || 0.0 }.first(n)
end
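The ordering, including the handling of missing scores, can be checked with a few stand-in objects. The struct and the ids below are hypothetical and exist only to make the order visible.

```ruby
# Hypothetical stand-in for EvalRuby::Result, with an id for readability.
stub_result = Struct.new(:id, :overall)
results = [
  stub_result.new(:a, 0.8),
  stub_result.new(:b, nil),
  stub_result.new(:c, 0.3),
  stub_result.new(:d, 0.6)
]

# Same expression as #worst: ascending by overall, nil counted as 0.0.
worst_two = results.sort_by { |r| r.overall || 0.0 }.first(2)

p worst_two.map(&:id) # [:b, :c]
```

The result without an overall score sorts first. Asking for more results than exist returns all of them, since Array#first(n) does not raise when n exceeds the array size.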