Class: EvalRuby::Report

Inherits:
Object
Defined in:
lib/eval_ruby/report.rb

Overview

Aggregated evaluation report across multiple samples. Provides statistical summaries, filtering, and export functionality.

Examples:

report = EvalRuby.evaluate_batch(dataset)
puts report.summary
report.to_csv("results.csv")

Constructor Details

#initialize(results:, samples: [], duration: nil) ⇒ Report

Returns a new instance of Report.

Parameters:

  • results (Array<Result>)
  • samples (Array<Hash>) (defaults to: [])
  • duration (Float, nil) (defaults to: nil)


# File 'lib/eval_ruby/report.rb', line 27

def initialize(results:, samples: [], duration: nil)
  @results = results
  @samples = samples
  @duration = duration
end

Instance Attribute Details

#duration ⇒ Float? (readonly)

Returns total evaluation duration in seconds.

Returns:

  • (Float, nil)

    total evaluation duration in seconds



# File 'lib/eval_ruby/report.rb', line 19

def duration
  @duration
end

#results ⇒ Array<Result> (readonly)

Returns individual evaluation results.

Returns:

  • (Array<Result>)

    individual evaluation results



# File 'lib/eval_ruby/report.rb', line 16

def results
  @results
end

#samples ⇒ Array<Hash> (readonly)

Returns original sample data.

Returns:

  • (Array<Hash>)

    original sample data



# File 'lib/eval_ruby/report.rb', line 22

def samples
  @samples
end

Instance Method Details

#failures(threshold: nil) ⇒ Array<Result>

Returns results whose overall score is strictly below the threshold. A result with no overall score is treated as 0.0.

Parameters:

  • threshold (Float, nil) (defaults to: nil)

    score threshold (defaults to EvalRuby.configuration.default_threshold)

Returns:

  • (Array<Result>)

    results whose overall score is below the threshold


# File 'lib/eval_ruby/report.rb', line 75

def failures(threshold: nil)
  threshold ||= EvalRuby.configuration.default_threshold
  @results.select { |r| (r.overall || 0.0) < threshold }
end
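
The filter above can be tried in isolation. The sketch below applies the same predicate to stand-in objects; `StubResult` and the threshold of 0.7 are assumptions made for this example only (the real default comes from EvalRuby.configuration.default_threshold).

```ruby
# Hypothetical stand-in for EvalRuby's Result; only #overall is needed here.
StubResult = Struct.new(:overall)

results = [StubResult.new(0.9), StubResult.new(0.7), StubResult.new(0.4), StubResult.new(nil)]
threshold = 0.7 # assumed value for illustration

# Same predicate as Report#failures: a missing overall score counts as 0.0.
failing = results.select { |r| (r.overall || 0.0) < threshold }
failing_scores = failing.map(&:overall) # => [0.4, nil]
```

The comparison is strict, so a result scoring exactly at the threshold is not reported as a failure.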

#metric_stats ⇒ Hash{Symbol => Hash}

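Computes per-metric statistics (mean, sample standard deviation, min, max, count). Results that have no score for a metric are skipped, so count can be smaller than the number of results. Returns an empty hash when there are no results.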

Returns:

  • (Hash{Symbol => Hash})

    metric name to stats hash



# File 'lib/eval_ruby/report.rb', line 47

def metric_stats
  return {} if @results.empty?

  all_metrics = @results.flat_map { |r| r.scores.keys }.uniq
  all_metrics.each_with_object({}) do |metric, hash|
    values = @results.filter_map { |r| r.scores[metric] }
    next if values.empty?

    mean = values.sum / values.size.to_f
    denominator = values.size > 1 ? (values.size - 1).to_f : 1.0
    variance = values.sum { |v| (v - mean)**2 } / denominator
    std = Math.sqrt(variance)
    hash[metric] = {mean: mean, std: std, min: values.min, max: values.max, count: values.size}
  end
end
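
The standard deviation here is the sample form (dividing by n - 1), with a fallback denominator of 1.0 for a single value. A minimal sketch of just that arithmetic, pulled into a hypothetical helper named `sample_std` that does not exist in EvalRuby:

```ruby
# Hypothetical helper mirroring the arithmetic in Report#metric_stats.
def sample_std(values)
  mean = values.sum / values.size.to_f
  # Bessel's correction (n - 1); a single value falls back to 1.0.
  denominator = values.size > 1 ? (values.size - 1).to_f : 1.0
  Math.sqrt(values.sum { |v| (v - mean)**2 } / denominator)
end

std_three = sample_std([1.0, 2.0, 3.0]) # mean 2.0, squared deviations sum to 2.0, 2.0 / 2 = 1.0
std_single = sample_std([0.8])          # only one value, so the deviation sum is 0.0
```

With one value the result is 0.0 rather than a division by zero, which is what the fallback denominator is for.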

#summary ⇒ String

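Returns a human-readable summary: one line per metric with its mean and standard deviation, followed by a blank line and a footer with the sample count and duration.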

Returns:

  • (String)

    human-readable summary with mean and std for each metric



# File 'lib/eval_ruby/report.rb', line 34

def summary
  lines = []
  metric_stats.each do |metric, stats|
    lines << format("%-20s %.4f (+/- %.4f)", "#{metric}:", stats[:mean], stats[:std])
  end
  lines << ""
  lines << "Total: #{@results.size} samples | Duration: #{format_duration}"
  lines.join("\n")
end
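
To see what the per-metric lines look like, the format string above can be run against a hand-written stats hash. The metric names and numbers below are made up for illustration, and the footer is left out because format_duration is not shown on this page.

```ruby
# Hypothetical stats in the shape returned by #metric_stats.
stats = {
  faithfulness: {mean: 0.8123, std: 0.0512},
  relevance: {mean: 0.7, std: 0.1}
}

# Same format string as Report#summary: label left-aligned in 20 columns.
lines = stats.map do |metric, s|
  format("%-20s %.4f (+/- %.4f)", "#{metric}:", s[:mean], s[:std])
end
```

The `%-20s` directive pads short labels but does not truncate long ones, so a metric name longer than 19 characters pushes its numbers out of alignment.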

#to_csv(path) ⇒ void

This method returns an undefined value.

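Exports results to CSV, one row per result, with a column for each metric plus the overall score. Scores are rounded to 4 decimal places. No file is written when there are no results.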

Parameters:

  • path (String)

    output file path



# File 'lib/eval_ruby/report.rb', line 84

def to_csv(path)
  return if @results.empty?

  all_metrics = @results.flat_map { |r| r.scores.keys }.uniq
  CSV.open(path, "w") do |csv|
    csv << ["sample_index"] + all_metrics.map(&:to_s) + ["overall"]
    @results.each_with_index do |result, i|
      row = [i] + all_metrics.map { |m| result.scores[m]&.round(4) } + [result.overall&.round(4)]
      csv << row
    end
  end
end
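
The resulting layout is easiest to see on a small input. The sketch below builds the same rows in memory with CSV.generate instead of writing a file; `CsvStub` and the metric names are assumptions for this example and are not part of EvalRuby.

```ruby
require "csv"

# Hypothetical stand-in for EvalRuby's Result (scores hash plus overall).
CsvStub = Struct.new(:scores, :overall)

results = [
  CsvStub.new({faithfulness: 0.91237, relevance: 0.8}, 0.85619),
  CsvStub.new({faithfulness: 0.5}, nil)
]

# Header is the union of metric names across all results.
all_metrics = results.flat_map { |r| r.scores.keys }.uniq
csv_text = CSV.generate do |csv|
  csv << ["sample_index"] + all_metrics.map(&:to_s) + ["overall"]
  results.each_with_index do |result, i|
    # Missing scores stay nil and are emitted as empty cells.
    csv << [i] + all_metrics.map { |m| result.scores[m]&.round(4) } + [result.overall&.round(4)]
  end
end
```

A result that lacks a metric, or has no overall score, produces an empty cell rather than 0, so downstream tools can tell "missing" from "scored zero".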

#to_json(path) ⇒ void

This method returns an undefined value.

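Exports results to a pretty-printed JSON file. The top-level object has a results array (index, scores, overall, and the matching sample for each result) and a summary object holding the output of #metric_stats.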

Parameters:

  • path (String)

    output file path



# File 'lib/eval_ruby/report.rb', line 101

def to_json(path)
  data = @results.each_with_index.map do |result, i|
    {index: i, scores: result.scores, overall: result.overall, sample: @samples[i]}
  end
  File.write(path, JSON.pretty_generate({results: data, summary: metric_stats}))
end
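
A reader of the exported file sees string keys, not symbols. The sketch below round-trips a hand-built payload of the same shape through the standard json library; the sample content and metric name are invented for illustration.

```ruby
require "json"

# Hypothetical payload in the shape built by Report#to_json.
data = [{index: 0, scores: {faithfulness: 0.9}, overall: 0.9, sample: {question: "q1"}}]
summary = {faithfulness: {mean: 0.9, std: 0.0, min: 0.9, max: 0.9, count: 1}}

json_text = JSON.pretty_generate({results: data, summary: summary})

# Parsing back yields string keys at every level.
parsed = JSON.parse(json_text)
first_sample = parsed["results"][0]["sample"] # => {"question" => "q1"}
```

If samples is shorter than results (or left at its default of []), the corresponding sample entries are written as null.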

#worst(n = 5) ⇒ Array<Result>

Returns the n lowest-scoring results, ordered from worst to best. A result with no overall score is ranked as 0.0.

Parameters:

  • n (Integer) (defaults to: 5)

    number of results to return

Returns:

  • (Array<Result>)

    the n lowest-scoring results, worst first


# File 'lib/eval_ruby/report.rb', line 67

def worst(n = 5)
  @results.sort_by { |r| r.overall || 0.0 }.first(n)
end
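
The ordering can be checked on a few stand-in objects. `RankStub` and its `id` field are assumptions added only so the results can be told apart in this example.

```ruby
# Hypothetical stand-in for EvalRuby's Result, with an id for readability.
RankStub = Struct.new(:id, :overall)

results = [
  RankStub.new(:a, 0.9),
  RankStub.new(:b, nil),
  RankStub.new(:c, 0.2),
  RankStub.new(:d, 0.6)
]

# Same expression as Report#worst: nil sorts as 0.0, lowest scores first.
worst_two = results.sort_by { |r| r.overall || 0.0 }.first(2).map(&:id) # => [:b, :c]
```

Asking for more results than exist simply returns all of them, since Array#first(n) never raises for a large n.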