Class: EvalRuby::Report

Inherits: Object

Defined in: lib/eval_ruby/report.rb

Overview

Aggregated evaluation report across multiple samples. Provides statistical summaries, filtering, and export functionality.

Examples:

report = EvalRuby.evaluate_batch(dataset)
puts report.summary
report.to_csv("results.csv")

Instance Attribute Summary

#duration ⇒ Float? readonly

Total evaluation duration in seconds.
#results ⇒ Array<Result> readonly

Individual evaluation results.
#samples ⇒ Array<Hash> readonly

Original sample data.

Instance Method Summary

#failures(threshold: nil) ⇒ Array<Result>

Returns results whose overall score is below the threshold.
#initialize(results:, samples: [], duration: nil) ⇒ Report constructor

A new instance of Report.
#metric_stats ⇒ Hash{Symbol => Hash}

Computes per-metric statistics (mean, std, min, max, count).
#summary ⇒ String

Human-readable summary with mean and std for each metric.
#to_csv(path) ⇒ void

Exports results to CSV.
#to_json(path) ⇒ void

Exports results to JSON.
#worst(n = 5) ⇒ Array<Result>

Returns the n worst-scoring results.

Constructor Details

#initialize(results:, samples: [], duration: nil) ⇒ `Report`

Returns a new instance of Report.

Parameters:

results (Array<Result>)
samples (Array<Hash>) (defaults to: [])
duration (Float, nil) (defaults to: nil)

# File 'lib/eval_ruby/report.rb', line 27

def initialize(results:, samples: [], duration: nil)
  @results = results
  @samples = samples
  @duration = duration
end

Instance Attribute Details

#duration ⇒ `Float?` (readonly)

Returns total evaluation duration in seconds.

Returns:

(Float, nil) —

total evaluation duration in seconds

# File 'lib/eval_ruby/report.rb', line 19

def duration
  @duration
end

#results ⇒ `Array<Result>` (readonly)

Returns individual evaluation results.

Returns:

(Array<Result>) —

individual evaluation results

# File 'lib/eval_ruby/report.rb', line 16

def results
  @results
end

#samples ⇒ `Array<Hash>` (readonly)

Returns original sample data.

Returns:

(Array<Hash>) —

original sample data

# File 'lib/eval_ruby/report.rb', line 22

def samples
  @samples
end

Instance Method Details

#failures(threshold: nil) ⇒ `Array<Result>`

Returns results whose overall score is below the threshold. A result with a nil overall score is treated as 0.0.

Parameters:

threshold (Float, nil) (defaults to: nil) —

score threshold; when nil, falls back to EvalRuby.configuration.default_threshold

Returns:

(Array<Result>)

# File 'lib/eval_ruby/report.rb', line 75

def failures(threshold: nil)
  threshold ||= EvalRuby.configuration.default_threshold
  @results.select { |r| (r.overall || 0.0) < threshold }
end
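
The filter above can be tried in isolation. The following is a minimal sketch, assuming a hypothetical `StubResult` struct in place of the real `Result` class, a made-up `relevance` metric, and a hard-coded threshold of 0.7 rather than the configured default. It illustrates that a result with no overall score is counted as a failure.

```ruby
# StubResult is a hypothetical stand-in for EvalRuby's Result class.
StubResult = Struct.new(:scores, :overall, keyword_init: true)

results = [
  StubResult.new(scores: {relevance: 0.9}, overall: 0.9),
  StubResult.new(scores: {relevance: 0.4}, overall: 0.4),
  StubResult.new(scores: {}, overall: nil) # no overall score recorded
]

threshold = 0.7 # assumed value; the real default comes from the configuration

# Same predicate as Report#failures: a nil overall score is treated as 0.0.
failures = results.select { |r| (r.overall || 0.0) < threshold }

failures.each { |r| puts r.overall.inspect }
```

Because nil is coerced to 0.0, any positive threshold flags unscored results.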

#metric_stats ⇒ `Hash{Symbol => Hash}`

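Computes per-metric statistics (mean, sample standard deviation, min, max, count). Nil scores are ignored, and an empty hash is returned when there are no results.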

Returns:

(Hash{Symbol => Hash}) —

metric name to stats hash

# File 'lib/eval_ruby/report.rb', line 47

def metric_stats
  return {} if @results.empty?

  all_metrics = @results.flat_map { |r| r.scores.keys }.uniq
  all_metrics.each_with_object({}) do |metric, hash|
    values = @results.filter_map { |r| r.scores[metric] }
    next if values.empty?

    mean = values.sum / values.size.to_f
    denominator = values.size > 1 ? (values.size - 1).to_f : 1.0
    variance = values.sum { |v| (v - mean)**2 } / denominator
    std = Math.sqrt(variance)
    hash[metric] = {mean: mean, std: std, min: values.min, max: values.max, count: values.size}
  end
end
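
The standard deviation computed above divides by n - 1 (Bessel's correction) when there are two or more values, and by 1.0 for a single value. The sketch below reproduces that arithmetic on a plain array; `stats_for` is a hypothetical helper name used only for illustration, and the input values are made up.

```ruby
# stats_for is a hypothetical helper, not part of EvalRuby.
def stats_for(values)
  mean = values.sum / values.size.to_f
  # n - 1 for two or more values; a single value uses 1.0,
  # which makes its variance (and std) exactly 0.0.
  denominator = values.size > 1 ? (values.size - 1).to_f : 1.0
  variance = values.sum { |v| (v - mean)**2 } / denominator
  {mean: mean, std: Math.sqrt(variance), min: values.min, max: values.max, count: values.size}
end

many = stats_for([0.8, 0.6, 1.0]) # mean about 0.8, std about 0.2
single = stats_for([0.5])         # std is 0.0 for a single value

puts many
puts single
```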

#summary ⇒ `String`

Returns a human-readable summary with the mean and std for each metric, followed by the total sample count and duration.

Returns:

(String) —

human-readable summary with the mean and std for each metric, followed by the total sample count and duration

# File 'lib/eval_ruby/report.rb', line 34

def summary
  lines = []
  metric_stats.each do |metric, stats|
    lines << format("%-20s %.4f (+/- %.4f)", "#{metric}:", stats[:mean], stats[:std])
  end
  lines << ""
  lines << "Total: #{@results.size} samples | Duration: #{format_duration}"
  lines.join("\n")
end
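
To see what one metric line of the summary looks like, the format string can be applied on its own. This is a small sketch with made-up numbers and a made-up metric name; it only exercises `Kernel#format` and does not call EvalRuby.

```ruby
stats = {mean: 0.8123, std: 0.0456} # made-up values for illustration

# %-20s left-justifies the label in a 20-character column;
# %.4f prints each float with four decimal places.
line = format("%-20s %.4f (+/- %.4f)", "relevance:", stats[:mean], stats[:std])

puts line
```

The fixed-width label column keeps the numbers aligned across metrics whose names are shorter than 20 characters.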

#to_csv(path) ⇒ `void`

This method returns an undefined value.

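Exports results to CSV: one row per result, with a column for each metric plus the overall score, each rounded to four decimal places. No file is written when there are no results.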

Parameters:

path (String) —

output file path

# File 'lib/eval_ruby/report.rb', line 84

def to_csv(path)
  return if @results.empty?

  all_metrics = @results.flat_map { |r| r.scores.keys }.uniq
  CSV.open(path, "w") do |csv|
    csv << ["sample_index"] + all_metrics.map(&:to_s) + ["overall"]
    @results.each_with_index do |result, i|
      row = [i] + all_metrics.map { |m| result.scores[m]&.round(4) } + [result.overall&.round(4)]
      csv << row
    end
  end
end
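
The row layout can be previewed without touching the filesystem. The sketch below assumes a hypothetical `StubResult` struct and made-up metric names, and uses `CSV.generate` to build a string instead of `CSV.open`; it shows that a metric missing from a result, or a nil overall score, becomes an empty cell.

```ruby
require "csv"

# StubResult is a hypothetical stand-in for EvalRuby's Result class.
StubResult = Struct.new(:scores, :overall, keyword_init: true)

results = [
  StubResult.new(scores: {relevance: 0.91234, fluency: 0.5}, overall: 0.70617),
  StubResult.new(scores: {relevance: 0.3}, overall: nil) # no fluency, no overall
]

# The header is the union of metric names across all results.
all_metrics = results.flat_map { |r| r.scores.keys }.uniq

csv_text = CSV.generate do |csv|
  csv << ["sample_index"] + all_metrics.map(&:to_s) + ["overall"]
  results.each_with_index do |result, i|
    # &.round leaves nil untouched, and CSV writes nil as an empty field.
    csv << [i] + all_metrics.map { |m| result.scores[m]&.round(4) } + [result.overall&.round(4)]
  end
end

puts csv_text
```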

#to_json(path) ⇒ `void`

This method returns an undefined value.

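Exports results to pretty-printed JSON, with one entry per result under results and the output of #metric_stats under summary.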

Parameters:

path (String) —

output file path

# File 'lib/eval_ruby/report.rb', line 101

def to_json(path)
  data = @results.each_with_index.map do |result, i|
    {index: i, scores: result.scores, overall: result.overall, sample: @samples[i]}
  end
  File.write(path, JSON.pretty_generate({results: data, summary: metric_stats}))
end
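
The shape of the exported document can be inspected in memory. The following sketch assumes a hypothetical `StubResult` struct, a made-up sample hash, and a hand-written summary in place of `metric_stats`; it builds the same structure and serializes it to a string rather than a file.

```ruby
require "json"

# StubResult is a hypothetical stand-in for EvalRuby's Result class.
StubResult = Struct.new(:scores, :overall, keyword_init: true)

results = [StubResult.new(scores: {relevance: 0.9}, overall: 0.9)]
samples = [{question: "What is Ruby?"}] # assumed sample shape
summary = {relevance: {mean: 0.9, std: 0.0, min: 0.9, max: 0.9, count: 1}}

# Each entry pairs a result with the sample at the same index.
data = results.each_with_index.map do |result, i|
  {index: i, scores: result.scores, overall: result.overall, sample: samples[i]}
end

json_text = JSON.pretty_generate({results: data, summary: summary})
puts json_text
```

Symbol keys are written as strings, and when `samples` is shorter than `results` the missing entries serialize as `null`.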

#worst(n = 5) ⇒ `Array<Result>`

Returns the n worst-scoring results, ordered from the lowest overall score upward. A nil overall score is treated as 0.0.

Parameters:

n (Integer) (defaults to: 5) —

number of results to return

Returns:

(Array<Result>)

# File 'lib/eval_ruby/report.rb', line 67

def worst(n = 5)
  @results.sort_by { |r| r.overall || 0.0 }.first(n)
end
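
The ordering can be checked with a few stub objects. This sketch assumes a hypothetical `StubResult` struct and made-up scores; it applies the same `sort_by` expression and shows that an unscored result sorts ahead of every positive score.

```ruby
# StubResult is a hypothetical stand-in for EvalRuby's Result class.
StubResult = Struct.new(:scores, :overall, keyword_init: true)

results = [0.9, nil, 0.4, 0.7].map { |score| StubResult.new(scores: {}, overall: score) }

# Ascending sort on overall score, with nil coerced to 0.0.
worst_two = results.sort_by { |r| r.overall || 0.0 }.first(2)

puts worst_two.map(&:overall).inspect
```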
