Class: EvalRuby::Evaluator

Inherits:
Object
Defined in:
lib/eval_ruby/evaluator.rb

Overview

Runs all configured metrics on a given question/answer/context tuple.

Examples:

evaluator = EvalRuby::Evaluator.new
result = evaluator.evaluate(question: "...", answer: "...", context: [...])

Instance Method Summary

Constructor Details

#initialize(config = EvalRuby.configuration) ⇒ Evaluator

Returns a new instance of Evaluator.

Parameters:

  • config (Configuration) (defaults to: EvalRuby.configuration)

    configuration to use



# File 'lib/eval_ruby/evaluator.rb', line 11

def initialize(config = EvalRuby.configuration)
  @config = config
  @judge = build_judge(config)
end
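
A minimal usage sketch, not taken from the source. The no-argument form reuses the library-wide EvalRuby.configuration; the second form assumes Configuration can be instantiated directly, which this page does not confirm.

# Default: reuse the global configuration
evaluator = EvalRuby::Evaluator.new

# Hypothetical: pass a dedicated Configuration instance
# (assumes a public Configuration.new, not shown on this page)
custom_config = EvalRuby::Configuration.new
evaluator = EvalRuby::Evaluator.new(custom_config)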

Instance Method Details

#evaluate(question:, answer:, context: [], ground_truth: nil) ⇒ Result

Evaluates an LLM-generated answer against the configured quality metrics: faithfulness, relevance, and context precision always run; correctness and context recall run only when a ground_truth is supplied.

Parameters:

  • question (String)

    the input question

  • answer (String)

    the LLM-generated answer

  • context (Array<String>) (defaults to: [])

    retrieved context chunks

  • ground_truth (String, nil) (defaults to: nil)

    expected correct answer

Returns:

  • (Result)

    the aggregated metric scores with per-metric details

# File 'lib/eval_ruby/evaluator.rb', line 23

def evaluate(question:, answer:, context: [], ground_truth: nil)
  scores = {}
  details = {}

  # LLM-as-judge metrics
  faith = Metrics::Faithfulness.new(judge: @judge).call(answer: answer, context: context)
  scores[:faithfulness] = faith[:score]
  details[:faithfulness] = faith[:details]

  rel = Metrics::Relevance.new(judge: @judge).call(question: question, answer: answer)
  scores[:relevance] = rel[:score]
  details[:relevance] = rel[:details]

  cp = Metrics::ContextPrecision.new(judge: @judge).call(question: question, context: context)
  scores[:context_precision] = cp[:score]
  details[:context_precision] = cp[:details]

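  # Reference-based metrics: computed only when a ground truth is supplied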
  if ground_truth
    corr = Metrics::Correctness.new(judge: @judge).call(answer: answer, ground_truth: ground_truth)
    scores[:correctness] = corr[:score]
    details[:correctness] = corr[:details]

    cr = Metrics::ContextRecall.new(judge: @judge).call(context: context, ground_truth: ground_truth)
    scores[:context_recall] = cr[:score]
    details[:context_recall] = cr[:details]
  end

  Result.new(scores: scores, details: details)
end
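
Usage example, not taken from the source: a full evaluation call. The reader methods on Result are an assumption; this page only shows Result.new(scores:, details:).

evaluator = EvalRuby::Evaluator.new

result = evaluator.evaluate(
  question:     "What is the capital of France?",
  answer:       "Paris is the capital of France.",
  context:      ["Paris has been the capital of France since 987."],
  ground_truth: "Paris"   # optional; enables correctness and context_recall
)

result.scores[:faithfulness]   # assumed #scores reader; e.g. 0.9
result.details[:relevance]     # assumed #details reader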

#evaluate_retrieval(question:, retrieved:, relevant:) ⇒ RetrievalResult

Evaluates retrieval quality with standard information-retrieval metrics, comparing the retrieved document IDs against the ground-truth relevant IDs.

Parameters:

  • question (String)

    the input question

  • retrieved (Array<String>)

    retrieved document IDs

  • relevant (Array<String>)

    ground-truth relevant document IDs

Returns:

  • (RetrievalResult)

    retrieval quality metrics computed from the retrieved and relevant ID lists

# File 'lib/eval_ruby/evaluator.rb', line 59

def evaluate_retrieval(question:, retrieved:, relevant:)
  RetrievalResult.new(retrieved: retrieved, relevant: relevant)
end
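
A short sketch of evaluating a retrieval run by document ID. RetrievalResult's readers, such as #precision and #recall below, are assumptions; this page only shows that it receives the retrieved and relevant ID lists.

result = evaluator.evaluate_retrieval(
  question:  "What is the capital of France?",
  retrieved: ["doc_12", "doc_7", "doc_3"],
  relevant:  ["doc_7"]
)

# Hypothetical readers, not documented on this page:
result.precision  # fraction of retrieved IDs that are relevant
result.recall     # fraction of relevant IDs that were retrieved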