Class: EvalRuby::Metrics::Relevance

Inherits:
  Base
    • Object
Defined in:
lib/eval_ruby/metrics/relevance.rb

Overview

Measures whether an answer is relevant to the question. Uses an LLM judge to evaluate relevance on a 0.0-1.0 scale.

Examples:

metric = Relevance.new(judge: judge)
result = metric.call(question: "What is Ruby?", answer: "Ruby is a language.")
result[:score] # => 0.95

Constant Summary

PROMPT_TEMPLATE =
<<~PROMPT
  Given the following question and answer, evaluate whether the answer
  is relevant to and addresses the question.

  Question:
  %{question}

  Answer:
  %{answer}

  Evaluate relevance on a scale from 0.0 to 1.0 where:
  - 1.0 = the answer fully and directly addresses the question
  - 0.5 = the answer partially addresses the question
  - 0.0 = the answer is completely irrelevant to the question

  Respond in JSON: {"reasoning": "...", "score": 0.0}
PROMPT
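The `%{question}` and `%{answer}` placeholders in the template are filled with Ruby's `Kernel#format` using named references. A minimal standalone sketch (with an abbreviated, stand-in template, not the full `PROMPT_TEMPLATE` above):

```ruby
# Sketch of named-reference interpolation, as used with PROMPT_TEMPLATE.
# Kernel#format substitutes %{name} placeholders from keyword arguments.
template = <<~PROMPT
  Question:
  %{question}

  Answer:
  %{answer}
PROMPT

prompt = format(template, question: "What is Ruby?", answer: "Ruby is a language.")
puts prompt
```

Note that `format` raises a `KeyError` if a named placeholder is left unsupplied, so a malformed call fails loudly rather than sending a partially rendered prompt to the judge.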

Instance Attribute Summary

Attributes inherited from Base

#judge

Instance Method Summary

Methods inherited from Base

#initialize

Constructor Details

This class inherits a constructor from EvalRuby::Metrics::Base

Instance Method Details

#call(question:, answer:, **_kwargs) ⇒ Hash

Returns a Hash with :score (a Float between 0.0 and 1.0) and :details (a Hash containing a :reasoning String).

Parameters:

  • question (String)

    the input question

  • answer (String)

    the LLM-generated answer

Returns:

  • (Hash)

    a Hash with :score (Float, 0.0-1.0) and :details (Hash with a :reasoning String)

Raises:

  • (Error)

    if the judge returns a response without a "score" key
# File 'lib/eval_ruby/metrics/relevance.rb', line 34

def call(question:, answer:, **_kwargs)
  prompt = format(PROMPT_TEMPLATE, question: question, answer: answer)

  result = judge.call(prompt)
  raise Error, "Judge returned invalid response for relevance" unless result&.key?("score")

  {
    score: result["score"].to_f.clamp(0.0, 1.0),
    details: {reasoning: result["reasoning"]}
  }
end
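
A self-contained sketch of the same call flow, using a hypothetical `StubJudge` (not part of EvalRuby) standing in for the real LLM judge; the assumption here is that a judge is any object whose #call takes the rendered prompt String and returns a parsed JSON Hash with "score" and "reasoning" keys:

```ruby
require "json"

# Hypothetical stand-in for an LLM judge. A real judge would send the
# prompt to a model and parse its JSON reply; this one returns a canned
# response so the flow can run offline.
class StubJudge
  def call(_prompt)
    JSON.parse('{"reasoning": "The answer defines Ruby directly.", "score": 0.95}')
  end
end

judge = StubJudge.new
result = judge.call("rendered prompt")

# Mirror the method's validation and normalization: reject responses
# missing "score", then clamp the score into the 0.0-1.0 range.
raise "Judge returned invalid response for relevance" unless result&.key?("score")

scored = {
  score: result["score"].to_f.clamp(0.0, 1.0),
  details: {reasoning: result["reasoning"]}
}
```

The `clamp(0.0, 1.0)` guards against a judge that returns an out-of-range number (e.g. 1.4), so downstream aggregation can rely on the documented 0.0-1.0 contract.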