Class: EvalRuby::Metrics::Correctness
- Defined in:
- lib/eval_ruby/metrics/correctness.rb
Overview
Measures the factual correctness of an answer against a ground truth. Uses an LLM judge when one is available and falls back to a token-overlap F1 score otherwise.
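The token-overlap fallback mentioned above can be sketched as a classic F1 over shared tokens. This is a hypothetical illustration; the gem's actual string_similarity_score may tokenize or normalize differently.

```ruby
# Hypothetical sketch of a token-overlap F1 score (not the gem's exact code).
# Precision = overlap / answer tokens, recall = overlap / ground-truth tokens.
def token_overlap_f1(answer, ground_truth)
  a = answer.downcase.scan(/\w+/)
  g = ground_truth.downcase.scan(/\w+/)
  return 0.0 if a.empty? || g.empty?

  # Count overlapping tokens, consuming each ground-truth token at most once.
  overlap = 0
  remaining = g.dup
  a.each do |tok|
    idx = remaining.index(tok)
    next unless idx
    overlap += 1
    remaining.delete_at(idx)
  end
  return 0.0 if overlap.zero?

  precision = overlap.to_f / a.size
  recall    = overlap.to_f / g.size
  2 * precision * recall / (precision + recall)
end
```

Identical strings score 1.0, disjoint strings score 0.0, and partial matches land in between, which matches the 0.0-1.0 scale the metric reports.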
Constant Summary
- PROMPT_TEMPLATE =
  <<~PROMPT
    Given the following answer and ground truth, evaluate whether the answer is factually correct.

    Answer: %{answer}
    Ground Truth: %{ground_truth}

    Evaluate correctness on a scale from 0.0 to 1.0 where:
    - 1.0 = the answer is completely correct and matches the ground truth
    - 0.5 = the answer is partially correct
    - 0.0 = the answer is completely wrong

    Consider both semantic meaning and factual accuracy, not just exact string matching.

    Respond in JSON: {"reasoning": "...", "score": 0.0}
  PROMPT
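The %{answer} and %{ground_truth} placeholders in the template are the named-reference style that Ruby's Kernel#format fills from a hash, which is presumably how the metric builds the judge prompt:

```ruby
# Filling %{name} placeholders with Kernel#format and a hash argument.
# A minimal template stands in for the full PROMPT_TEMPLATE here.
template = "Answer: %{answer}\nGround Truth: %{ground_truth}"
prompt = format(template, answer: "Paris", ground_truth: "Paris, France")
puts prompt
```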
Instance Attribute Summary
Attributes inherited from Base
Instance Method Summary
- #call(answer:, ground_truth:, **_kwargs) ⇒ Hash
  Returns :score (a Float between 0.0 and 1.0) and :details.
Methods inherited from Base
Constructor Details
This class inherits a constructor from EvalRuby::Metrics::Base
Instance Method Details
#call(answer:, ground_truth:, **_kwargs) ⇒ Hash
Returns a Hash with :score (a Float between 0.0 and 1.0) and :details.
# File 'lib/eval_ruby/metrics/correctness.rb', line 39

def call(answer:, ground_truth:, **_kwargs)
  if judge
    llm_score(answer, ground_truth)
  else
    string_similarity_score(answer, ground_truth)
  end
end
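On the judge path, the template asks the LLM to respond in JSON with "reasoning" and "score" keys. A hedged sketch of how llm_score might turn that reply into the documented return shape (the real implementation, including its error handling, is not shown in these docs):

```ruby
require "json"

# Hypothetical parsing of the judge's JSON reply into the Hash shape
# documented for #call; the gem's real llm_score may differ.
def parse_judge_reply(raw)
  data = JSON.parse(raw)
  # Clamp to the documented 0.0-1.0 range in case the judge drifts out of it.
  score = data["score"].to_f.clamp(0.0, 1.0)
  { score: score, details: { reasoning: data["reasoning"] } }
end
```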