Class: EvalRuby::Metrics::Faithfulness
- Defined in:
- lib/eval_ruby/metrics/faithfulness.rb
Overview
Measures whether an answer is supported by the provided context. Uses an LLM judge to identify claims and check if each is supported.
Constant Summary
- PROMPT_TEMPLATE =
  <<~PROMPT
    Given the following context and answer, evaluate whether the answer is
    faithful to (supported by) the context.

    For each claim in the answer, determine if it is:
    1. SUPPORTED - directly supported by the context
    2. NOT SUPPORTED - contradicts or is not mentioned in the context

    Context:
    %{context}

    Answer:
    %{answer}

    List each claim and whether it is SUPPORTED or NOT SUPPORTED.
    Then give a faithfulness score from 0.0 to 1.0 where:
    - 1.0 = all claims are supported
    - 0.0 = no claims are supported

    Respond in JSON: {"claims": [{"claim": "...", "supported": true}], "score": 0.0}
  PROMPT
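The `%{context}` and `%{answer}` placeholders are filled with Ruby's `Kernel#format`, which substitutes named references from keyword arguments. A minimal sketch (the template here is abbreviated for illustration; the real constant is the full prompt above):

```ruby
# Named %{...} placeholders in a format string are filled from keyword
# arguments to Kernel#format. Abbreviated template, for illustration only.
template = <<~PROMPT
  Context: %{context}
  Answer: %{answer}
PROMPT

prompt = format(template,
                context: "Paris is the capital of France.",
                answer: "Paris is in France.")
puts prompt
# Context: Paris is the capital of France.
# Answer: Paris is in France.
```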
Instance Attribute Summary
Attributes inherited from Base
Instance Method Summary
-
#call(answer:, context:, **_kwargs) ⇒ Hash
Returns a Hash with :score (a Float between 0.0 and 1.0) and :details (a Hash containing a :claims Array).
Methods inherited from Base
Constructor Details
This class inherits a constructor from EvalRuby::Metrics::Base
Instance Method Details
#call(answer:, context:, **_kwargs) ⇒ Hash
Returns a Hash with :score (a Float clamped to 0.0-1.0) and :details (a Hash containing a :claims Array).
# File 'lib/eval_ruby/metrics/faithfulness.rb', line 38

def call(answer:, context:, **_kwargs)
  context_text = context.is_a?(Array) ? context.join("\n\n") : context.to_s
  prompt = format(PROMPT_TEMPLATE, context: context_text, answer: answer)
  result = judge.call(prompt)

  raise Error, "Judge returned invalid response for faithfulness" unless result&.key?("score")

  {
    score: result["score"].to_f.clamp(0.0, 1.0),
    details: {claims: result["claims"] || []}
  }
end
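A standalone sketch of the scoring logic above with a stubbed judge. The real judge wraps an LLM call and returns the parsed JSON response; here `StubJudge` and the `Scorer` class are hypothetical stand-ins (not part of the EvalRuby API) used to show the shape of the input and the returned Hash:

```ruby
# StubJudge fakes the LLM judge: it ignores the prompt and returns a
# canned parsed-JSON Hash, the same shape the real judge would produce.
StubJudge = Struct.new(:response) do
  def call(_prompt)
    response
  end
end

# Scorer mirrors Faithfulness#call: joins Array contexts with blank
# lines, asks the judge, validates the response, and clamps the score.
class Scorer
  def initialize(judge)
    @judge = judge
  end

  def call(answer:, context:)
    context_text = context.is_a?(Array) ? context.join("\n\n") : context.to_s
    result = @judge.call("Context: #{context_text}\nAnswer: #{answer}")
    raise ArgumentError, "invalid judge response" unless result&.key?("score")

    {
      score: result["score"].to_f.clamp(0.0, 1.0),
      details: {claims: result["claims"] || []}
    }
  end
end

# One supported and one unsupported claim yields a score of 0.5.
judge = StubJudge.new(
  {"score" => 0.5,
   "claims" => [{"claim" => "Paris is in France", "supported" => true},
                {"claim" => "Paris has 10M people", "supported" => false}]}
)
result = Scorer.new(judge).call(
  answer: "Paris is in France and has 10M people.",
  context: ["Paris is the capital of France."]
)
puts result[:score]  # => 0.5
```

The clamp guards against a judge that returns an out-of-range number, and the `result&.key?("score")` check raises rather than silently scoring a malformed response as 0.0.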