Class: EvalRuby::Metrics::ContextPrecision
- Defined in:
- lib/eval_ruby/metrics/context_precision.rb
Overview
Measures the proportion of retrieved contexts that are relevant to the question. Uses an LLM judge to evaluate each context’s relevance.
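The score is simply the fraction of retrieved contexts the judge marks relevant. A minimal sketch of that arithmetic, using a hypothetical helper name (in the real class this computation is delegated to the LLM judge, which returns the score directly):

```ruby
# Hypothetical helper: context precision is the fraction of contexts
# the judge marked relevant; 0.0 when nothing was retrieved.
def context_precision_score(evaluations)
  return 0.0 if evaluations.empty?
  evaluations.count { |e| e["relevant"] }.to_f / evaluations.size
end

evals = [
  {"index" => 0, "relevant" => true},
  {"index" => 1, "relevant" => false},
  {"index" => 2, "relevant" => true},
]
context_precision_score(evals) # => 0.6666666666666666
```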
Constant Summary
- PROMPT_TEMPLATE =
  <<~PROMPT
    Given the following question and a list of retrieved contexts,
    evaluate whether each context is relevant to answering the question.

    Question: %{question}

    Contexts:
    %{contexts}

    For each context, determine if it is RELEVANT or NOT RELEVANT to
    answering the question.

    Respond in JSON:
    {"evaluations": [{"index": 0, "relevant": true}], "score": 0.0}

    The score should be the proportion of relevant contexts (0.0 to 1.0).
  PROMPT
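The %{question} and %{contexts} placeholders are filled with Kernel#format and keyword arguments, which is what #call does. A small stand-alone sketch (the trimmed template below is illustrative, not the real constant):

```ruby
# Trimmed, illustrative template; the real PROMPT_TEMPLATE carries the
# full instructions and the JSON response specification.
TEMPLATE = <<~PROMPT
  Question: %{question}

  Contexts:
  %{contexts}
PROMPT

contexts = ["Paris is the capital of France.", "Bananas are yellow."]
# Number each context so the judge's {"index": ...} can refer back to it.
contexts_text = contexts.each_with_index.map { |c, i| "[#{i}] #{c}" }.join("\n\n")
prompt = format(TEMPLATE, question: "What is the capital of France?",
                          contexts: contexts_text)
puts prompt
```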
Instance Attribute Summary
Attributes inherited from Base
Instance Method Summary
- #call(question:, context:, **_kwargs) ⇒ Hash
  Returns a Hash with :score (a Float between 0.0 and 1.0) and :details (a Hash whose :evaluations key holds an Array of per-context judgments).
Methods inherited from Base
Constructor Details
This class inherits a constructor from EvalRuby::Metrics::Base
Instance Method Details
#call(question:, context:, **_kwargs) ⇒ Hash
Returns a Hash with :score (a Float between 0.0 and 1.0) and :details (a Hash whose :evaluations key holds an Array of per-context judgments).
# File 'lib/eval_ruby/metrics/context_precision.rb', line 32

def call(question:, context:, **_kwargs)
  contexts = context.is_a?(Array) ? context : [context.to_s]
  return {score: 0.0, details: {}} if contexts.empty?

  contexts_text = contexts.each_with_index.map { |c, i| "[#{i}] #{c}" }.join("\n\n")
  prompt = format(PROMPT_TEMPLATE, question: question, contexts: contexts_text)
  result = judge.call(prompt)
  raise Error, "Judge returned invalid response for context_precision" unless result&.key?("score")

  {
    score: result["score"].to_f.clamp(0.0, 1.0),
    details: {evaluations: result["evaluations"] || []}
  }
end
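Because #call only needs an object whose call(prompt) returns the parsed JSON as a Hash, the post-processing is easy to exercise with a stub. A hedged sketch (StubJudge and its canned response are assumptions; in the gem the judge comes from EvalRuby::Metrics::Base and wraps an LLM client):

```ruby
# Stub standing in for the LLM judge: returns a canned parsed-JSON Hash.
class StubJudge
  def call(_prompt)
    {"score" => 0.5,
     "evaluations" => [{"index" => 0, "relevant" => true},
                       {"index" => 1, "relevant" => false}]}
  end
end

result = StubJudge.new.call("any prompt")
# Mirror the post-processing in #call: clamp the score, default the details.
score = result["score"].to_f.clamp(0.0, 1.0)
details = {evaluations: result["evaluations"] || []}
score               # => 0.5
# An out-of-range judge score is clamped the same way:
1.4.clamp(0.0, 1.0) # => 1.0
```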