Module: EvalRuby
Defined in:
- lib/eval_ruby.rb
- lib/eval_ruby/rspec.rb
- lib/eval_ruby/report.rb
- lib/eval_ruby/result.rb
- lib/eval_ruby/dataset.rb
- lib/eval_ruby/version.rb
- lib/eval_ruby/minitest.rb
- lib/eval_ruby/evaluator.rb
- lib/eval_ruby/comparison.rb
- lib/eval_ruby/judges/base.rb
- lib/eval_ruby/metrics/mrr.rb
- lib/eval_ruby/metrics/base.rb
- lib/eval_ruby/metrics/ndcg.rb
- lib/eval_ruby/configuration.rb
- lib/eval_ruby/judges/openai.rb
- lib/eval_ruby/judges/anthropic.rb
- lib/eval_ruby/metrics/relevance.rb
- lib/eval_ruby/metrics/correctness.rb
- lib/eval_ruby/metrics/recall_at_k.rb
- lib/eval_ruby/metrics/faithfulness.rb
- lib/eval_ruby/metrics/context_recall.rb
- lib/eval_ruby/metrics/precision_at_k.rb
- lib/eval_ruby/metrics/context_precision.rb
Overview
Evaluation framework for LLM and RAG applications. Measures quality metrics such as faithfulness, relevance, context precision, and answer correctness. Think Ragas or DeepEval, but for Ruby.
Defined Under Namespace
Modules: Assertions, Judges, Metrics, RSpecMatchers
Classes: APIError, Comparison, Configuration, Dataset, Error, Evaluator, InvalidResponseError, Report, Result, RetrievalResult, TimeoutError
Constant Summary
- VERSION = "0.2.0"
Class Method Summary
- .compare(report_a, report_b) ⇒ Comparison
  Compares two evaluation reports with statistical significance testing.
- .configuration ⇒ Configuration
  The current configuration.
- .configure {|config| ... } ⇒ void
  Yields the configuration for modification.
- .evaluate(question:, answer:, context: [], ground_truth: nil) ⇒ Result
  Evaluates an LLM response across multiple quality metrics.
- .evaluate_batch(dataset, pipeline: nil) ⇒ Report
  Evaluates a batch of samples, optionally running them through a pipeline.
- .evaluate_retrieval(question:, retrieved:, relevant:) ⇒ RetrievalResult
  Evaluates retrieval quality using IR metrics.
- .reset_configuration! ⇒ Configuration
  Resets configuration to defaults.
Class Method Details
.compare(report_a, report_b) ⇒ Comparison
Compares two evaluation reports with statistical significance testing.
# File 'lib/eval_ruby.rb', line 134
def compare(report_a, report_b)
  Comparison.new(report_a, report_b)
end
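For example, a report from a baseline run can be compared against one from a modified pipeline. A minimal sketch, assuming the Comparison object is simply inspected (its accessors are not documented in this section, and dataset and new_pipeline are stand-ins for your own objects):

baseline  = EvalRuby.evaluate_batch(dataset)
candidate = EvalRuby.evaluate_batch(dataset, pipeline: new_pipeline)

comparison = EvalRuby.compare(baseline, candidate)
puts comparison.inspect  # exact accessors for deltas/significance are assumed, not documented here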
.configuration ⇒ Configuration
Returns the current configuration.
# File 'lib/eval_ruby.rb', line 53
def configuration
  @configuration ||= Configuration.new
end
.configure {|config| ... } ⇒ void
This method returns an undefined value.
Yields the configuration for modification.
# File 'lib/eval_ruby.rb', line 61
def configure
  yield(configuration)
end
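A minimal configuration sketch. The judge and api_key settings below are assumptions suggested by the judges/openai.rb and judges/anthropic.rb files listed above, not confirmed Configuration attributes:

EvalRuby.configure do |config|
  config.judge   = :openai                # hypothetical setting
  config.api_key = ENV["OPENAI_API_KEY"]  # hypothetical setting
end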
.evaluate(question:, answer:, context: [], ground_truth: nil) ⇒ Result
Evaluates an LLM response across multiple quality metrics.
# File 'lib/eval_ruby.rb', line 79
def evaluate(question:, answer:, context: [], ground_truth: nil)
  Evaluator.new.evaluate(
    question: question,
    answer: answer,
    context: context,
    ground_truth: ground_truth
  )
end
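The keyword arguments below come straight from the method signature; how scores are read off the returned Result is not documented in this section, so the sketch only inspects it:

result = EvalRuby.evaluate(
  question: "What is the capital of France?",
  answer: "Paris is the capital of France.",
  context: ["France's capital city is Paris."],
  ground_truth: "Paris"
)
puts result.inspect  # Result accessors are not documented here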
.evaluate_batch(dataset, pipeline: nil) ⇒ Report
Evaluates a batch of samples, optionally running them through a pipeline.
# File 'lib/eval_ruby.rb', line 107
def evaluate_batch(dataset, pipeline: nil)
  samples = dataset.is_a?(Dataset) ? dataset.samples : dataset
  evaluator = Evaluator.new
  start_time = Time.now

  results = samples.map do |sample|
    if pipeline
      response = pipeline.query(sample[:question])
      evaluator.evaluate(
        question: sample[:question],
        answer: response.respond_to?(:text) ? response.text : response.to_s,
        context: response.respond_to?(:context) ? response.context : sample[:context],
        ground_truth: sample[:ground_truth]
      )
    else
      evaluator.evaluate(**sample.slice(:question, :answer, :context, :ground_truth))
    end
  end

  Report.new(results: results, samples: samples, duration: Time.now - start_time)
end
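As the source shows, each sample is a Hash keyed by :question, :answer, :context, and :ground_truth, and a pipeline only needs to respond to #query (returning an object that may expose #text and #context). A sketch, with MyRagPipeline standing in for your own class:

samples = [
  {
    question: "What is Ruby?",
    answer: "A dynamic programming language.",
    context: ["Ruby is a dynamic, open source language."],
    ground_truth: "A programming language"
  }
]

report = EvalRuby.evaluate_batch(samples)                               # samples supply their own answers
report = EvalRuby.evaluate_batch(samples, pipeline: MyRagPipeline.new)  # answers come from pipeline.query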
.evaluate_retrieval(question:, retrieved:, relevant:) ⇒ RetrievalResult
Evaluates retrieval quality using IR metrics.
# File 'lib/eval_ruby.rb', line 94
def evaluate_retrieval(question:, retrieved:, relevant:)
  Evaluator.new.evaluate_retrieval(
    question: question,
    retrieved: retrieved,
    relevant: relevant
  )
end
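A sketch using document IDs. The metric files listed above (recall_at_k, precision_at_k, mrr, ndcg) suggest what the RetrievalResult reports, but its accessors are not documented here:

retrieval = EvalRuby.evaluate_retrieval(
  question: "What is the capital of France?",
  retrieved: ["doc_3", "doc_7", "doc_1"],  # retriever output, in rank order
  relevant: ["doc_1"]                      # known-relevant documents
)
puts retrieval.inspect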
.reset_configuration! ⇒ Configuration
Resets configuration to defaults.
# File 'lib/eval_ruby.rb', line 68
def reset_configuration!
  @configuration = Configuration.new
end
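A common use is restoring defaults between tests so configuration set in one example cannot leak into the next. The RSpec hook below is one way to do that, not something the gem prescribes:

RSpec.configure do |rspec|
  rspec.after { EvalRuby.reset_configuration! }
end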