Class: EvalRuby::Dataset
- Inherits:
- Object (inheritance chain: Object, EvalRuby::Dataset)
- Includes:
- Enumerable
- Defined in:
- lib/eval_ruby/dataset.rb
Overview
Collection of evaluation samples with import/export support. Supports CSV, JSON, and programmatic construction.
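Programmatic construction can be sketched as follows. SketchDataset is a hypothetical stand-in that copies the method bodies listed later on this page so the snippet runs without the gem installed; with the gem loaded, EvalRuby::Dataset would presumably be used the same way. The sample questions are made up for illustration.

```ruby
# Hypothetical stand-in mirroring the Dataset source shown on this page.
class SketchDataset
  include Enumerable

  attr_reader :name, :samples

  def initialize(name = "default")
    @name = name
    @samples = []
  end

  # Returns self, so calls can be chained.
  def add(question:, ground_truth: nil, relevant_contexts: [], answer: nil, context: [])
    @samples << {
      question: question,
      answer: answer,
      context: context.empty? ? relevant_contexts : context,
      ground_truth: ground_truth
    }
    self
  end

  def each(&block)
    @samples.each(&block)
  end

  def size
    @samples.size
  end

  def [](index)
    @samples[index]
  end
end

dataset = SketchDataset.new("capitals")
dataset
  .add(question: "Capital of France?", ground_truth: "Paris",
       context: ["Paris is the capital of France."])
  .add(question: "Capital of Japan?", ground_truth: "Tokyo")

# Enumerable methods such as #map come from defining #each.
questions = dataset.map { |sample| sample[:question] }
```

Because #add returns self, samples can be appended in a single chained expression, and including Enumerable makes #map, #select, and similar methods available on the dataset.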
Instance Attribute Summary
- #name ⇒ String (readonly)
  Dataset name.
- #samples ⇒ Array<Hash> (readonly)
  Sample entries.
Class Method Summary
- .from_csv(path) ⇒ Dataset
  Loads a dataset from a CSV file.
- .from_json(path) ⇒ Dataset
  Loads a dataset from a JSON file.
- .generate(documents:, questions_per_doc: 5, llm: :openai) ⇒ Dataset
  Generates a dataset from documents using an LLM.
Instance Method Summary
- #[](index) ⇒ Hash
  Sample at index.
- #add(question:, ground_truth: nil, relevant_contexts: [], answer: nil, context: []) ⇒ self
  Adds a sample to the dataset.
- #each {|Hash| ... } ⇒ Object
- #initialize(name = "default") ⇒ Dataset (constructor)
  A new instance of Dataset.
- #size ⇒ Integer
  Number of samples.
- #to_csv(path) ⇒ void
  Exports dataset to CSV.
- #to_json(path) ⇒ void
  Exports dataset to JSON.
Constructor Details
#initialize(name = "default") ⇒ Dataset
Returns a new instance of Dataset.
    # File 'lib/eval_ruby/dataset.rb', line 24
    def initialize(name = "default")
      @name = name
      @samples = []
    end
Instance Attribute Details
#name ⇒ String (readonly)
Returns dataset name.
    # File 'lib/eval_ruby/dataset.rb', line 18
    def name
      @name
    end
#samples ⇒ Array<Hash> (readonly)
Returns sample entries.
    # File 'lib/eval_ruby/dataset.rb', line 21
    def samples
      @samples
    end
Class Method Details
.from_csv(path) ⇒ Dataset
Loads a dataset from a CSV file.
    # File 'lib/eval_ruby/dataset.rb', line 67
    def self.from_csv(path)
      dataset = new(File.basename(path, ".*"))
      CSV.foreach(path, headers: true) do |row|
        dataset.add(
          question: row["question"],
          answer: row["answer"],
          context: parse_array_field(row["context"]),
          ground_truth: row["ground_truth"]
        )
      end
      dataset
    end
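The column layout that .from_csv reads can be illustrated with the standard library alone. The listing looks up the headers question, answer, context, and ground_truth. The body of parse_array_field is not shown on this page, so the JSON decoding of the context cell below is an assumption, chosen because #to_csv writes that cell with JSON.generate. The sample rows are invented.

```ruby
require "csv"
require "json"

# Invented sample file contents using the four headers read by .from_csv.
csv_text = <<~CSV
  question,answer,context,ground_truth
  What is Ruby?,A programming language,"[""Ruby is a language.""]",A programming language
  Who created Ruby?,,,Yukihiro Matsumoto
CSV

rows = CSV.parse(csv_text, headers: true).map do |row|
  {
    question: row["question"],
    answer: row["answer"], # empty cells are parsed as nil
    # Assumption: the context cell holds a JSON array, as #to_csv writes it.
    context: row["context"] ? JSON.parse(row["context"]) : [],
    ground_truth: row["ground_truth"]
  }
end
```

Note that the dataset name is taken from the file name without its extension, so loading samples/qa.csv would yield a dataset named qa.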
.from_json(path) ⇒ Dataset
Loads a dataset from a JSON file.
    # File 'lib/eval_ruby/dataset.rb', line 84
    def self.from_json(path)
      dataset = new(File.basename(path, ".*"))
      data = JSON.parse(File.read(path))
      samples = data.is_a?(Array) ? data : data["samples"] || data["data"] || []
      samples.each do |sample|
        dataset.add(
          question: sample["question"],
          answer: sample["answer"],
          context: Array(sample["context"]),
          ground_truth: sample["ground_truth"]
        )
      end
      dataset
    end
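The listing accepts three JSON shapes: a top-level array, an object with a "samples" key, or an object with a "data" key. Any other object yields no samples. A minimal sketch of that detection, where extract_samples is a hypothetical helper that isolates the expression from the listing:

```ruby
require "json"

# Hypothetical helper isolating the shape detection used by .from_json.
def extract_samples(data)
  data.is_a?(Array) ? data : data["samples"] || data["data"] || []
end

bare_array   = JSON.parse('[{"question": "q1"}]')
samples_key  = JSON.parse('{"name": "demo", "samples": [{"question": "q2"}]}')
data_key     = JSON.parse('{"data": [{"question": "q3"}]}')
unrecognized = JSON.parse('{"items": [{"question": "q4"}]}')

# Array() normalizes the context field: a single string becomes a
# one-element array and a missing value becomes an empty array.
single_context  = Array("one passage")
missing_context = Array(nil)
```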
.generate(documents:, questions_per_doc: 5, llm: :openai) ⇒ Dataset
Generates a dataset from documents using an LLM.
    # File 'lib/eval_ruby/dataset.rb', line 131
    def self.generate(documents:, questions_per_doc: 5, llm: :openai)
      config = EvalRuby.configuration.dup
      config.judge_llm = llm
      judge = case llm
              when :openai then Judges::OpenAI.new(config)
              when :anthropic then Judges::Anthropic.new(config)
              else raise Error, "Unknown LLM: #{llm}"
              end

      dataset = new("generated")
      documents.each do |doc_path|
        content = File.read(doc_path)
        prompt = <<~PROMPT
          Given the following document, generate #{questions_per_doc} question-answer pairs
          that can be answered using the document content.

          Document:
          #{content}

          Respond in JSON: {"pairs": [{"question": "...", "answer": "...", "context": "relevant excerpt"}]}
        PROMPT

        result = judge.call(prompt)
        next unless result.is_a?(Hash) && result.key?("pairs")

        result["pairs"].each do |pair|
          dataset.add(
            question: pair["question"],
            answer: pair["answer"],
            context: [pair["context"] || content],
            ground_truth: pair["answer"]
          )
        end
      end
      dataset
    end
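The way a judge response becomes samples can be sketched without calling an LLM. samples_from_result is a hypothetical helper that copies the mapping from the listing, and the result hash is invented data in the shape the prompt requests. Two details stand out: the generated answer is stored as both answer and ground_truth, and a pair without a "context" excerpt falls back to the whole document text.

```ruby
# Hypothetical helper mirroring the pair-to-sample mapping in .generate.
def samples_from_result(result, content)
  # Responses that are not a Hash with a "pairs" key are skipped.
  return [] unless result.is_a?(Hash) && result.key?("pairs")

  result["pairs"].map do |pair|
    {
      question: pair["question"],
      answer: pair["answer"],
      context: [pair["context"] || content], # fall back to the full document
      ground_truth: pair["answer"]
    }
  end
end

document = "Ruby was first released in 1995."
result = {
  "pairs" => [
    { "question" => "When was Ruby released?", "answer" => "1995",
      "context" => "first released in 1995" },
    { "question" => "What was released in 1995?", "answer" => "Ruby" }
  ]
}

samples = samples_from_result(result, document)
skipped = samples_from_result("not a hash", document)
```

Also note from the listing that documents is a list of file paths, since each entry is passed to File.read, and that any llm value other than :openai or :anthropic raises an error.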
Instance Method Details
#[](index) ⇒ Hash
Returns sample at index.
    # File 'lib/eval_ruby/dataset.rb', line 59
    def [](index)
      @samples[index]
    end
#add(question:, ground_truth: nil, relevant_contexts: [], answer: nil, context: []) ⇒ self
Adds a sample to the dataset.
    # File 'lib/eval_ruby/dataset.rb', line 37
    def add(question:, ground_truth: nil, relevant_contexts: [], answer: nil, context: [])
      @samples << {
        question: question,
        answer: answer,
        context: context.empty? ? relevant_contexts : context,
        ground_truth: ground_truth
      }
      self
    end
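The interaction between context and relevant_contexts is easy to miss: relevant_contexts is only used when context is empty, and it is stored under the :context key rather than a key of its own. A small sketch, where build_sample is a hypothetical helper that reproduces only the hash construction from #add:

```ruby
# Hypothetical helper reproducing the hash built inside #add.
def build_sample(question:, ground_truth: nil, relevant_contexts: [], answer: nil, context: [])
  {
    question: question,
    answer: answer,
    context: context.empty? ? relevant_contexts : context,
    ground_truth: ground_truth
  }
end

only_relevant = build_sample(question: "q", relevant_contexts: ["r1"])
both_given    = build_sample(question: "q", relevant_contexts: ["r1"], context: ["c1"])
neither       = build_sample(question: "q")
```

When both keywords are supplied, context wins and relevant_contexts is discarded.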
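#each {|Hash| ... } ⇒ Object
Yields each sample in insertion order by delegating to the samples array. Enumerable relies on this method.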
    # File 'lib/eval_ruby/dataset.rb', line 48
    def each(&block)
      @samples.each(&block)
    end
#size ⇒ Integer
Returns number of samples.
    # File 'lib/eval_ruby/dataset.rb', line 53
    def size
      @samples.size
    end
#to_csv(path) ⇒ void
This method returns an undefined value.
Exports dataset to CSV.
    # File 'lib/eval_ruby/dataset.rb', line 103
    def to_csv(path)
      CSV.open(path, "w") do |csv|
        csv << %w[question answer context ground_truth]
        @samples.each do |sample|
          csv << [
            sample[:question],
            sample[:answer],
            JSON.generate(sample[:context]),
            sample[:ground_truth]
          ]
        end
      end
    end
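Since a CSV cell holds a single string, the listing encodes the context array with JSON.generate before writing it. The sketch below reproduces that row layout in memory with CSV.generate instead of writing to a path, then decodes the cell again. The sample values are invented.

```ruby
require "csv"
require "json"

sample = { question: "q", answer: "a", context: ["c1", "c2"], ground_truth: "gt" }

# Same header and row layout as #to_csv, written to a string instead of a file.
csv_text = CSV.generate do |csv|
  csv << %w[question answer context ground_truth]
  csv << [
    sample[:question],
    sample[:answer],
    JSON.generate(sample[:context]), # array stored as one JSON-encoded cell
    sample[:ground_truth]
  ]
end

row = CSV.parse(csv_text, headers: true).first
restored_context = JSON.parse(row["context"])
```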
#to_json(path) ⇒ void
This method returns an undefined value.
Exports dataset to JSON.
    # File 'lib/eval_ruby/dataset.rb', line 121
    def to_json(path)
      File.write(path, JSON.pretty_generate({name: @name, samples: @samples}))
    end
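The exported document is an object with a name and a samples array. A minimal sketch of that structure, built in memory with invented values rather than written to a path:

```ruby
require "json"

name = "demo"
samples = [{ question: "q", answer: "a", context: ["c"], ground_truth: "gt" }]

# Same payload that #to_json passes to File.write.
json_text = JSON.pretty_generate({ name: name, samples: samples })

# Symbol keys are serialized as strings, so they parse back as strings.
parsed = JSON.parse(json_text)
first_sample = parsed["samples"].first
```

This matches the "samples" shape that .from_json accepts, so an exported file can be loaded back. Going by the listings, .from_json names the dataset after the file's base name, so the stored "name" value is not what determines the name of the reloaded dataset.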