Class: Leva::DatasetConverter

Inherits:
Object
Defined in:
app/services/leva/dataset_converter.rb

Overview

Converts Leva datasets to DSPy example format.

This service transforms a dataset's DatasetRecord objects into example hashes (with :input and :expected keys) suitable for use with DSPy optimizers and predictors.

Examples:

Convert a dataset to DSPy examples

converter = Leva::DatasetConverter.new(dataset)
examples = converter.to_dspy_examples

Split dataset for training

converter = Leva::DatasetConverter.new(dataset)
splits = converter.split(train_ratio: 0.6, val_ratio: 0.2)
# => { train: [...], val: [...], test: [...] }

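Instance Method Summary

  • #initialize(dataset) ⇒ DatasetConverter

    Returns a new instance of DatasetConverter.

  • #split(train_ratio: 0.6, val_ratio: 0.2, seed: nil) ⇒ Hash

    Splits the dataset into train, validation, and test sets.

  • #to_dspy_examples ⇒ Array<Hash>

    Converts all dataset records to DSPy example format.

  • #valid_record_count ⇒ Integer

    Returns the count of valid records in the dataset.
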
Constructor Details

#initialize(dataset) ⇒ DatasetConverter

Returns a new instance of DatasetConverter.

Parameters:

  • dataset

    The Leva dataset whose records are converted

# File 'app/services/leva/dataset_converter.rb', line 19

def initialize(dataset)
  @dataset = dataset
end

Instance Method Details

#split(train_ratio: 0.6, val_ratio: 0.2, seed: nil) ⇒ Hash

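Splits the dataset into train, validation, and test sets. The examples are shuffled first (reproducibly when seed is given). The train and validation sizes are truncated to whole numbers, and the test set receives whatever remains after those two slices.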

Parameters:

  • train_ratio (Float) (defaults to: 0.6)

    Proportion of examples used for training

  • val_ratio (Float) (defaults to: 0.2)

    Proportion of examples used for validation

  • seed (Integer, nil) (defaults to: nil)

    Random seed for a reproducible shuffle; when nil, the order differs on each call

Returns:

  • (Hash)

    Hash with :train, :val, and :test arrays



# File 'app/services/leva/dataset_converter.rb', line 44

def split(train_ratio: 0.6, val_ratio: 0.2, seed: nil)
  examples = to_dspy_examples
  examples = seed ? examples.shuffle(random: Random.new(seed)) : examples.shuffle

  train_size = (examples.size * train_ratio).to_i
  val_size = (examples.size * val_ratio).to_i

  {
    train: examples[0...train_size],
    val: examples[train_size...(train_size + val_size)],
    test: examples[(train_size + val_size)..]
  }
end
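
The shuffle-then-slice behavior above can be tried outside Rails. The following is a minimal sketch, not the class itself: split_examples is a hypothetical standalone helper, and plain integers stand in for the example hashes.

```ruby
# Standalone sketch of the shuffle-then-slice logic used by #split.
# split_examples is a hypothetical name; integers stand in for example hashes.
def split_examples(examples, train_ratio: 0.6, val_ratio: 0.2, seed: nil)
  # A seeded Random makes the shuffle order repeatable across calls.
  shuffled = seed ? examples.shuffle(random: Random.new(seed)) : examples.shuffle

  # Sizes are truncated toward zero, so the test set absorbs the remainder.
  train_size = (shuffled.size * train_ratio).to_i
  val_size = (shuffled.size * val_ratio).to_i

  {
    train: shuffled[0...train_size],
    val: shuffled[train_size...(train_size + val_size)],
    test: shuffled[(train_size + val_size)..]
  }
end

examples = (1..7).to_a
first = split_examples(examples, seed: 42)
second = split_examples(examples, seed: 42)

puts first.transform_values(&:size).inspect   # sizes of each split
puts first == second                          # same seed, same split
puts first.values.flatten.sort == examples    # every example lands in exactly one split
```

With seven examples and the default ratios, the truncation yields 4 training and 1 validation example, leaving 2 for the test set.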

#to_dspy_examples ⇒ Array<Hash>

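Converts all dataset records to DSPy example format. Each example's input is built from the recordable's to_dspy_context if available, otherwise from to_llm_context; the expected output is the recordable's ground_truth converted to a string. Records without a recordable are skipped.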

Returns:

  • (Array<Hash>)

    Array of example hashes with :input and :expected keys



# File 'app/services/leva/dataset_converter.rb', line 27

def to_dspy_examples
  @dataset.dataset_records.includes(:recordable).map do |record|
    next unless record.recordable

    {
      input: sanitize_context(context_for(record.recordable)),
      expected: { output: record.recordable.ground_truth.to_s }
    }
  end.compact
end
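
To illustrate the shape of the result, here is a hedged sketch that uses plain Structs in place of the ActiveRecord models. Recordable, DspyRecordable, Record, and to_examples are hypothetical stand-ins. The private helpers context_for and sanitize_context are not shown on this page, so context_for is assumed to follow the fallback rule in the description and sanitize_context is left out.

```ruby
# Hypothetical stand-ins for the ActiveRecord models; only the methods the
# sketch needs are defined.
Recordable = Struct.new(:text, :ground_truth) do
  def to_llm_context
    { text: text }
  end
end

DspyRecordable = Struct.new(:question, :ground_truth) do
  def to_dspy_context
    { question: question }
  end

  def to_llm_context
    { question: question, note: "ignored when to_dspy_context exists" }
  end
end

Record = Struct.new(:recordable)

# Assumed fallback rule, taken from the method description above.
def context_for(recordable)
  if recordable.respond_to?(:to_dspy_context)
    recordable.to_dspy_context
  else
    recordable.to_llm_context
  end
end

# Mirrors the map/compact structure of #to_dspy_examples (sanitizing omitted).
def to_examples(records)
  records.map do |record|
    next unless record.recordable # records without a recordable become nil

    {
      input: context_for(record.recordable),
      expected: { output: record.recordable.ground_truth.to_s }
    }
  end.compact
end

records = [
  Record.new(Recordable.new("hello", :greeting)),
  Record.new(nil),
  Record.new(DspyRecordable.new("2 + 2?", 4))
]

examples = to_examples(records)
examples.each { |example| puts example.inspect }
```

The record with a nil recordable is dropped, so three records produce two examples, and each ground_truth ends up as a string regardless of its original type.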

#valid_record_count ⇒ Integer

Returns the count of valid records in the dataset, that is, the number of examples to_dspy_examples produces.

Returns:

  • (Integer)

    Number of records with valid recordable objects



# File 'app/services/leva/dataset_converter.rb', line 61

def valid_record_count
  to_dspy_examples.size
end
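
One possible use of this count is to check, before splitting, whether a small dataset would leave a split empty. The sketch below assumes the same truncating arithmetic as #split and ratios that sum to at most 1; split_sizes is a hypothetical helper, not part of the class.

```ruby
# Hypothetical helper: predicts split sizes from a record count using the
# same truncating arithmetic as #split. Assumes train_ratio + val_ratio <= 1.
def split_sizes(count, train_ratio: 0.6, val_ratio: 0.2)
  train = (count * train_ratio).to_i
  val = (count * val_ratio).to_i
  { train: train, val: val, test: count - train - val }
end

small = split_sizes(3)   # stands in for converter.valid_record_count == 3
large = split_sizes(50)

puts small.inspect
puts large.inspect
puts "validation split would be empty" if small[:val].zero?
```

With only three valid records the validation split is empty under the default ratios, which may be worth detecting before handing the splits to an optimizer.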