Class: Asimov::Utils::TrainingFileValidator

Inherits:
JsonlValidator
Defined in:
lib/asimov/utils/training_file_validator.rb

Overview

Validates that a file is in the "fine-tune" format used by OpenAI: a JSONL file in which every line is a JSON object with exactly two keys, "prompt" and "completion", both holding string values. No other keys are permitted.
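As a rough illustration of that format, the sketch below assembles a two-line training file with Ruby's standard `json` library. The prompts and completions are invented sample data; only the key names come from the description above.

```ruby
require "json"

# Invented training examples; each hash has exactly the two permitted
# keys, and both values are Strings.
examples = [
  { "prompt" => "Capital of France?", "completion" => " Paris" },
  { "prompt" => "Capital of Spain?", "completion" => " Madrid" }
]

# JSONL: one JSON object per line, lines separated by newlines.
jsonl = examples.map { |ex| JSON.generate(ex) }.join("\n")

puts jsonl
# {"prompt":"Capital of France?","completion":" Paris"}
# {"prompt":"Capital of Spain?","completion":" Madrid"}
```

Writing each object with `JSON.generate` keeps it on a single line, which is what makes the result valid JSONL.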

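Instance Method Summary

  • #training_example?(parsed) ⇒ Boolean
  • #validate_line(line, idx) ⇒ Object
  • #validate_training_example(parsed, idx) ⇒ Object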

Methods inherited from JsonlValidator

#validate

Instance Method Details

#training_example?(parsed) ⇒ Boolean

Returns:

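  • (Boolean): true if parsed is a Hash whose only keys are "prompt" and "completion", both with String values; false otherwise.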


# File 'lib/asimov/utils/training_file_validator.rb', line 25

def training_example?(parsed)
  return false unless parsed.is_a?(Hash)

  keys = parsed.keys
  return false unless keys.size == 2
  return false unless keys.include?("prompt")
  return false unless keys.include?("completion")
  return false unless parsed["prompt"].is_a?(String)
  return false unless parsed["completion"].is_a?(String)

  true
end
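To see which inputs these checks accept, the sketch below restates them as a single boolean expression and runs it over a few sample lines. The helper name `valid_training_example?` is hypothetical and is not part of the gem; it is meant to give the same answers as #training_example? for JSON-parsed input.

```ruby
require "json"

# Hypothetical helper restating the rules of #training_example?:
# a Hash with exactly the keys "prompt" and "completion",
# both holding String values.
def valid_training_example?(parsed)
  parsed.is_a?(Hash) &&
    parsed.size == 2 &&
    parsed.key?("prompt") && parsed.key?("completion") &&
    parsed["prompt"].is_a?(String) &&
    parsed["completion"].is_a?(String)
end

# Sample JSONL lines mapped to the result the rules should give.
samples = {
  '{"prompt": "2+2=", "completion": " 4"}'            => true,
  '{"prompt": "2+2=", "completion": 4}'               => false, # non-string value
  '{"prompt": "2+2=", "completion": " 4", "id": "7"}' => false, # extra key
  '{"prompt": "2+2="}'                                => false, # missing key
  '["prompt", "completion"]'                          => false  # array, not an object
}

samples.each do |line, expected|
  actual = valid_training_example?(JSON.parse(line))
  puts "#{line} => #{actual} (expected #{expected})"
end
```

The size check is what rejects extra keys such as "id"; the two `is_a?(String)` checks reject numeric or null values.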

#validate_line(line, idx) ⇒ Object



# File 'lib/asimov/utils/training_file_validator.rb', line 12

def validate_line(line, idx)
  parsed = JSON.parse(line)
  validate_training_example(parsed, idx)
end
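Because #validate_line hands the raw line straight to `JSON.parse`, a line that is not valid JSON raises `JSON::ParserError` from the parse itself, before #validate_training_example runs; this page does not show whether the inherited #validate rescues that error. The sketch below, which uses only Ruby's standard `json` library, shows a line with a trailing newline parsing normally and a truncated line failing.

```ruby
require "json"

# Lines read with IO#each_line or String#each_line keep their trailing
# newline; JSON.parse ignores surrounding whitespace, so they still parse.
parsed = JSON.parse(%({"prompt": "Hi", "completion": " there"}\n))
puts parsed.inspect

# A truncated line is not valid JSON, so JSON.parse raises
# JSON::ParserError before any prompt/completion check can run.
error =
  begin
    JSON.parse(%({"prompt": "Hi", "completion": ))
    nil
  rescue JSON::ParserError => e
    e
  end
puts error.class # prints JSON::ParserError
```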

#validate_training_example(parsed, idx) ⇒ Object



# File 'lib/asimov/utils/training_file_validator.rb', line 17

def validate_training_example(parsed, idx)
  return if training_example?(parsed)

  raise InvalidTrainingExampleError,
        "Expected file to have JSONL format with prompt/completion keys. " \
        "Invalid format on line #{idx + 1}."
end
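The error message adds 1 to `idx`, so a zero-based index from the line loop is reported as a 1-based line number. The sketch below illustrates this end to end under stated assumptions: `InvalidTrainingExampleError` is a local stand-in for the gem's own error class, `check_example` is a hypothetical helper applying the same rules compactly, and the `each_line.with_index` loop only approximates the inherited #validate, whose implementation is not shown on this page.

```ruby
require "json"

# Local stand-in; the gem defines its own InvalidTrainingExampleError.
class InvalidTrainingExampleError < StandardError; end

# Hypothetical helper with the same rules as #training_example?, written
# compactly for JSON-parsed input (where keys are always Strings).
def check_example(parsed, idx)
  return if parsed.is_a?(Hash) &&
            parsed.keys.sort == %w[completion prompt] &&
            parsed.values.all?(String)

  raise InvalidTrainingExampleError,
        "Expected file to have JSONL format with prompt/completion keys. " \
        "Invalid format on line #{idx + 1}."
end

# The second line uses "answer" instead of "completion".
jsonl = <<~JSONL
  {"prompt": "Capital of France?", "completion": " Paris"}
  {"prompt": "Capital of Spain?", "answer": " Madrid"}
JSONL

# ASSUMPTION: approximates the inherited #validate by enumerating lines
# with zero-based indexes.
message =
  begin
    jsonl.each_line.with_index { |line, idx| check_example(JSON.parse(line), idx) }
    nil
  rescue InvalidTrainingExampleError => e
    e.message
  end

puts message
# Expected file to have JSONL format with prompt/completion keys. Invalid format on line 2.
```

Reporting `idx + 1` keeps the message consistent with how editors number lines, even though Ruby's enumerators count from zero.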