Class: Asimov::Utils::ClassificationsFileValidator

Inherits:
JsonlValidator
Defined in:
lib/asimov/utils/classifications_file_validator.rb

Overview

Validates that a file is in the “classifications” format used by OpenAI. The file must be JSONL: each line is a JSON object with “text” and “label” keys whose values are strings, plus an optional “metadata” key that can hold any value. No other keys are permitted.
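To make the format concrete, here is a minimal standalone sketch of the rules described above, with a few lines that pass and a few that fail. The helper `classification_line?` and the sample records are hypothetical and are not part of the Asimov API; the sketch only assumes Ruby's standard `json` library.

```ruby
require "json"

# Hypothetical standalone check mirroring the rules in the overview.
# It is an illustration only, not the library's implementation.
def classification_line?(line)
  parsed = JSON.parse(line)
  return false unless parsed.is_a?(Hash)
  return false unless parsed["text"].is_a?(String) && parsed["label"].is_a?(String)

  # Any key outside this set makes the line invalid.
  (parsed.keys - %w[text label metadata]).empty?
end

# Illustrative sample lines (made up for this sketch).
VALID_LINES = [
  '{"text": "great product", "label": "positive"}',
  '{"text": "broke in a day", "label": "negative", "metadata": {"source": "review"}}'
].freeze

INVALID_LINES = [
  '{"text": "missing label"}',                    # required key absent
  '{"text": "ok", "label": 1}',                   # label is not a string
  '{"text": "ok", "label": "x", "extra": true}',  # unexpected key
  '["text", "label"]'                             # not a JSON object
].freeze

VALID_LINES.each { |line| puts "valid:   #{classification_line?(line)}" }
INVALID_LINES.each { |line| puts "invalid: #{classification_line?(line)}" }
```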

Instance Method Summary

Methods inherited from JsonlValidator

#validate

Instance Method Details

#classification?(parsed) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/asimov/utils/classifications_file_validator.rb', line 26

def classification?(parsed)
  # Each line must parse to a JSON object with string "text" and "label" values.
  return false unless parsed.is_a?(Hash)
  return false unless includes_required_key_value?("text", parsed)
  return false unless includes_required_key_value?("label", parsed)

  # With both required keys present, a third key is allowed only if it is "metadata".
  keys = parsed.keys
  return false if keys.size > 3

  keys.size == 2 || keys.include?("metadata")
end

#includes_required_key_value?(key, parsed) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/asimov/utils/classifications_file_validator.rb', line 37

def includes_required_key_value?(key, parsed)
  # Evaluates to nil (falsy) rather than false when the key is absent or its value is null.
  parsed[key]&.is_a?(String)
end
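The safe-navigation call in this method means the result is not always a strict Boolean. The short sketch below, using a made-up `record` hash and two hypothetical lambdas, compares `&.is_a?` with a plain `is_a?` call. Both are falsy for a missing key, so callers that only test truthiness (as `classification?` does) behave the same either way.

```ruby
# Hypothetical sample data for illustration.
record = { "text" => "hello", "label" => 3, "note" => nil }

# Same expression as the method body: skips the call when the value is nil.
safe_nav = ->(key) { record[key]&.is_a?(String) }

# Alternative without safe navigation: nil.is_a?(String) is simply false.
plain = ->(key) { record[key].is_a?(String) }

%w[text label note missing].each do |key|
  puts "#{key}: safe_nav=#{safe_nav.call(key).inspect} plain=#{plain.call(key).inspect}"
end
```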

#validate_classification(parsed, idx) ⇒ Object



# File 'lib/asimov/utils/classifications_file_validator.rb', line 18

def validate_classification(parsed, idx)
  return if classification?(parsed)

  # idx is zero-based, so the message reports idx + 1 as the line number.
  raise InvalidClassificationError,
        "Expected file to have JSONL format with text/label and (optional) metadata keys. " \
        "Invalid format on line #{idx + 1}."
end
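The sketch below illustrates how the zero-based index turns into a one-based line number in the error message. It is self-contained: `InvalidClassificationError` here is a stand-in defined locally (the real class lives elsewhere in the library), and `check_record` is a hypothetical helper that condenses the checks shown in `#classification?`.

```ruby
# Stand-in for the library's error class so the sketch runs on its own.
class InvalidClassificationError < StandardError; end

# Hypothetical helper: returns nil for a valid record, raises otherwise.
def check_record(parsed, idx)
  valid = parsed.is_a?(Hash) &&
          parsed["text"].is_a?(String) &&
          parsed["label"].is_a?(String) &&
          (parsed.keys - %w[text label metadata]).empty?
  return if valid

  raise InvalidClassificationError, "Invalid format on line #{idx + 1}."
end

# Illustrative records: the second one lacks the required "label" key.
records = [
  { "text" => "fine", "label" => "ok" },
  { "text" => "no label here" }
]

# Collect the error messages instead of stopping at the first failure.
messages = records.each_with_index.filter_map do |record, idx|
  check_record(record, idx)
  nil
rescue InvalidClassificationError => e
  e.message
end

puts messages
```

Collecting messages is a choice made for this demonstration only; the method documented above raises on the first invalid line.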

#validate_line(line, idx) ⇒ Object



# File 'lib/asimov/utils/classifications_file_validator.rb', line 13

def validate_line(line, idx)
  # JSON.parse raises JSON::ParserError if the line is not valid JSON.
  parsed = JSON.parse(line)
  validate_classification(parsed, idx)
end
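A line can fail in two different ways: it may not be valid JSON at all, or it may parse to something that is not a JSON object. The sketch below separates those cases using only the standard `json` library. `parse_outcome` is a hypothetical helper written for illustration; how the parent `JsonlValidator` handles a `JSON::ParserError` is not shown in this section, so nothing here should be read as describing that behavior.

```ruby
require "json"

# Hypothetical helper that classifies what JSON.parse does with a line.
def parse_outcome(line)
  parsed = JSON.parse(line)
  parsed.is_a?(Hash) ? :object : :not_an_object
rescue JSON::ParserError
  :malformed_json
end

puts parse_outcome('{"text": "a", "label": "b"}')
puts parse_outcome("[1, 2]")
puts parse_outcome('{"text": ')
```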