Class: OORB

Inherits:
Object
Defined in:
lib/oorb.rb

Overview

OCR Optimized Regex Builder

Constant Summary

LETTERS =

Letters that OCR regularly misreads, and the characters they are commonly mistaken for

{'a' => %w(9),
           'b' => %w(h),
           'c' => %w(e f d o 6),
           'd' => %w(3 0 o 7),
           'e' => %w(6 c d f 4 3),
           'f' => %w(c s p),
           'g' => %w(9 8),
           'h' => %w(b),
           'i' => %w(l 1),
           'j' => %w(y),
           'l' => %w(1 i t 7),
           'n' => %w(r),
           'o' => %w(c 6 0 3 d),
           'p' => %w(f r),
           'r' => %w(n p),
           's' => %w(f l j i 3 8 5),
           't' => %w(i l 4 7),
           'u' => %w(v),
           'v' => %w(y u),
           'y' => %w(v j 7),
           'z' => %w(2)
}
SECTIONS =

Letters that OCR commonly splits into two characters, and the patterns that match them

{'m' => '[mnr][nr]?',
            'w' => '[wvu][vu]?'
}

Instance Method Details

#build_collection(character) ⇒ String

Builds a group match from an input letter.

Parameters:

  • character (String)

    made of a single character

Returns:

  • (String)

    character class of the letter and its common OCR misreadings, enclosed in square brackets

Raises:

  • (ArgumentError)

    if the argument isn’t a single character string from OORB::LETTERS



# File 'lib/oorb.rb', line 77

def build_collection(character)
  unless LETTERS[character]
    raise ArgumentError,
      "Valid arguments are a single character from #{LETTERS.keys.join(", ")}."
  end
  # Build a new string instead of appending to (and mutating) the argument.
  "[#{character}#{LETTERS[character].join}]"
end
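
A minimal sketch of the output shape, assuming a two-entry excerpt of the LETTERS table. `SAMPLE_LETTERS` and `sample_collection` are hypothetical stand-ins so the snippet runs without the library loaded; they are not part of OORB.

```ruby
# Hypothetical excerpt of OORB::LETTERS, copied from the constant summary.
SAMPLE_LETTERS = { 'a' => %w(9), 'i' => %w(l 1) }.freeze

# Stand-in for build_collection: wraps the letter and its look-alikes
# in a regex character class.
def sample_collection(character)
  alternatives = SAMPLE_LETTERS[character]
  raise ArgumentError, "unknown letter: #{character.inspect}" unless alternatives
  "[#{character}#{alternatives.join}]"
end

puts sample_collection('a') # prints [a9]
puts sample_collection('i') # prints [il1]
```

The letter itself always comes first in the class, followed by its replacements in table order.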

#build_regex(input) ⇒ String

Builds an OCR optimized regular expression from a string

Parameters:

  • input (String)

    to be parsed

Returns:

  • (String)

    formatted as a valid regular expression optimized for capturing OCR mistakes



# File 'lib/oorb.rb', line 52

def build_regex(input)
  input.downcase.chars.map do |char|
    if LETTERS.has_key?(char)
      build_collection(char)
    elsif SECTIONS.has_key?(char)
      build_section(char)
    else
      escape(char)
    end
  end.join
end

#build_section(character) ⇒ String

Builds a section from an input letter.

Parameters:

  • character (String)

    made of a single character

Returns:

  • (String)

    section of commonly split characters with optional second character

Raises:

  • (ArgumentError)

    if the argument isn’t a single character string from OORB::SECTIONS



# File 'lib/oorb.rb', line 91

def build_section(character)
  unless SECTIONS[character]
    raise ArgumentError,
      "Valid arguments are a single character from #{SECTIONS.keys.join(", ")}."
  end
  SECTIONS[character]
end
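
To illustrate what a section pattern accepts, the snippet below checks the `'m'` entry against a few fragments. The constant name `M_SECTION` and the `\A`/`\z` anchors are additions for the demonstration only.

```ruby
# Pattern copied from OORB::SECTIONS['m'], anchored for whole-string checks.
M_SECTION = /\A[mnr][nr]?\z/

# An intact "m" and the common two-character misreadings all match.
%w(m rn nn).each do |fragment|
  puts "#{fragment}: #{M_SECTION.match?(fragment)}" # each prints true
end

# Three characters are too many for one split letter.
puts "rnn: #{M_SECTION.match?('rnn')}" # prints false
```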

#combine_whitespace(string) ⇒ String

Collapses multiple consecutive whitespace characters into a single space

Parameters:

  • string (String)

    of any length

Returns:

  • (String)

    where consecutive whitespace characters have been collapsed



# File 'lib/oorb.rb', line 68

def combine_whitespace(string)
  # Any run of whitespace (spaces, tabs, newlines) becomes one space.
  string.gsub(/\s+/, " ")
end
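
A short sketch of the effect on mixed whitespace. `squeeze_demo` is a hypothetical helper that performs the same substitution outside the class.

```ruby
# Stand-in for combine_whitespace: same gsub, usable without the library.
def squeeze_demo(string)
  string.gsub(/\s+/, ' ')
end

messy = "OCR  output\t\twith \n gaps"
puts squeeze_demo(messy) # prints OCR output with gaps
```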

#escape(character) ⇒ String

Escapes a single-character string and makes whitespace characters optional

Parameters:

  • character (String)

    made of a single character

Returns:

  • (String)

    escaped character, with whitespace characters made optional

Raises:

  • (ArgumentError)

    if the argument isn’t a single character string



# File 'lib/oorb.rb', line 104

def escape(character)
  unless character.length == 1
    raise ArgumentError, "Argument must be a single character string"
  end
  # "\s" in a double-quoted string is a space; "\\s?" is the regex token \s?.
  character == "\s" ? "\\s?" : Regexp.escape(character)
end
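
The three possible outcomes can be sketched as follows. `demo_escape` is a hypothetical stand-in that omits the length check; `Regexp.escape` is the Ruby core method used by the original.

```ruby
# Stand-in for escape: a space becomes an optional whitespace token,
# anything else goes through Regexp.escape.
def demo_escape(character)
  character == ' ' ? '\s?' : Regexp.escape(character)
end

puts demo_escape(' ') # prints \s?
puts demo_escape('.') # prints \.
puts demo_escape('k') # prints k
```

Making the space optional presumably lets the pattern match text where OCR dropped a space between words.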

#run ⇒ Object

Runs the application from the command line



# File 'lib/oorb.rb', line 40

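def run
  # Loop instead of recursing so long sessions cannot exhaust the stack.
  loop do
    puts "Waiting for a statement."
    user_input = gets
    break if user_input.nil? # end of input (e.g. Ctrl-D)
    puts build_regex(combine_whitespace(user_input.chomp))
  end
end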