Class: OORB
- Inherits: Object
  - Object
    - OORB
- Defined in: lib/oorb.rb
Overview
OCR Optimized Regex Builder
Constant Summary
- LETTERS =
  Letters that OCR regularly misreads, mapped to their common replacements.
  {
    'a' => %w(9), 'b' => %w(h), 'c' => %w(e f d o 6), 'd' => %w(3 0 o 7),
    'e' => %w(6 c d f 4 3), 'f' => %w(c s p), 'g' => %w(9 8), 'h' => %w(b),
    'i' => %w(l 1), 'j' => %w(y), 'l' => %w(1 i t 7), 'n' => %w(r),
    'o' => %w(c 6 0 3 d), 'p' => %w(fr), 'r' => %w(np),
    's' => %w(f l j i 3 8 5), 't' => %w(i l 4 7), 'u' => %w(v),
    'v' => %w(yu), 'y' => %w(v j 7), 'z' => %w(2)
  }
- SECTIONS =
  Letters that OCR commonly splits into two characters, mapped to a regex fragment that matches both the intact and the split form.
  {'m' => '[mnr][nr]?', 'w' => '[wvu][vu]?'}
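To illustrate what a SECTIONS entry is for, the sketch below compiles the 'm' fragment and checks it against a few OCR-style misreadings. The anchors, the constant names, and the sample strings are assumptions made for this demonstration; they are not part of lib/oorb.rb.

```ruby
# Fragment copied from SECTIONS['m'] above.
M_FRAGMENT = '[mnr][nr]?'

# Anchor the fragment so the whole sample must match (the anchoring is an
# assumption made for this demo only).
M_PATTERN = Regexp.new("\\A#{M_FRAGMENT}\\z")

# 'm' itself, two-character splits such as "rn" and "nn", and a non-match.
%w(m rn nn in).each do |sample|
  puts "#{sample.inspect}: #{M_PATTERN.match?(sample)}"
end
```

The first three samples match and "in" does not, because 'i' is not in the first character class.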
Instance Method Summary
- #build_collection(character) ⇒ String
  Builds a group match from an input letter.
- #build_regex(input) ⇒ String
  Builds an OCR optimized regular expression from a string.
- #build_section(character) ⇒ String
  Builds a section from an input letter.
- #combine_whitespace(string) ⇒ String
  Collapses multiple consecutive whitespace characters into a single space.
- #escape(character) ⇒ String
  Escapes a single-character string and makes whitespace characters optional.
- #run ⇒ Object
  Runs the application from the command line.
Instance Method Details
#build_collection(character) ⇒ String
Builds a group match from an input letter.
# File 'lib/oorb.rb', line 77
def build_collection(character)
  unless LETTERS[character]
    raise ArgumentError, "Valid arguments are a single character from #{LETTERS.keys.join(", ")}."
  end
  LETTERS[character].each { |x| character << x }
  "[#{character}]"
end
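Note that the listing appends the replacements to its argument with `<<`, so the caller's string is modified in place and a frozen string would raise. A minimal non-mutating sketch follows; the method name `collection_for` and the trimmed table `LETTERS_EXCERPT` are hypothetical and exist only for this example.

```ruby
# Excerpt of LETTERS, copied from the constant summary.
LETTERS_EXCERPT = {
  'a' => %w(9), 'i' => %w(l 1), 's' => %w(f l j i 3 8 5)
}.freeze

# Hypothetical non-mutating counterpart to build_collection: it builds a
# new string instead of appending to the argument.
def collection_for(character)
  replacements = LETTERS_EXCERPT.fetch(character) do
    raise ArgumentError, "No replacements known for #{character.inspect}"
  end
  "[#{character}#{replacements.join}]"
end

letter = 'i'.freeze           # a frozen argument is fine here
puts collection_for(letter)   # [il1]
puts collection_for('s')      # [sflji385]
```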
#build_regex(input) ⇒ String
Builds an OCR optimized regular expression from a string.
# File 'lib/oorb.rb', line 52
def build_regex(input)
  input.downcase.chars.map do |char|
    if LETTERS.has_key?(char)
      build_collection(char)
    elsif SECTIONS.has_key?(char)
      build_section(char)
    else
      escape(char)
    end
  end.join
end
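The per-character dispatch can be tried end to end with a small standalone sketch. The method name `ocr_pattern`, the trimmed tables, and the sample address are assumptions for illustration; only the lookup order (LETTERS, then SECTIONS, then escaping) is taken from the listing.

```ruby
# Trimmed copies of the tables from the constant summary; only the letters
# needed for the sample input are included.
SAMPLE_LETTERS = {
  'a' => %w(9), 'i' => %w(l 1), 'n' => %w(r),
  's' => %w(f l j i 3 8 5), 't' => %w(i l 4 7)
}.freeze
SAMPLE_SECTIONS = { 'm' => '[mnr][nr]?' }.freeze

# Hypothetical standalone version of the build_regex dispatch.
def ocr_pattern(input)
  input.downcase.chars.map do |char|
    if SAMPLE_LETTERS.key?(char)
      "[#{char}#{SAMPLE_LETTERS[char].join}]"     # character class
    elsif SAMPLE_SECTIONS.key?(char)
      SAMPLE_SECTIONS[char]                        # split-letter fragment
    else
      char == ' ' ? '\\s?' : Regexp.escape(char)   # optional space or literal
    end
  end.join
end

source = ocr_pattern('Main St')
puts source                      # [mnr][nr]?[a9][il1][nr]\s?[sflji385][til47]

regex = Regexp.new(source)
puts regex.match?('rna1n 5t')    # true: "m" read as "rn", "i" as "1", "s" as "5"
puts regex.match?('rna1n5t')     # true: the space is optional
```

Because the input is downcased and the resulting pattern is case-sensitive, uppercase OCR output would only match if the pattern were compiled with `Regexp::IGNORECASE`; whether that is desirable depends on the use case.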
#build_section(character) ⇒ String
Builds a section from an input letter.
# File 'lib/oorb.rb', line 91
def build_section(character)
  unless SECTIONS[character]
    raise ArgumentError, "Valid arguments are a single character from #{SECTIONS.keys.join(", ")}."
  end
  SECTIONS[character]
end
#combine_whitespace(string) ⇒ String
Collapses multiple consecutive whitespace characters into a single space.
# File 'lib/oorb.rb', line 68
def combine_whitespace(string)
  string.gsub(/\s+/, "\s")
end
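One detail worth spelling out: in a double-quoted Ruby string, `"\s"` is a plain space, so tabs and newlines in the input are replaced by a space rather than preserved. The sketch below uses the same substitution in a free-standing method; the name `squeeze_whitespace` is hypothetical.

```ruby
# Same substitution as combine_whitespace, written as a free-standing
# method (the name squeeze_whitespace is hypothetical).
def squeeze_whitespace(string)
  string.gsub(/\s+/, "\s")
end

puts "\s" == " "                                   # true
puts squeeze_whitespace("12  Main\t\nSt").inspect  # "12 Main St"
```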
#escape(character) ⇒ String
Escapes a single-character string and makes whitespace characters optional.
# File 'lib/oorb.rb', line 104
def escape(character)
  if character.length > 1
    raise ArgumentError, "Argument must be a single character string"
  end
  character == "\s" ? "\\s?" : Regexp.escape(character)
end
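As a quick illustration of the three cases (space, regex metacharacter, ordinary character), here is a free-standing copy of the same logic. The name `escape_char` is hypothetical.

```ruby
# Hypothetical free-standing copy of the escape logic, for experimentation.
def escape_char(character)
  if character.length > 1
    raise ArgumentError, "Argument must be a single character string"
  end
  character == " " ? "\\s?" : Regexp.escape(character)
end

puts escape_char(" ")   # \s?  (an optional whitespace character)
puts escape_char(".")   # \.   (metacharacter escaped)
puts escape_char("x")   # x    (ordinary character unchanged)
```

Only the space character is special-cased; other whitespace such as a tab goes through `Regexp.escape`. In build_regex this rarely matters when the input has already passed through combine_whitespace.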
#run ⇒ Object
Runs the application from the command line.
# File 'lib/oorb.rb', line 40
def run
  puts "Waiting for a statement."
  user_input = gets.chomp
  combined = combine_whitespace(user_input)
  puts build_regex(combined)
  run
end
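The listing re-invokes `run` recursively, which adds a stack frame per statement, and `gets.chomp` raises NoMethodError once input ends because `gets` returns nil. A possible iterative variant is sketched below. The method name `run_loop`, its keyword arguments, and the use of a block in place of `build_regex(combine_whitespace(line))` are all assumptions chosen so the example is self-contained.

```ruby
require 'stringio'

# Hypothetical iterative variant of run. The block stands in for
# build_regex(combine_whitespace(line)).
def run_loop(input: $stdin, output: $stdout)
  loop do
    output.puts "Waiting for a statement."
    line = input.gets
    break if line.nil?             # end of input: stop instead of raising
    output.puts yield(line.chomp)
  end
end

# Demo with in-memory IO so the example terminates on its own.
out = StringIO.new
run_loop(input: StringIO.new("abc\n"), output: out) { |text| text.upcase }
puts out.string
```

Passing the input and output streams as arguments also makes the loop testable without a terminal.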