Class: Treat::Workers::Formatters::Readers::Image
- Inherits: Object
  - Object
  - Treat::Workers::Formatters::Readers::Image
- Defined in: lib/treat/workers/formatters/readers/image.rb
Overview
This class is a wrapper for the Google Ocropus optical character recognition (OCR) engine.
“OCRopus(tm) is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multilingual capabilities.”
Original paper: Google Ocropus Engine: Breuel, Thomas M. The Ocropus Open Source OCR System. DFKI and U. Kaiserslautern, Germany.
Class Method Summary
- .create_temp_dir(&block) ⇒ Object
  Create a dir that gets deleted after execution of the block.
- .read(document, options = {}) ⇒ Object
  Read a file using the Google Ocropus reader.
Class Method Details
.create_temp_dir(&block) ⇒ Object
Create a dir that gets deleted after execution of the block.
# File 'lib/treat/workers/formatters/readers/image.rb', line 42

def self.create_temp_dir(&block)
  if not FileTest.directory?(Treat.paths.tmp)
    FileUtils.mkdir(Treat.paths.tmp)
  end
  dname = Treat.paths.tmp + "#{Random.rand(10000000).to_s}"
  Dir.mkdir(dname)
  block.call(dname)
ensure
  FileUtils.rm_rf(dname)
end
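The block-scoped cleanup pattern used by create_temp_dir can be illustrated with Ruby's standard library alone. The sketch below is not Treat's implementation: with_temp_dir and the 'treat-ocr' prefix are hypothetical names chosen for illustration, and Dir.mktmpdir is used in place of the random-number naming shown above.

```ruby
require 'tmpdir'

# Hypothetical stand-in for create_temp_dir (not part of Treat).
# Dir.mktmpdir creates a uniquely named directory, yields its path,
# and removes the directory when the block returns or raises.
def with_temp_dir(prefix = 'treat-ocr')
  Dir.mktmpdir(prefix) do |dir|
    yield dir
  end
end

# Usage: the directory exists only for the duration of the block.
with_temp_dir do |dir|
  File.write(File.join(dir, 'page.txt'), 'scratch data')
end
```

One possible reason to prefer Dir.mktmpdir in new code is that it picks a unique name itself, so two concurrent calls are not expected to collide on the same directory.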
.read(document, options = {}) ⇒ Object
Read a file using the Google Ocropus reader.
Options:
- (Boolean) :silent => whether to silence Ocropus.
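Silencing of this kind is typically done by swapping out the standard output stream for the duration of a block. The following is a minimal sketch under that assumption; quietly is a hypothetical helper and is not Treat's silence_stdout, whose implementation is not shown on this page.

```ruby
# Hypothetical helper (not Treat's silence_stdout).
# Temporarily points $stdout at the null device while the block runs.
# This only affects Ruby-level writes such as puts and print; a child
# process that writes directly to its own file descriptors is unaffected.
def quietly
  original = $stdout
  File.open(File::NULL, 'w') do |null|
    $stdout = null
    yield
  end
ensure
  # Restore the original stream even if the block raised.
  $stdout = original
end

# Usage: the message is discarded, the block's value is still returned.
value = quietly do
  puts 'this line is not shown'
  :done
end
```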
# File 'lib/treat/workers/formatters/readers/image.rb', line 20

def self.read(document, options = {})
  read = lambda do |doc|
    self.create_temp_dir do |tmp|
      `ocropus-nlbin -o #{tmp}/out #{doc.file}`
      `ocropus-gpageseg #{tmp}/out/????.bin.png --minscale 2`
      `ocropus-rpred #{tmp}/out/????/??????.bin.png`
      `ocropus-hocr #{tmp}/out/????.bin.png -o #{tmp}/book.html`
      doc.set :file, "#{tmp}/book.html"
      doc.set :format, :html
      doc = doc.read(:html)
    end
  end
  Treat.core.verbosity.silence ? silence_stdout { read.call(document) } : read.call(document)
  document
end
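The four shell commands in the source above form a fixed sequence: binarization (ocropus-nlbin), page segmentation (ocropus-gpageseg), recognition (ocropus-rpred), and hOCR output (ocropus-hocr). The sketch below builds the same command strings without running them, so the sequence can be inspected on a machine without Ocropus installed. ocropus_commands is a hypothetical helper, not part of Treat, and escaping the input path with Shellwords.escape is an addition that the original source does not perform.

```ruby
require 'shellwords'

# Hypothetical helper (not part of Treat): returns the four Ocropus
# command lines in the order the reader runs them, without executing them.
def ocropus_commands(tmp, input_file)
  out = "#{tmp}/out"
  [
    # Assumption: the input path is shell-escaped here; the original
    # source interpolates doc.file directly.
    "ocropus-nlbin -o #{out} #{Shellwords.escape(input_file)}",
    # The ???? patterns are shell globs and are left unescaped on purpose.
    "ocropus-gpageseg #{out}/????.bin.png --minscale 2",
    "ocropus-rpred #{out}/????/??????.bin.png",
    "ocropus-hocr #{out}/????.bin.png -o #{tmp}/book.html"
  ]
end

# Usage: print the commands for a sample scratch directory and input file.
ocropus_commands('/tmp/treat123', 'scan page.png').each { |cmd| puts cmd }
```

Keeping command construction separate from execution makes it easier to log or test the pipeline before invoking any external tool.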