Class: Tahweel::Converter
- Inherits: Object (hierarchy: Object > Tahweel::Converter)
- Defined in: lib/tahweel/converter.rb
Overview
Orchestrates the full conversion process:
- Splits a PDF into images.
- Performs OCR on each image concurrently.
- Returns the aggregated text.
- Cleans up the temporary image folder, even if OCR raises an error.
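The four steps can be sketched end to end with stand-ins. Everything below is hypothetical: `fake_split` and `fake_ocr` replace the real PdfSplitter and Ocr classes (whose internals this page does not document), and pages are OCR'd in simple batches of `concurrency` threads. The sketch shows why concurrent OCR can still yield text in page order: each result is written to its page's slot rather than appended as it finishes.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for PdfSplitter.split: writes one file per "page"
# into a fresh temp folder and returns the same Hash keys #convert reads.
def fake_split(page_texts)
  dir = Dir.mktmpdir("tahweel-sketch")
  paths = page_texts.each_with_index.map do |text, i|
    path = File.join(dir, format("page-%03d.txt", i + 1))
    File.write(path, text)
    path
  end
  { image_paths: paths, folder_path: dir }
end

# Hypothetical stand-in for OCR: random latency, then "recognized" text.
def fake_ocr(path)
  sleep(rand * 0.01) # lets threads finish out of order
  File.read(path).upcase
end

# 1. split, 2. OCR concurrently, 3. aggregate in page order, 4. clean up.
def sketch_convert(page_texts, concurrency: 2)
  image_paths, temp_dir = fake_split(page_texts).values_at(:image_paths, :folder_path)
  begin
    results = Array.new(image_paths.size)
    image_paths.each_with_index.each_slice(concurrency) do |batch|
      threads = batch.map { |path, i| Thread.new { results[i] = fake_ocr(path) } }
      threads.each(&:join) # at most `concurrency` OCR calls in flight
    end
    results # slot i holds page i + 1, whatever the finish order
  ensure
    FileUtils.rm_rf(temp_dir)
  end
end

p sketch_convert(%w[alpha beta gamma delta], concurrency: 2)
# prints ["ALPHA", "BETA", "GAMMA", "DELTA"]
```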
Constant Summary
- DEFAULT_CONCURRENCY = 12
  Maximum number of concurrent OCR operations, kept modest to avoid hitting API rate limits.
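A cap like DEFAULT_CONCURRENCY is commonly enforced with a fixed pool of worker threads draining a shared queue. The sketch below is one assumed way such a limit can work, not Tahweel's implementation (the code that consumes `@concurrency` is not shown on this page); `bounded_map` is a hypothetical helper name.

```ruby
# Mirrors the documented constant; used here only as the default cap.
DEFAULT_CONCURRENCY = 12

# Hypothetical helper (not part of Tahweel's documented API): calls the block
# on every item using at most `limit` worker threads and returns the results
# in input order, whatever order the workers finish in.
def bounded_map(items, limit: DEFAULT_CONCURRENCY, &block)
  raise ArgumentError, "limit must be positive" unless limit.positive?

  queue = Queue.new
  items.each_with_index { |item, i| queue << [item, i] }
  queue.close # once drained, pop returns nil and each worker exits

  results = Array.new(items.size)
  workers = Array.new([limit, items.size].min) do
    Thread.new do
      while (job = queue.pop)
        item, i = job
        results[i] = block.call(item)
      end
    end
  end
  workers.each(&:join)
  results
end

bounded_map(%w[a b c], limit: 2) { |s| s.upcase } # => ["A", "B", "C"]
```

Unlike launching threads in fixed batches, a pool keeps all `limit` slots busy: one slow page holds up only its own worker, not the rest of its batch.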
Class Method Summary
- .convert(pdf_path, dpi: PdfSplitter::DEFAULT_DPI, processor: :google_drive, concurrency: DEFAULT_CONCURRENCY) {|Hash| ... } ⇒ Array<String>
  Convenience method to convert a PDF file to text.
Instance Method Summary
- #convert {|Hash| ... } ⇒ Array<String>
  Executes the conversion process.
- #initialize(pdf_path, dpi: PdfSplitter::DEFAULT_DPI, processor: :google_drive, concurrency: DEFAULT_CONCURRENCY) ⇒ Converter (constructor)
  Initializes the Converter.
Constructor Details
#initialize(pdf_path, dpi: PdfSplitter::DEFAULT_DPI, processor: :google_drive, concurrency: DEFAULT_CONCURRENCY) ⇒ Converter
Initializes the Converter.
    # File 'lib/tahweel/converter.rb', line 45
    def initialize(pdf_path, dpi: PdfSplitter::DEFAULT_DPI, processor: :google_drive, concurrency: DEFAULT_CONCURRENCY)
      @pdf_path = pdf_path
      @dpi = dpi
      @processor_type = processor
      @concurrency = concurrency
    end
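The constructor only records its four settings; splitting and OCR are deferred to #convert. A self-contained mirror of the signature shows how the keyword defaults resolve. `SketchConverter`, `PdfSplitterStub`, the `attr_reader`s, and the DPI value 150 are all hypothetical; this page does not state the real `PdfSplitter::DEFAULT_DPI`.

```ruby
# Hypothetical stand-in for PdfSplitter; 150 is an arbitrary placeholder,
# not the real DEFAULT_DPI.
module PdfSplitterStub
  DEFAULT_DPI = 150
end

class SketchConverter
  DEFAULT_CONCURRENCY = 12

  # Readers added for illustration only; the real class may not expose these.
  attr_reader :pdf_path, :dpi, :processor_type, :concurrency

  # Same shape as the documented signature: only the path is required and
  # omitted keywords fall back to their defaults. No file is touched here.
  def initialize(pdf_path, dpi: PdfSplitterStub::DEFAULT_DPI, processor: :google_drive,
                 concurrency: DEFAULT_CONCURRENCY)
    @pdf_path = pdf_path
    @dpi = dpi
    @processor_type = processor
    @concurrency = concurrency
  end
end

defaults = SketchConverter.new("missing.pdf") # no I/O yet, so a missing file is fine
tuned = SketchConverter.new("book.pdf", dpi: 300, concurrency: 4)
```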
Class Method Details
.convert(pdf_path, dpi: PdfSplitter::DEFAULT_DPI, processor: :google_drive, concurrency: DEFAULT_CONCURRENCY) {|Hash| ... } ⇒ Array<String>
Convenience method to convert a PDF file to text.
    # File 'lib/tahweel/converter.rb', line 31
    def self.convert(
      pdf_path,
      dpi: PdfSplitter::DEFAULT_DPI,
      processor: :google_drive,
      concurrency: DEFAULT_CONCURRENCY,
      &
    ) = new(pdf_path, dpi:, processor:, concurrency:).convert(&)
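`.convert` is an endless method (Ruby 3.0+) that uses anonymous block forwarding (`&`, Ruby 3.1+) and keyword shorthand (`dpi:` meaning `dpi: dpi`, Ruby 3.1+) to build an instance and hand both the settings and the caller's block to `#convert`. The miniature below reproduces that pattern with a hypothetical `ForwardingSketch` class; the `{ stage:, page: }` Hash it yields is invented for illustration, since this page does not document which keys the real block receives.

```ruby
# Hypothetical miniature of the convenience-method pattern (Ruby 3.1+).
class ForwardingSketch
  # Build an instance, then forward the keywords and the block unchanged.
  def self.convert(path, dpi: 150, processor: :google_drive, &) = new(path, dpi:, processor:).convert(&)

  def initialize(path, dpi:, processor:)
    @path = path
    @dpi = dpi
    @processor = processor
  end

  # Stand-in for the real work: yields one invented progress Hash per page
  # (only if a block was given) and returns one string per "page".
  def convert
    pages = ["page 1 at #{@dpi} dpi", "page 2 at #{@dpi} dpi"]
    pages.each_index { |i| yield({ stage: :ocr, page: i + 1 }) if block_given? }
    pages
  end
end

events = []
texts = ForwardingSketch.convert("book.pdf", dpi: 300) { |event| events << event }
# texts  => ["page 1 at 300 dpi", "page 2 at 300 dpi"]
# events => [{ stage: :ocr, page: 1 }, { stage: :ocr, page: 2 }]
```

Because the block is optional at every level, calling `.convert` without one still works and simply skips the progress callbacks.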
Instance Method Details
#convert {|Hash| ... } ⇒ Array<String>
Executes the conversion process.
    # File 'lib/tahweel/converter.rb', line 62
    def convert(&)
      # Split the PDF into page images; the caller's block, if any, is forwarded to both steps.
      image_paths, temp_dir = PdfSplitter.split(@pdf_path, dpi: @dpi, &).values_at(:image_paths, :folder_path)

      begin
        process_images(image_paths, Ocr.new(processor: @processor_type), &)
      ensure
        # Remove the temporary image folder whether or not OCR succeeded.
        FileUtils.rm_rf(temp_dir)
      end
    end
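`#convert` relies on two idioms: `Hash#values_at` to destructure the splitter's result in a fixed order, and `begin`/`ensure` so the temporary folder is removed whether OCR succeeds or raises. The hypothetical `convert_with_cleanup` below isolates the failure path with a fake split result and a caller-supplied OCR lambda; after cleanup, the error still propagates to the caller.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical stand-in mirroring the shape of #convert.
def convert_with_cleanup(ocr)
  # Fake split result: a real temp folder plus made-up image paths.
  split_result = { folder_path: Dir.mktmpdir("convert-sketch"), image_paths: %w[p1.png p2.png] }
  # values_at returns values in the order the keys are listed, not hash order.
  image_paths, temp_dir = split_result.values_at(:image_paths, :folder_path)
  begin
    image_paths.map { |path| ocr.call(path) }
  ensure
    FileUtils.rm_rf(temp_dir) # runs on success and when ocr raises
  end
end

convert_with_cleanup(->(path) { path.upcase }) # => ["P1.PNG", "P2.PNG"]

begin
  convert_with_cleanup(->(_path) { raise IOError, "OCR failed" })
rescue IOError => e
  puts "caller still sees: #{e.message}" # the folder is already gone
end
```

Because `PdfSplitter.split` runs before `begin`, a failure inside the splitter itself is outside this `ensure`; any cleanup in that case would have to happen in the splitter (this page does not show whether it does).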