Module: Tahweel
Defined in:
lib/tahweel.rb,
lib/tahweel/ocr.rb,
lib/tahweel/writer.rb,
lib/tahweel/version.rb,
lib/tahweel/converter.rb,
lib/tahweel/authorizer.rb,
lib/tahweel/cli/options.rb,
lib/tahweel/writers/txt.rb,
lib/tahweel/pdf_splitter.rb,
lib/tahweel/writers/docx.rb,
lib/tahweel/writers/json.rb,
lib/tahweel/cli/file_collector.rb,
lib/tahweel/cli/file_processor.rb,
lib/tahweel/cli/progress_renderer.rb,
lib/tahweel/processors/google_drive.rb
Overview
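Top-level namespace for the Tahweel library. It exposes two convenience methods: .convert, which turns a PDF into text by splitting it into page images and running OCR on each page, and .extract, which runs OCR on a single image file.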
Defined Under Namespace
Modules: CLI, Processors, Writers
Classes: Authorizer, Converter, Error, Ocr, PdfSplitter, Writer
Constant Summary
- VERSION = "0.1.1"
Class Method Summary
- .convert(pdf_path, dpi: PdfSplitter::DEFAULT_DPI, processor: :google_drive, concurrency: Converter::DEFAULT_CONCURRENCY) ⇒ Array<String>
  Converts a PDF file to text by splitting it into images and running OCR on each page.
- .extract(image_path, processor: :google_drive) ⇒ String
  Extracts text from an image file using the specified OCR processor.
Class Method Details
.convert(pdf_path, dpi: PdfSplitter::DEFAULT_DPI, processor: :google_drive, concurrency: Converter::DEFAULT_CONCURRENCY) ⇒ Array<String>
Converts a PDF file to text by splitting it into images and running OCR on each page.
    # File 'lib/tahweel.rb', line 23
    def self.convert(
      pdf_path,
      dpi: PdfSplitter::DEFAULT_DPI,
      processor: :google_drive,
      concurrency: Converter::DEFAULT_CONCURRENCY,
      &
    ) = Converter.convert(pdf_path, dpi:, processor:, concurrency:, &)
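The definition above is a thin facade: it forwards every keyword argument and the caller's block straight to Converter.convert, using Ruby's shorthand keyword syntax (`dpi:`) and anonymous block parameter (`&`). The sketch below spells the same delegation out in longhand so the data flow is easier to follow. It is not Tahweel's code: `FacadeSketch`, `FakeSplitter`, `FakeConverter`, both default values, and the page number passed to the block are all hypothetical stand-ins, since this page does not show what the real converter yields.

```ruby
# Illustrative stand-in only; every name and value here is hypothetical.
module FacadeSketch
  module FakeSplitter
    DEFAULT_DPI = 150 # assumed value, not taken from Tahweel
  end

  module FakeConverter
    DEFAULT_CONCURRENCY = 4 # assumed value, not taken from Tahweel

    # Pretends to OCR a three-page PDF and reports each finished page
    # to the optional block.
    def self.convert(pdf_path, dpi:, processor:, concurrency:, &block)
      (1..3).map do |page|
        block&.call(page)
        "#{File.basename(pdf_path)} page #{page} " \
          "(#{processor}, #{dpi} dpi, #{concurrency} workers)"
      end
    end
  end

  # Longhand equivalent of the facade: keyword arguments pass through
  # unchanged and the caller's block is handed on to the worker.
  def self.convert(pdf_path,
                   dpi: FakeSplitter::DEFAULT_DPI,
                   processor: :google_drive,
                   concurrency: FakeConverter::DEFAULT_CONCURRENCY,
                   &block)
    FakeConverter.convert(pdf_path, dpi: dpi, processor: processor,
                                    concurrency: concurrency, &block)
  end
end

seen = []
pages = FacadeSketch.convert("docs/report.pdf", dpi: 300) { |page| seen << page }
```

Overriding only `dpi:` leaves the other keywords at their defaults, and the block given to the facade is the one the worker ends up calling.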
.extract(image_path, processor: :google_drive) ⇒ String
Extracts text from an image file using the specified OCR processor.
    # File 'lib/tahweel.rb', line 36
    def self.extract(image_path, processor: :google_drive) = Ocr.extract(image_path, processor:)
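The `processor:` keyword takes a symbol that names the OCR backend, and this page does not show how Ocr.extract resolves it. One common way to implement that kind of lookup is a registry hash, sketched below under that assumption. `OcrSketch`, `EchoProcessor`, `PROCESSORS`, and `UnknownProcessorError` are hypothetical names, and the stand-in processor returns a canned string instead of contacting any service.

```ruby
# Hypothetical sketch of symbol-based processor dispatch; none of these
# names come from Tahweel.
module OcrSketch
  class UnknownProcessorError < StandardError; end

  # Stand-in processor that "recognises" text without calling any service.
  class EchoProcessor
    def extract(image_path)
      "text from #{File.basename(image_path)}"
    end
  end

  # Maps the symbol passed as `processor:` to a class responding to #extract.
  PROCESSORS = { echo: EchoProcessor }.freeze

  def self.extract(image_path, processor: :echo)
    klass = PROCESSORS.fetch(processor) do
      raise UnknownProcessorError, "unknown processor: #{processor.inspect}"
    end
    klass.new.extract(image_path)
  end
end

text = OcrSketch.extract("scans/page-1.png")
```

With a registry like this, supporting another backend means adding one entry to the hash, and an unrecognised symbol fails early with a descriptive error.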