Module: Tahweel

Defined in:
lib/tahweel.rb,
lib/tahweel/ocr.rb,
lib/tahweel/writer.rb,
lib/tahweel/version.rb,
lib/tahweel/converter.rb,
lib/tahweel/authorizer.rb,
lib/tahweel/cli/options.rb,
lib/tahweel/writers/txt.rb,
lib/tahweel/pdf_splitter.rb,
lib/tahweel/writers/docx.rb,
lib/tahweel/writers/json.rb,
lib/tahweel/cli/file_collector.rb,
lib/tahweel/cli/file_processor.rb,
lib/tahweel/cli/progress_renderer.rb,
lib/tahweel/processors/google_drive.rb

Defined Under Namespace

Modules: CLI, Processors, Writers

Classes: Authorizer, Converter, Error, Ocr, PdfSplitter, Writer

Constant Summary

VERSION = "0.1.1"

Class Method Summary

Class Method Details

.convert(pdf_path, dpi: PdfSplitter::DEFAULT_DPI, processor: :google_drive, concurrency: Converter::DEFAULT_CONCURRENCY) ⇒ Array<String>

Converts a PDF file to text by splitting it into images and running OCR on each page.



# File 'lib/tahweel.rb', line 23

def self.convert(
  pdf_path,
  dpi: PdfSplitter::DEFAULT_DPI,
  processor: :google_drive,
  concurrency: Converter::DEFAULT_CONCURRENCY,
  &
) = Converter.convert(pdf_path, dpi:, processor:, concurrency:, &)
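
Since .convert is documented to return an Array<String>, a caller will often want to merge the per-page text into one document. The sketch below shows one way to do that. The `join_pages` helper is hypothetical and not part of Tahweel. The `dpi: 300` and `concurrency: 4` values in the commented call are example values only, and the sketch assumes that the array holds one entry per page, in page order.

```ruby
# Hypothetical helper, not part of Tahweel: merges per-page OCR text
# (the Array<String> that Tahweel.convert is documented to return)
# into a single string, with a visible marker before each page.
# Assumes the array is ordered by page.
def join_pages(pages, separator: "\n\n")
  pages
    .each_with_index
    .map { |text, index| "--- Page #{index + 1} ---\n#{text.strip}" }
    .join(separator)
end

# With the gem installed and Google Drive access set up (an assumption
# about the environment), the real call would look something like:
#   pages = Tahweel.convert("scan.pdf", dpi: 300, concurrency: 4)
# Sample data stands in for real OCR output here.
sample_pages = ["First page text\n", "  Second page text"]
puts join_pages(sample_pages)
```

Keeping the merge step outside the library call means it can be exercised with plain strings, without running OCR.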

.extract(image_path, processor: :google_drive) ⇒ String

Extracts text from an image file using the specified OCR processor.



# File 'lib/tahweel.rb', line 36

def self.extract(image_path, processor: :google_drive) = Ocr.extract(image_path, processor:)
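
For OCR on several standalone images, .extract can be called once per file. The following is a minimal sketch, not a definitive pattern. `extract_all` is a hypothetical helper, and the extractor is passed in as a callable so the loop can be tried without the gem or network access. The default callable assumes Tahweel is already loaded and only touches it when actually invoked.

```ruby
# Hypothetical helper, not part of Tahweel: runs an extractor over a list
# of image paths and returns a Hash of path => extracted text.
# The default extractor delegates to Tahweel.extract as documented above;
# it is only evaluated when called, so a substitute can be injected.
def extract_all(image_paths,
                extractor: ->(path) { Tahweel.extract(path, processor: :google_drive) })
  image_paths.to_h { |path| [path, extractor.call(path)] }
end

# Stand-in extractor so the sketch runs without Tahweel installed.
fake_extractor = ->(path) { "text of #{File.basename(path)}" }

results = extract_all(["scans/one.png", "scans/two.png"], extractor: fake_extractor)
results.each { |path, text| puts "#{path}: #{text}" }
```

With the gem available, omitting the `extractor:` argument would route each path through Tahweel.extract with the default :google_drive processor.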