Class: Tahweel::CLI::FileProcessor

Inherits:
Object
Defined in:
lib/tahweel/cli/file_processor.rb

Overview

Processes a single file by orchestrating conversion/extraction and writing the output.

This class acts as the bridge between the CLI inputs and the core library logic. It determines the file type (PDF or Image), calls the appropriate processing method, and directs the results to the Writer.

Constructor Details

#initialize(file_path, options) ⇒ FileProcessor

Initializes a new FileProcessor.

Parameters:

  • file_path (String)

    The path to the input file.

  • options (Hash)

    Configuration options (see .process).



# File 'lib/tahweel/cli/file_processor.rb', line 38

def initialize(file_path, options)
  @file_path = file_path
  @options = options
end

Class Method Details

.process(file_path, options) {|Hash| ... } ⇒ void

This method returns an undefined value.

Processes the given file according to the provided options.

Parameters:

  • file_path (String)

    The path to the input file.

  • options (Hash)

    Configuration options.

  • &block (Proc)

    A block that receives progress info as processing advances.

Options Hash (options):

  • :output (String)

    The directory to save output files (defaults to current directory).

  • :dpi (Integer)

    DPI for PDF conversion (defaults to 150).

  • :processor (Symbol)

    The OCR processor to use (e.g., :google_drive).

  • :page_concurrency (Integer)

    Maximum number of concurrent page operations.

  • :formats (Array<Symbol>)

    Output formats (e.g., [:txt, :docx]).

  • :page_separator (String)

    Separator string for TXT output.

  • :base_input_path (String)

    The base path used to determine relative output structure.
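
As a rough illustration, an options hash covering the keys above might be assembled as follows. The `DEFAULTS` constant and the `build_options` helper are hypothetical names introduced here and are not part of Tahweel; only the key names and the two documented defaults (current directory, 150 DPI) come from the list above.

```ruby
# Hypothetical helper: not part of Tahweel. Only the option keys and the two
# documented defaults come from the reference above.
DEFAULTS = {
  output: Dir.pwd, # documented default: current directory
  dpi: 150         # documented default for PDF conversion
}.freeze

# Merges caller overrides on top of the assumed defaults.
def build_options(overrides = {})
  DEFAULTS.merge(overrides)
end

options = build_options(
  processor: :google_drive,
  formats: %i[txt docx],
  page_separator: "\n\n", # illustrative value
  page_concurrency: 4     # illustrative value
)
```

How the library itself applies defaults is not shown in this reference, so the merge step is an assumption made for the example.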

Yields:

  • (Hash)

    Progress info: { stage: :splitting or :ocr, current_page: Integer, percentage: Float, remaining_pages: Integer }
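
A block passed to this method could turn each progress hash into a status line. The sketch below assumes the documented keys are all present and that `:percentage` is on a 0 to 100 scale, which this reference does not state; `format_progress` is a hypothetical helper name.

```ruby
# Formats one progress hash into a single status line.
# `format_progress` is a hypothetical helper; the hash keys follow the
# documented shape of the yielded value. The 0-100 percentage scale is assumed.
def format_progress(info)
  format(
    "[%<stage>s] page %<page>d, %<pct>.1f%% done, %<left>d remaining",
    stage: info[:stage],
    page: info[:current_page],
    pct: info[:percentage],
    left: info[:remaining_pages]
  )
end

sample = { stage: :ocr, current_page: 3, percentage: 30.0, remaining_pages: 7 }
line = format_progress(sample)
```

With the library loaded, a block such as `{ |info| warn format_progress(info) }` could then be handed to `.process`.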



# File 'lib/tahweel/cli/file_processor.rb', line 32

def self.process(file_path, options, &) = new(file_path, options).process(&)
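
The one-liner above is a convenience wrapper: it builds an instance and forwards the caller's block to the instance method. The same shape is sketched below with a hypothetical `Job` class and an explicitly named block parameter, so it also runs on Ruby versions without anonymous block forwarding.

```ruby
# Hypothetical stand-in class showing the same shape as FileProcessor.process:
# the class method builds an instance and forwards the caller's block to it.
class Job
  def self.run(name, &block)
    new(name).run(&block)
  end

  def initialize(name)
    @name = name
  end

  # Yields one illustrative progress hash, then returns the name.
  def run
    yield({ stage: :ocr, current_page: 1 }) if block_given?
    @name
  end
end

seen = []
result = Job.run("scan.pdf") { |info| seen << info[:stage] }
```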

Instance Method Details

#process {|Hash| ... } ⇒ void

This method returns an undefined value.

Executes the processing logic.

  1. Ensures the output directory exists.

  2. Checks if output files already exist to avoid redundant processing.

  3. Detects if the input is a PDF or an image.

  4. Runs the appropriate conversion/extraction pipeline.

  5. Writes the results to the configured formats.
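
The five steps above can be sketched with a self-contained stand-in class. Every method body here is an assumption: the real private helpers are not shown in this reference, the OCR and conversion work is replaced by placeholder strings, and the return symbols exist only to make the control flow visible (the documented method returns no meaningful value).

```ruby
require "fileutils"

# Stand-in for illustration only; not the real Tahweel::CLI::FileProcessor.
class SketchProcessor
  def initialize(file_path, options)
    @file_path = file_path
    @options = options
  end

  def process
    FileUtils.mkdir_p(@options[:output])    # 1. ensure the output directory
    return :skipped if all_outputs_exist?   # 2. skip redundant work
    text = pdf? ? "pdf text" : "image text" # 3-4. pick a pipeline (stubbed)
    write(text)                             # 5. write each configured format
    :processed
  end

  private

  # Assumed detection rule: a case-insensitive ".pdf" extension.
  def pdf?
    File.extname(@file_path).casecmp?(".pdf")
  end

  # One output path per requested format, named after the input file.
  def output_paths
    base = File.basename(@file_path, ".*")
    @options[:formats].map { |fmt| File.join(@options[:output], "#{base}.#{fmt}") }
  end

  def all_outputs_exist?
    output_paths.all? { |path| File.exist?(path) }
  end

  def write(text)
    output_paths.each { |path| File.write(path, text) }
  end
end
```

Running the sketch twice against the same output directory shows the early return in step 2: the second call finds the files written by the first and does no further work.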

Parameters:

  • &block (Proc)

    A block that receives progress info as processing advances.

Yields:

  • (Hash)

    Progress info: { stage: :splitting or :ocr, current_page: Integer, percentage: Float, remaining_pages: Integer }



# File 'lib/tahweel/cli/file_processor.rb', line 59

def process(&)
  ensure_output_directory_exists

  return if all_outputs_exist?

  pdf? ? process_pdf(&) : process_image
end