Class: Tahweel::PdfSplitter

Inherits:
Object
Defined in:
lib/tahweel/pdf_splitter.rb

Overview

Splits a PDF file into individual page images, one PNG per page. Uses the libvips library for high-performance image processing.

Constant Summary

DEFAULT_DPI = 150

Default DPI used when converting PDF pages to images. 150 DPI is a good balance between quality and file size for general documents.

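Class Method Summary

  • .split(pdf_path, dpi: DEFAULT_DPI) {|Hash| ... } ⇒ Hash

    Convenience class method to initialize and execute the split operation in one go.

Instance Method Summary

  • #initialize(pdf_path, dpi: DEFAULT_DPI) ⇒ PdfSplitter (constructor)

    Initializes a new PdfSplitter instance.

  • #split {|Hash| ... } ⇒ Hash

    Executes the PDF splitting process.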

Constructor Details

#initialize(pdf_path, dpi: DEFAULT_DPI) ⇒ PdfSplitter

Initializes a new PdfSplitter instance.



# File 'lib/tahweel/pdf_splitter.rb', line 35

def initialize(pdf_path, dpi: DEFAULT_DPI)
  @pdf_path = pdf_path
  @dpi = dpi
  @image_paths = []
end

Class Method Details

.split(pdf_path, dpi: DEFAULT_DPI) {|Hash| ... } ⇒ Hash

Convenience class method that creates a PdfSplitter and runs #split in a single call.

Yields:

  • (Hash)

    Progress info: { stage: :splitting, current_page: Integer, percentage: Float, remaining_pages: Integer }



# File 'lib/tahweel/pdf_splitter.rb', line 29

def self.split(pdf_path, dpi: DEFAULT_DPI, &) = new(pdf_path, dpi:).split(&)
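
The one-liner above combines an endless method definition, shorthand keyword arguments (`dpi:`), and anonymous block forwarding (`&`). As a rough sketch, it should behave like the long form below. `ToySplitter` is a hypothetical stand-in written for illustration, not part of Tahweel, and its `split` body is a placeholder that does no PDF work.

```ruby
# Hypothetical stand-in class for illustration only; not the Tahweel implementation.
class ToySplitter
  DEFAULT_DPI = 150

  # Long-form equivalent of:
  #   def self.split(pdf_path, dpi: DEFAULT_DPI, &) = new(pdf_path, dpi:).split(&)
  def self.split(pdf_path, dpi: DEFAULT_DPI, &block)
    new(pdf_path, dpi: dpi).split(&block)
  end

  def initialize(pdf_path, dpi: DEFAULT_DPI)
    @pdf_path = pdf_path
    @dpi = dpi
  end

  # Placeholder body: yields one fake progress hash when a block is given,
  # then returns a small summary hash (the real result hash is not documented here).
  def split
    yield({ stage: :splitting, current_page: 1 }) if block_given?
    { pdf_path: @pdf_path, dpi: @dpi }
  end
end

# The block given to the class method is forwarded to the instance method.
events = []
result = ToySplitter.split("example.pdf", dpi: 300) { |progress| events << progress }
```

The class-level entry point saves callers from holding on to an instance they only need for a single call.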

Instance Method Details

#split {|Hash| ... } ⇒ Hash

Executes the PDF splitting process.

This method performs the following steps:

  1. Checks if libvips is installed (skips on Windows).

  2. Validates the existence of the source PDF file.

  3. Creates a unique temporary directory for output.

  4. Iterates through each page of the PDF and converts it to a PNG image.
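
Steps 2 and 3 can be sketched with the Ruby standard library alone. The helper names `ensure_pdf_exists!` and `make_output_dir`, the error message, and the `"tahweel-"` prefix are assumptions made for this example; the actual private methods in `lib/tahweel/pdf_splitter.rb` may differ.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: raises RuntimeError (the default for `raise "..."`)
# when the source PDF does not exist.
def ensure_pdf_exists!(pdf_path)
  raise "File not found: #{pdf_path}" unless File.exist?(pdf_path)
end

# Hypothetical helper: creates a uniquely named directory under the system
# temp dir and returns its path. The prefix is an assumption.
def make_output_dir(prefix = "tahweel-")
  Dir.mktmpdir(prefix)
end

output_dir = make_output_dir
# ... page images would be written into output_dir here ...
FileUtils.remove_entry(output_dir) # clean up the demo directory
```

Without a block, `Dir.mktmpdir` leaves the directory in place, so the caller decides when to delete it.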

Yields:

  • (Hash)

    Progress info: { stage: :splitting, current_page: Integer, percentage: Float, remaining_pages: Integer }
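
To make the shape of the yielded hash concrete, here is a minimal sketch of how such a hash could be built and consumed. `progress_info` is a hypothetical helper; the 1-based page numbering and the rounding of `percentage` are assumptions, since the source documents only the keys and their types.

```ruby
# Hypothetical helper; 1-based current_page and the rounding are assumptions.
def progress_info(current_page, total_pages)
  {
    stage: :splitting,
    current_page: current_page,
    percentage: (current_page.to_f / total_pages * 100).round(2),
    remaining_pages: total_pages - current_page
  }
end

# A block like the one passed to #split could format each update as text.
format_progress = lambda do |progress|
  "page #{progress[:current_page]}: #{progress[:percentage]}% done, " \
    "#{progress[:remaining_pages]} left"
end

line = format_progress.call(progress_info(1, 4))
# line is "page 1: 25.0% done, 3 left"
```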

Raises:

  • (RuntimeError)

    If the PDF file is not found or libvips is missing.

  • (Vips::Error)

    If the underlying VIPS library encounters an error during processing.



# File 'lib/tahweel/pdf_splitter.rb', line 61

def split(&)
  check_libvips_installed!
  validate_file_exists!
  setup_output_directory
  process_pages(&)
  result
end