Class: Tahweel::Writers::Docx

Inherits:
Object
Defined in:
lib/tahweel/writers/docx.rb

Overview

Writer class for outputting text to a .docx file.

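Instance Method Summary

  • #extension ⇒ String

    Returns the file extension for this writer.

  • #write(texts, destination, options = {}) ⇒ void

    Writes the extracted texts to a file.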

Instance Method Details

#extension ⇒ String

Returns the file extension for this writer.

Returns:

  • (String)

    The file extension.



# File 'lib/tahweel/writers/docx.rb', line 12

def extension = "docx"

#write(texts, destination, options = {}) ⇒ void

This method returns an undefined value.

Writes the extracted texts to a .docx file, one page per text, with a page break between consecutive pages.

It applies several transformations to the text before writing:

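  1. Normalizes Windows line endings, replacing each run of `\r\n` with a single `\n`.

  2. Collapses runs of the same whitespace character into one (for example, `"\n\n"` becomes `"\n"`), then strips leading and trailing whitespace.

  3. While the page's expected line count exceeds 40, repeatedly compacts the text by merging its shortest lines.

  4. Determines text alignment (RTL/LTR) based on content.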
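Steps 1 and 2 can be reproduced in isolation with the same regular expressions that `write` uses. Step 4 is delegated to a private `alignment_for` helper whose body is not shown on this page, so `guess_alignment` below is a hypothetical heuristic (compare Arabic and Latin letter counts), not Tahweel's actual logic. The `:right`/`:left` return values are likewise an assumption.

```ruby
# Steps 1-2, using the same regexes as Docx#write:
# - replace each run of "\r\n" with a single "\n"
# - collapse runs of the same whitespace character into one
# - strip leading and trailing whitespace
def normalize(text)
  text.gsub(/(\r\n)+/, "\n").gsub(/(\s)\1+/, '\1').strip
end

# HYPOTHETICAL stand-in for the private alignment_for helper:
# right-align when Arabic letters outnumber Latin letters, else left-align.
def guess_alignment(text)
  arabic = text.scan(/\p{Arabic}/).size
  latin = text.scan(/\p{Latin}/).size
  arabic > latin ? :right : :left
end

normalize("  one\r\n\r\ntwo   three \n") # => "one\ntwo three"
guess_alignment("مرحبا بالعالم")          # => :right
guess_alignment("Hello world")           # => :left
```

Note that step 2 only merges identical neighbours: a space followed by a newline is left alone, which is why the trailing `" \n"` above survives until `strip` removes it.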

Parameters:

  • texts (Array<String>)

    The extracted texts (one per page).

  • destination (String)

    The output file path.

  • options (Hash) (defaults to: {})

    Options for writing (unused for now).



# File 'lib/tahweel/writers/docx.rb', line 26

def write(texts, destination, options = {}) # rubocop:disable Lint/UnusedMethodArgument
  Caracal::Document.save(destination) do |docx|
    texts.each_with_index do |text, index|
      text = text.gsub(/(\r\n)+/, "\n").gsub(/(\s)\1+/, '\1').strip
      text = compact_shortest_lines(text) while expected_lines_in_page(text) > 40

      docx.p text, size: 20, align: alignment_for(text)

      docx.page if index < texts.size - 1
    end
  end
end
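The compaction loop in `write` depends on two private helpers, `expected_lines_in_page` and `compact_shortest_lines`, whose implementations are not shown here. The following is a hypothetical sketch of how such a loop could work, assuming a fixed wrap width of 80 characters and a "merge the shortest adjacent pair of lines" strategy; `expected_lines`, `merge_shortest_pair`, `compact`, and the width value are all illustrative names and numbers, not Tahweel's API.

```ruby
# HYPOTHETICAL sketch; not Tahweel's private helpers.
MAX_LINES = 40

# Estimate rendered rows: each line wraps every `width` characters,
# and an empty line still occupies one row.
def expected_lines(text, width: 80)
  text.split("\n").sum { |line| [(line.length.to_f / width).ceil, 1].max }
end

# Join the adjacent pair of lines with the smallest combined length,
# using a space, so the line count drops by one.
def merge_shortest_pair(text)
  lines = text.split("\n")
  return text if lines.size < 2

  i = (0...lines.size - 1).min_by { |j| lines[j].length + lines[j + 1].length }
  lines[i, 2] = ["#{lines[i]} #{lines[i + 1]}"]
  lines.join("\n")
end

# Merge until the estimate fits, stopping early if nothing can be merged
# (e.g. a single very long line) to avoid looping forever.
def compact(text, max_lines: MAX_LINES)
  while expected_lines(text) > max_lines
    merged = merge_shortest_pair(text)
    break if merged == text

    text = merged
  end
  text
end

page = (1..50).map { |n| "line #{n}" }.join("\n")
expected_lines(page)          # => 50
expected_lines(compact(page)) # => 40
```

The early `break` is a design choice of this sketch: merging cannot reduce the row estimate of a single line that is already longer than the limit, so without the guard the loop would never terminate.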