Class: Crawdad::PrawnTokenizer

Inherits:
Object
Includes:
Tokens
Defined in:
lib/crawdad/prawn_tokenizer.rb

Overview

Ambassador to Prawn. Turns a paragraph of text into a stream of wrappable items (boxes, glue, and penalties).

Constant Summary

Constants included from Tokens

Tokens::Type

Instance Method Summary

Methods included from Tokens

#box, #box_content, #glue, #glue_shrink, #glue_stretch, #penalty, #penalty_flagged?, #penalty_penalty, #token_type, #token_width

Constructor Details

#initialize(pdf) ⇒ PrawnTokenizer

Sets up a tokenizer for the given document (pdf).



# File 'lib/crawdad/prawn_tokenizer.rb', line 21

def initialize(pdf)
  @pdf = pdf
end

Instance Method Details

#paragraph(text, options = {}) ⇒ Object

Tokenize the given paragraph of text into a stream of items (boxes, glue, and penalties).

options:

hyphenation

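If provided, allow the given text to be hyphenated as needed to best fit the available space. Requires the text-hyphen gem; a LoadError is raised if the gem cannot be loaded. Allowable values: an ISO 639 language code (like ‘pt’), or true (synonym for ‘en_us’). Hyphenation is not supported with centered text: combining it with :align => :center raises an ArgumentError.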

indent

If specified, indent the first line of the paragraph by the given number of PDF points.

align

Paragraph alignment. Defaults to :justify when omitted.
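The defaulting rules for these options can be sketched in isolation. This is only an illustration under stated assumptions: `resolve_paragraph_options` is a hypothetical helper name, not part of Crawdad, and it mirrors just the rules visible in the method source (`:align` falls back to `:justify`, `true` maps to 'en_us', and hyphenation with centered text is rejected). It does not load text-hyphen or build any tokens.

```ruby
# Hypothetical helper (not part of Crawdad): resolves the options hash
# following the defaults visible in the method source.
def resolve_paragraph_options(options = {})
  align = options[:align] || :justify

  language = nil
  if (lang = options[:hyphenation])
    if align == :center
      raise ArgumentError, "Hyphenation is not supported with centered text"
    end
    # `true` is a synonym for 'en_us'; any other value is used as given.
    language = (lang == true) ? 'en_us' : lang
  end

  { :align => align, :language => language, :indent => options[:indent] }
end

resolved = resolve_paragraph_options(:hyphenation => true, :indent => 18)
# resolved[:align]    is :justify
# resolved[:language] is 'en_us'
# resolved[:indent]   is 18
```

Keeping the centered-text check ahead of any gem loading means a caller with an unsupported combination gets the ArgumentError even when text-hyphen is not installed, which matches the order of checks in the method source.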



# File 'lib/crawdad/prawn_tokenizer.rb', line 38

def paragraph(text, options={})
  @align = options[:align] || :justify

  hyphenator = if options[:hyphenation]
    # Box-glue-penalty model does not easily permit optional hyphenation
    # with the construction we use for centered text.
    if @align == :center
      raise ArgumentError, "Hyphenation is not supported with centered text"
    end

    begin
      gem 'text-hyphen'
      require 'text/hyphen'
    rescue LoadError
      raise LoadError, ":hyphenation option requires the text-hyphen gem"
    end

    language = ((lang = options[:hyphenation]) == true) ? 'en_us' : lang
    @hyphenators ||= {}
    @hyphenators[language] ||= Text::Hyphen.new(:language => language)
  end

  stream = starting_tokens(options[:indent])

  # Break paragraph on whitespace.
  # TODO: how should "battle-\nfield" be tokenized?
  words = text.strip.split(/\s+/)
  
  words.each_with_index do |word, i|
    w = StringScanner.new(word)

    # For hyphenated words, follow each hyphen by a zero-width flagged
    # penalty.
    while seg = w.scan(/[^-]+-/) # "night-time" --> "<<night->>time"
      stream += word_segment(seg, hyphenator)
    end

    stream += word_segment(w.rest, hyphenator)
    
    unless i == words.length - 1
      stream += interword_tokens
    end
  end

  # Add needed tokens to finish off the paragraph.
  stream += finishing_tokens

  stream
end
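
The word-splitting step in the method above can be tried on its own with only the standard library. The sketch below reuses the same whitespace split and the same `StringScanner` pattern, but it returns plain string segments instead of tokens. `hyphen_segments` and `paragraph_segments` are hypothetical names introduced for illustration; `word_segment`, `interword_tokens`, and the other helpers the method calls are not reproduced here.

```ruby
require 'strscan'

# Split one word after each explicit hyphen, using the same scan as the
# method above. Returns string segments only; the real method converts
# each segment into tokens.
def hyphen_segments(word)
  scanner = StringScanner.new(word)
  segments = []
  while (seg = scanner.scan(/[^-]+-/))
    segments << seg
  end
  segments << scanner.rest
  segments
end

# Break a paragraph on whitespace, then segment every word.
def paragraph_segments(text)
  text.strip.split(/\s+/).map { |word| hyphen_segments(word) }
end

hyphen_segments("night-time")             # => ["night-", "time"]
hyphen_segments("mother-in-law")          # => ["mother-", "in-", "law"]
paragraph_segments("a night-time  walk")  # => [["a"], ["night-", "time"], ["walk"]]
```

Because `scan` only matches at the scanner's current position, each hyphen stays attached to the segment before it, which is where the method places its zero-width flagged penalty.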