Class: Crawdad::PrawnTokenizer
- Inherits: Object (hierarchy: Object, Crawdad::PrawnTokenizer)
- Includes: Tokens
- Defined in: lib/crawdad/prawn_tokenizer.rb
Overview
Ambassador to Prawn. Turns a paragraph into wrappable items.
Constant Summary
Constants included from Tokens
Instance Method Summary
- #initialize(pdf) ⇒ PrawnTokenizer (constructor)
  Sets up a tokenizer for the given document (pdf).
- #paragraph(text, options = {}) ⇒ Object
  Tokenize the given paragraph of text into a stream of items (boxes, glue, and penalties).
Methods included from Tokens
#box, #box_content, #glue, #glue_shrink, #glue_stretch, #penalty, #penalty_flagged?, #penalty_penalty, #token_type, #token_width
Constructor Details
#initialize(pdf) ⇒ PrawnTokenizer
Sets up a tokenizer for the given document (pdf).
# File 'lib/crawdad/prawn_tokenizer.rb', line 21
def initialize(pdf)
  @pdf = pdf
end
Instance Method Details
#paragraph(text, options = {}) ⇒ Object
Tokenize the given paragraph of text into a stream of items (boxes, glue, and penalties).
options:
- :hyphenation
  If provided, allow the given text to be hyphenated as needed to best fit the available space. Requires the text-hyphen gem. Allowable values: an ISO 639 language code (like ‘pt’), or true (synonym for ‘en_us’).
- :indent
  If specified, indent the first line of the paragraph by the given number of PDF points.
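The way the :hyphenation value is interpreted can be illustrated with a small standalone sketch. It mirrors the checks visible in this method's source listing (true maps to 'en_us', any other value is passed through as the language, and centered text is rejected), but `hyphenation_language` is a hypothetical helper written for this example, not part of Crawdad, and it deliberately leaves out loading the text-hyphen gem.

```ruby
# Hypothetical helper, not part of Crawdad: returns the language name that
# would be handed to the hyphenator, or nil when hyphenation is disabled.
def hyphenation_language(options = {})
  setting = options[:hyphenation]
  return nil unless setting

  # The method source raises for centered text, so the sketch does too.
  if (options[:align] || :justify) == :center
    raise ArgumentError, "Hyphenation is not supported with centered text"
  end

  # true is a synonym for 'en_us'; anything else is used as given.
  setting == true ? 'en_us' : setting
end

p hyphenation_language(:hyphenation => true)  # prints "en_us"
p hyphenation_language(:hyphenation => 'pt')  # prints "pt"
p hyphenation_language({})                    # prints nil
```

Keeping the lookup in one place like this makes the precedence explicit: a missing option disables hyphenation before the alignment check is ever reached.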
# File 'lib/crawdad/prawn_tokenizer.rb', line 38
def paragraph(text, options={})
  @align = options[:align] || :justify

  hyphenator = if options[:hyphenation]
    # Box-glue-penalty model does not easily permit optional hyphenation
    # with the construction we use for centered text.
    if @align == :center
      raise ArgumentError, "Hyphenation is not supported with centered text"
    end

    begin
      gem 'text-hyphen'
      require 'text/hyphen'
    rescue LoadError
      raise LoadError, ":hyphenation option requires the text-hyphen gem"
    end

    language = ((lang = options[:hyphenation]) == true) ? 'en_us' : lang
    @hyphenators ||= {}
    @hyphenators[language] ||= Text::Hyphen.new(:language => language)
  end

  stream = starting_tokens(options[:indent])

  # Break paragraph on whitespace.
  # TODO: how should "battle-\nfield" be tokenized?
  words = text.strip.split(/\s+/)

  words.each_with_index do |word, i|
    w = StringScanner.new(word)

    # For hyphenated words, follow each hyphen by a zero-width flagged
    # penalty.
    while seg = w.scan(/[^-]+-/) # "night-time" --> "<<night->>time"
      stream += word_segment(seg, hyphenator)
    end

    stream += word_segment(w.rest, hyphenator)

    unless i == words.length - 1
      stream += interword_tokens
    end
  end

  # Add needed tokens to finish off the paragraph.
  stream += finishing_tokens

  stream
end
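The splitting step in the listing above can be tried in isolation. The following is a minimal sketch using only the standard library: it reproduces the whitespace split and the StringScanner loop that cuts each word after every explicit hyphen, but it returns plain string segments rather than boxes, glue, and penalties. The names `hyphen_segments` and `split_paragraph` are invented for this example and do not exist in Crawdad.

```ruby
require 'strscan'

# Hypothetical helper: split one word after each explicit hyphen, using the
# same pattern as the listing. The final segment is whatever text remains.
def hyphen_segments(word)
  scanner = StringScanner.new(word)
  segments = []
  while (seg = scanner.scan(/[^-]+-/))
    segments << seg
  end
  segments << scanner.rest
  segments
end

# Hypothetical helper: break a paragraph on whitespace, then split each word.
def split_paragraph(text)
  text.strip.split(/\s+/).map { |word| hyphen_segments(word) }
end

p split_paragraph("  a night-time   walk\n")
# prints [["a"], ["night-", "time"], ["walk"]]
```

In the real method each segment goes through word_segment, and interword_tokens is appended between words; this sketch only shows where the break opportunities at explicit hyphens come from. Note that a word ending in a hyphen yields an empty final segment, which matches how the loop in the listing calls word_segment on w.rest unconditionally.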