Module: HexaPDF::Layout::TextBox::SimpleTextSegmentation
Defined in: lib/hexapdf/layout/text_box.rb
Overview
Implementation of a simple text segmentation algorithm.
The algorithm breaks TextFragment objects into objects wrapped by Box, Glue or Penalty items, and inserts additional Penalty items when needed:
- Any valid Unicode newline separator inserts a Penalty object describing a mandatory break.
- Spaces and tabulators are wrapped by Glue objects, allowing breaks.
- Non-breaking spaces are wrapped into Penalty objects that prohibit line breaking.
- Hyphens are attached to the preceding text fragment (or are a standalone text fragment) and followed by a Penalty object to allow a break.
- If a soft hyphen is encountered, a hyphen wrapped by a Penalty object is inserted to allow a break.
- If a zero-width space is encountered, a Penalty object is inserted to allow a break.
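The rules above can be sketched on a plain string. This is a simplified illustration, not the library's implementation: the `SegmentationSketch` module, its `Box`, `Glue` and `Penalty` structs and the `segment` method are hypothetical stand-ins, kerning values are ignored, and a tab is kept as a single glue item.

```ruby
module SegmentationSketch
  # Hypothetical stand-ins that only mimic the roles of the real items.
  Box     = Struct.new(:text)
  Glue    = Struct.new(:text)
  Penalty = Struct.new(:kind, :text)

  # Same character class as BREAK_RE.
  BREAK_CHARS = /[ \u{A}-\u{D}\u{85}\u{2028}\u{2029}\t\u{200B}\u{00AD}\u{00A0}-]/

  module_function

  def segment(string)
    result = []
    chars = string.chars
    i = 0
    while i < chars.size
      # Collect ordinary characters until a break character is found.
      run = +''
      while (ch = chars[i]) && !BREAK_CHARS.match?(ch)
        run << ch
        i += 1
      end
      run << ch if ch == '-' # a hyphen stays with the preceding text
      result << Box.new(run) unless run.empty?

      case ch
      when ' ', "\t"
        result << Glue.new(ch)
      when "\n", "\v", "\f", "\u{85}", "\u{2028}", "\u{2029}"
        result << Penalty.new(:mandatory, nil)
      when "\r"
        # "\r\n" counts as one break; the "\n" emits it on the next pass.
        result << Penalty.new(:mandatory, nil) unless chars[i + 1] == "\n"
      when '-'
        result << Penalty.new(:standard, nil)
      when "\u{00AD}"
        result << Penalty.new(:standard, '-')   # hyphen shown only if broken here
      when "\u{00A0}"
        result << Penalty.new(:prohibited, ' ')
      when "\u{200B}"
        result << Penalty.new(:zero, nil)
      end
      i += 1
    end
    result
  end
end

# Yields, in order: Box "well-", standard Penalty, Box "known", Glue,
# Box "fact", mandatory Penalty.
SegmentationSketch.segment("well-known fact\n").each { |item| p item }
```

Keeping the hyphen inside the preceding box means the hyphen is always rendered, while the soft hyphen only contributes a visible hyphen through its penalty item.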
Constant Summary
- BREAK_RE =
Breaks are detected at: space, tab, zero-width space, non-breaking space, hyphen, soft hyphen and any valid Unicode newline separator.
/[ \u{A}-\u{D}\u{85}\u{2028}\u{2029}\t\u{200B}\u{00AD}\u{00A0}-]/
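As a quick check of what the character class covers, the pattern can be tried on single characters. The constant is copied from above; the `break_characters` helper is a hypothetical name used only for this illustration.

```ruby
# Copied from the constant definition above.
BREAK_RE = /[ \u{A}-\u{D}\u{85}\u{2028}\u{2029}\t\u{200B}\u{00AD}\u{00A0}-]/

# Hypothetical helper: returns the break characters of a string, in order.
def break_characters(string)
  string.each_char.select { |char| BREAK_RE.match?(char) }
end

# Three matches: the hyphen, the soft hyphen and the space.
p break_characters("co-op\u{00AD}erate now").size  # => 3

# The range \u{A}-\u{D} covers "\n", "\v", "\f" and "\r".
p break_characters("line one\r\nline two")          # => [" ", "\r", "\n", " "]

# Ordinary letters never match.
p BREAK_RE.match?("plain")                          # => false
```

The trailing `-` in the character class is a literal hyphen because it is the last character before the closing bracket.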
Class Method Summary
- .call(items) ⇒ Object
Breaks the items (an array of InlineBox and TextFragment objects) into atomic pieces wrapped by Box, Glue or Penalty items, and returns those as an array.
Class Method Details
.call(items) ⇒ Object
Breaks the items (an array of InlineBox and TextFragment objects) into atomic pieces wrapped by Box, Glue or Penalty items, and returns those as an array.
# File 'lib/hexapdf/layout/text_box.rb', line 190

def self.call(items)
  result = []
  glues = {}
  items.each do |item|
    if item.kind_of?(InlineBox)
      result << Box.new(item)
    else
      i = 0
      while i < item.items.size
        # Collect characters and kerning values until break character is encountered
        box_items = []
        while (glyph = item.items[i]) &&
            (glyph.kind_of?(Numeric) || !BREAK_RE.match?(glyph.str))
          box_items << glyph
          i += 1
        end

        # A hyphen belongs to the text fragment
        box_items << glyph if glyph && !glyph.kind_of?(Numeric) && glyph.str == '-'.freeze

        unless box_items.empty?
          result << Box.new(TextFragment.new(items: box_items.freeze, style: item.style))
        end

        if glyph
          case glyph.str
          when ' '
            glues[item.style] ||=
              Glue.new(TextFragment.new(items: [glyph].freeze, style: item.style))
            result << glues[item.style]
          when "\n", "\v", "\f", "\u{85}", "\u{2028}", "\u{2029}"
            result << Penalty::MandatoryBreak
          when "\r"
            if item.items[i + 1]&.kind_of?(Numeric) || item.items[i + 1].str != "\n"
              result << Penalty::MandatoryBreak
            end
          when '-'
            result << Penalty::Standard
          when "\t"
            spaces = [item.style.font.decode_utf8(" ").first] * 8
            result << Glue.new(TextFragment.new(items: spaces.freeze, style: item.style))
          when "\u{00AD}"
            hyphen = item.style.font.decode_utf8("-").first
            frag = TextFragment.new(items: [hyphen].freeze, style: item.style)
            result << Penalty.new(Penalty::Standard.penalty, frag.width, item: frag)
          when "\u{00A0}"
            space = item.style.font.decode_utf8(" ").first
            frag = TextFragment.new(items: [space].freeze, style: item.style)
            result << Penalty.new(Penalty::ProhibitedBreak.penalty, frag.width, item: frag)
          when "\u{200B}"
            result << Penalty.new(0)
          end
        end
        i += 1
      end
    end
  end
  result
end
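The "\r" branch in the listing looks one item ahead so that a "\r\n" pair produces a single mandatory break, emitted by the "\n". A minimal sketch of that check follows, assuming a hypothetical `Glyph` struct in place of the library's glyph objects and a hypothetical method name. The explicit `nil?` test for a "\r" at the very end of the items is an addition of this sketch; the listing does not appear to guard that case.

```ruby
# Hypothetical stand-in for a glyph object; only #str is needed here.
Glyph = Struct.new(:str)

# Returns true if the "\r" at index i should emit a mandatory break, that is,
# unless the next item is a "\n" glyph (which then emits the break itself).
# Numeric items are kerning values and therefore never a "\n".
def mandatory_break_after_cr?(items, i)
  following = items[i + 1]
  following.nil? || following.kind_of?(Numeric) || following.str != "\n"
end

crlf    = [Glyph.new("\r"), Glyph.new("\n")]
cr_kern = [Glyph.new("\r"), -20, Glyph.new("\n")]
lone_cr = [Glyph.new("\r")]

p mandatory_break_after_cr?(crlf, 0)     # => false
p mandatory_break_after_cr?(cr_kern, 0)  # => true
p mandatory_break_after_cr?(lone_cr, 0)  # => true
```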