Class: Text::Hyphen
- Inherits:
- Object (ancestry: Object, Text::Hyphen)
- Defined in:
- lib/text/hyphen.rb
Overview
An object that knows how to perform hyphenation based on the TeX hyphenation algorithm with pattern files. Each object is constructed with a specific language’s hyphenation patterns.
Defined Under Namespace
Classes: Language
Constant Summary
- DEBUG = false
- VERSION = '1.2'
- DEFAULT_MIN_LEFT = 2
- DEFAULT_MIN_RIGHT = 2
Instance Attribute Summary
- #iso_language ⇒ Object (readonly)
  Returns the language’s ISO 639 ID, e.g., “en_us” or “pt”.
- #language ⇒ Object
  The name of the language to be used in hyphenating words.
- #left ⇒ Object
  No fewer than this number of letters will show up to the left of the hyphen.
- #right ⇒ Object
  No fewer than this number of letters will show up to the right of the hyphen.
Class Method Summary
- .require_real_hyphenation_file(loader) ⇒ Object
  Resolves a file for cleaner loading from a hyphenation loader file.
Instance Method Summary
- #clear_cache! ⇒ Object
  Clears the per-instance hyphenation and visualization caches.
- #hyphenate(word) ⇒ Object
  Returns an array of character positions where a word can be hyphenated.
- #hyphenate_to(word, size) ⇒ Object
  Hyphenates a word so that the first point is at most size characters.
- #initialize(options = {}) {|_self| ... } ⇒ Hyphen (constructor)
  Creates a hyphenation object with the options requested.
- #stats ⇒ Object
  Returns a string describing the structure of the patterns for the language of this hyphenation object.
- #visualise(word) ⇒ Object (also: #visualize)
  Returns a visualization of the hyphenation points.
Constructor Details
#initialize(options = {}) {|_self| ... } ⇒ Hyphen
Creates a hyphenation object with the options requested. The options available are:
- language: The language to perform hyphenation with. See #language and #iso_language.
- left: The minimum number of characters to the left of a hyphenation point. See #left.
- right: The minimum number of characters to the right of a hyphenation point. See #right.
The options can be provided either as hashed parameters or set as methods in an initialization block. The following initializations are all equivalent:
hyp = Text::Hyphen.new(:language => 'en_us')
hyp = Text::Hyphen.new(language: 'en_us') # Ruby 1.9 and later
hyp = Text::Hyphen.new { |h| h.language = 'en_us' }
# File 'lib/text/hyphen.rb', line 75
def initialize(options = {}) # :yields self:
  @iso_language = options[:language]
  @left = options[:left]
  @right = options[:right]
  @language = nil

  @cache = {}
  @vcache = {}

  @hyphen = {}
  @begin_hyphen = {}
  @end_hyphen = {}
  @both_hyphen = {}
  @exception = {}

  @first_load = true
  yield self if block_given?
  @first_load = false

  load_language

  @left ||= DEFAULT_MIN_LEFT
  @right ||= DEFAULT_MIN_RIGHT
end
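The hash-or-block initialization pattern used by the constructor can be illustrated without the library installed. The sketch below uses a hypothetical stand-in class, `OptionsDemo`, which is not part of Text::Hyphen; it copies only the option handling (hash values first, then the block, then `||=` defaults) and skips language loading entirely.

```ruby
# Hypothetical stand-in class; NOT part of Text::Hyphen.
# It mirrors only the option handling of the constructor.
class OptionsDemo
  DEFAULT_MIN_LEFT  = 2
  DEFAULT_MIN_RIGHT = 2

  attr_accessor :language, :left, :right

  def initialize(options = {})
    @language = options[:language]
    @left     = options[:left]
    @right    = options[:right]

    # The block runs after the hash is read, so it can override hash values.
    yield self if block_given?

    # Defaults apply only to values that are still unset.
    @left  ||= DEFAULT_MIN_LEFT
    @right ||= DEFAULT_MIN_RIGHT
  end
end

a = OptionsDemo.new(:language => 'en_us')
b = OptionsDemo.new { |h| h.language = 'en_us' }
c = OptionsDemo.new(:language => 'en_us', :left => 3)
```

Here `a` and `b` end up with the same settings, while `c` keeps its explicit `left` of 3 and falls back to the default for `right`.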
Instance Attribute Details
#iso_language ⇒ Object (readonly)
Returns the language’s ISO 639 ID, e.g., “en_us” or “pt”.
# File 'lib/text/hyphen.rb', line 56
def iso_language
  @iso_language
end
#language ⇒ Object
The name of the language to be used in hyphenating words. This will be a two or three character ISO 639 code, with the two character form being the canonical resource name. This will load the language hyphenation definitions from text/hyphen/language/<code> as a Ruby class. The resource ‘text/hyphen/language/en_us’ defines the language class Text::Hyphen::Language::EN_US. It also defines the secondary forms Text::Hyphen::Language::EN and Text::Hyphen::Language::ENG_US.
Minimal transformations will be performed on the language code provided, such that any dashes are converted to underscores (e.g., ‘en-us’ becomes ‘en_us’) and all characters are regularised. Resource names will be downcased and class names will be converted to uppercase (e.g., ‘Pt’ for the Portuguese language becomes ‘pt’ and ‘PT’, respectively).
The language may also be specified as an instance of Text::Hyphen::Language.
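The code transformations described above can be sketched as a small helper. `normalise_language_code` is a hypothetical name chosen for this example and is not a Text::Hyphen method; it shows only the dash-to-underscore, downcase, and upcase steps, and assumes the resource path prefix given in the description.

```ruby
# Hypothetical helper; NOT part of Text::Hyphen's API.
# Derives the resource name and class name from a language code.
def normalise_language_code(code)
  base = code.to_s.tr('-', '_')   # 'en-us' becomes 'en_us'
  {
    resource:   "text/hyphen/language/#{base.downcase}",
    class_name: base.upcase
  }
end

en = normalise_language_code('en-us')
pt = normalise_language_code('Pt')
```

With these inputs, `en[:resource]` is `text/hyphen/language/en_us` with class name `EN_US`, and `pt` yields `text/hyphen/language/pt` with class name `PT`.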
# File 'lib/text/hyphen.rb', line 40
def language
  @language
end
#left ⇒ Object
No fewer than this number of letters will show up to the left of the hyphen. The initial value for this will be specified by the language; setting this value will override the language’s defaults.
# File 'lib/text/hyphen.rb', line 18
def left
  @left
end
#right ⇒ Object
No fewer than this number of letters will show up to the right of the hyphen. This overrides the default specified in the language.
# File 'lib/text/hyphen.rb', line 22
def right
  @right
end
Class Method Details
.require_real_hyphenation_file(loader) ⇒ Object
Resolves a file for cleaner loading from a hyphenation loader file.
# File 'lib/text/hyphen.rb', line 243
def self.require_real_hyphenation_file(loader) # :nodoc:
  p = File.dirname(loader)
  f = File.basename(loader)
  v = if RUBY_VERSION < "1.9.1"
        "1.8"
      else
        "1.9"
      end
  require File.join(p, v, f)
end
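To see which file the loader resolves to, the path logic can be separated from the `require` call. `real_hyphenation_path` below is a hypothetical helper written for this example; it takes the Ruby version as a parameter so both branches can be exercised, and it keeps the plain String comparison used in the listing.

```ruby
# Hypothetical helper; NOT part of Text::Hyphen.
# Computes the path the loader would require, without requiring it.
def real_hyphenation_path(loader, ruby_version = RUBY_VERSION)
  dir  = File.dirname(loader)
  file = File.basename(loader)
  # Plain String comparison, as in the listing above.
  sub  = ruby_version < '1.9.1' ? '1.8' : '1.9'
  File.join(dir, sub, file)
end

old_path = real_hyphenation_path('text/hyphen/language/en_us.rb', '1.8.7')
new_path = real_hyphenation_path('text/hyphen/language/en_us.rb', '2.7.0')
```

Under these assumptions `old_path` points into the `1.8` subdirectory and `new_path` into `1.9`.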
Instance Method Details
#clear_cache! ⇒ Object
Clears the per-instance hyphenation and visualization caches.
# File 'lib/text/hyphen.rb', line 166
def clear_cache!
  @cache.clear
  @vcache.clear
end
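The effect of clearing a per-instance cache can be shown with a minimal stand-in. `CachedWork` is a hypothetical class, not part of Text::Hyphen; it replaces hyphenation with a trivial computation and counts how often the uncached path runs.

```ruby
# Hypothetical stand-in class; NOT part of Text::Hyphen.
class CachedWork
  attr_reader :calls

  def initialize
    @cache = {}
    @calls = 0
  end

  # Returns the cached value when present, otherwise computes and stores it.
  def compute(word)
    return @cache[word] if @cache.has_key?(word)
    @calls += 1
    @cache[word] = word.upcase
  end

  def clear_cache!
    @cache.clear
  end
end

work = CachedWork.new
work.compute('word')
work.compute('word')       # served from the cache
calls_before_clear = work.calls
work.clear_cache!
work.compute('word')       # recomputed after the cache was cleared
calls_after_clear = work.calls
```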
#hyphenate(word) ⇒ Object
Returns an array of character positions where a word can be hyphenated.
hyp.hyphenate('representation') #=> [3, 5, 8, 10]
Because hyphenation can be expensive, if the word has been hyphenated previously, it will be returned from a per-instance cache.
# File 'lib/text/hyphen.rb', line 106
def hyphenate(word)
  word = word.downcase
  $stderr.puts "Hyphenating #{word}" if DEBUG
  return @cache[word] if @cache.has_key?(word)
  res = @language.exceptions[word]
  return @cache[word] = make_result_list(res) if res

  letters = word.scan(@language.scan_re)
  $stderr.puts letters.inspect if DEBUG
  word_size = letters.size

  result = [0] * (word_size + 1)
  right_stop = word_size - @right

  updater = Proc.new do |hash, str, pos|
    if hash.has_key?(str)
      $stderr.print "#{pos}: #{str}: #{hash[str]}" if DEBUG
      hash[str].scan(@language.scan_re).each_with_index do |cc, ii|
        cc = cc.to_i
        result[ii + pos] = cc if cc > result[ii + pos]
      end
      $stderr.print ": #{result.inspect}\n" if DEBUG
    end
  end

  # Walk the word
  (0..right_stop).each do |pos|
    rest_length = word_size - pos
    (1..rest_length).each do |length|
      substr = letters[pos, length].join('')
      updater[@language.hyphen, substr, pos]
      updater[@language.start, substr, pos] if pos.zero?
      updater[@language.stop, substr, pos] if (length == rest_length)
    end
  end

  updater[@language.both, word, 0] if @language.both[word]

  (0..@left).each { |i| result[i] = 0 }
  ((-1 - @right)..(-1)).each { |i| result[i] = 0 }
  @cache[word] = make_result_list(result)
end
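The core idea of the walk above is that every matching pattern proposes a digit for each gap between letters, the highest digit per gap wins, and odd digits mark permitted breaks. The sketch below is a simplified, self-contained illustration of that idea, not the library's implementation: `toy_hyphenate` and `TOY_PATTERNS` are hypothetical, the three patterns are invented for the word "hyphen", and word-start, word-stop, and exception handling as well as caching are omitted. It returns gap indices directly, which may differ from what the library's `make_result_list` produces.

```ruby
# Invented patterns for illustration only. Each key is a letter sequence and
# each value holds one digit per gap (key length + 1 digits).
TOY_PATTERNS = {
  'hyph' => '00300',  # odd 3 between "hy" and "ph": break allowed
  'en'   => '010',    # odd 1 between "e" and "n": break proposed
  'hen'  => '0020'    # even 2 between "e" and "n": outranks the 1, no break
}

# Hypothetical helper; NOT part of Text::Hyphen.
def toy_hyphenate(word, patterns, left = 2, right = 2)
  letters = word.downcase.chars
  size    = letters.size
  result  = [0] * (size + 1)

  # Try every substring of the word against the pattern table.
  (0...size).each do |pos|
    (1..(size - pos)).each do |length|
      digits = patterns[letters[pos, length].join]
      next unless digits
      digits.chars.each_with_index do |digit, offset|
        value = digit.to_i
        # Keep the highest digit seen for each gap.
        result[pos + offset] = value if value > result[pos + offset]
      end
    end
  end

  # Odd digits are break points; honour the minimum letters on each side.
  (left..(size - right)).select { |gap| result[gap].odd? }
end

points         = toy_hyphenate('hyphen', TOY_PATTERNS)
points_right_1 = toy_hyphenate('hyphen', TOY_PATTERNS, 2, 1)
without_hen    = toy_hyphenate('hyphen', { 'hyph' => '00300', 'en' => '010' }, 2, 1)
```

With the full toy table the only break is at gap 2 ("hy-phen"). Dropping the `hen` pattern and relaxing `right` to 1 lets the lower-priority `en` pattern produce an extra break at gap 5, which shows how a higher even digit suppresses a break.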
#hyphenate_to(word, size) ⇒ Object
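Hyphenates a word so that the first part, including the trailing hyphen, is at most size characters long. Returns a two-element array containing the hyphenated first part and the remainder of the word, or [nil, word] if no hyphenation point falls before size.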
# File 'lib/text/hyphen.rb', line 173
def hyphenate_to(word, size)
  point = hyphenate(word).delete_if { |e| e >= size }.max
  if point.nil?
    [nil, word]
  else
    [word[0 ... point] + "-", word[point .. -1]]
  end
end
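The splitting step can be tried in isolation by passing the break points in directly. `split_at_most` is a hypothetical helper for this example, and the points `[3, 5, 8, 10]` are taken from the `hyphenate('representation')` example in this page. Because `delete_if` modifies its receiver in place, the sketch uses the non-mutating `select` so the input array is left untouched.

```ruby
# Hypothetical helper; NOT part of Text::Hyphen.
# Splits word at the last break point that is strictly below size.
def split_at_most(word, points, size)
  point = points.select { |e| e < size }.max
  return [nil, word] if point.nil?
  [word[0...point] + '-', word[point..-1]]
end

points = [3, 5, 8, 10]
fits   = split_at_most('representation', points, 6)
no_fit = split_at_most('representation', points, 3)
```

Here `fits` is `["repre-", "sentation"]`: the first part is six characters including the hyphen. `no_fit` is `[nil, "representation"]` because no break point lies below 3.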
#stats ⇒ Object
Returns a string describing the structure of the patterns for the language of this hyphenation object.
# File 'lib/text/hyphen.rb', line 184
def stats
  _b = @language.both.size
  _s = @language.start.size
  _e = @language.stop.size
  _h = @language.hyphen.size
  _x = @language.exceptions.size
  _T = _b + _s + _e + _h + _x

  s = <<-EOS
The language '%s' contains %d total hyphenation patterns.
    % 6d patterns are word start patterns.
    % 6d patterns are word stop patterns.
    % 6d patterns are word start/stop patterns.
    % 6d patterns are normal patterns.
    % 6d patterns are exceptions.
  EOS
  s % [ @iso_language, _T, _s, _e, _b, _h, _x ]
end
#visualise(word) ⇒ Object Also known as: visualize
Returns a visualization of the hyphenation points.
hyp.visualize('representation') #=> rep-re-sen-ta-tion
Because hyphenation can be expensive, if the word has been visualised previously, it will be returned from a per-instance cache.
# File 'lib/text/hyphen.rb', line 155
def visualise(word)
  return @vcache[word] if @vcache.has_key?(word)
  w = word.dup
  hyphenate(w).each_with_index do |pos, n|
    w[pos.to_i + n, 0] = '-' if pos != 0
  end
  @vcache[word] = w
end
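The insertion loop relies on each earlier hyphen shifting later positions by one, which is why the index is `pos + n`. The sketch below isolates that step; `insert_hyphens` is a hypothetical helper, and the points are the ones shown in the `hyphenate('representation')` example.

```ruby
# Hypothetical helper; NOT part of Text::Hyphen.
# Inserts a hyphen at each break point of a copy of word.
def insert_hyphens(word, points)
  w = word.dup
  points.each_with_index do |pos, n|
    # n hyphens have already been inserted, so shift the index by n.
    w[pos + n, 0] = '-' unless pos.zero?
  end
  w
end

visual = insert_hyphens('representation', [3, 5, 8, 10])
```

With these points `visual` is `rep-re-sen-ta-tion`, matching the example output given for #visualize.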