Class: Text::Hyphen

Inherits:
Object
Defined in:
lib/text/hyphen.rb

Overview

An object that knows how to perform hyphenation based on the TeX hyphenation algorithm with pattern files. Each object is constructed with a specific language's hyphenation patterns.

Defined Under Namespace

Classes: Language

Constant Summary

DEBUG =
false
VERSION =
'1.4.1'
DEFAULT_MIN_LEFT =
2
DEFAULT_MIN_RIGHT =
2

Constructor Details

#initialize(options = {}) {|_self| ... } ⇒ Hyphen

Creates a hyphenation object with the options requested. The options available are:

language

The language to perform hyphenation with. See #language and #iso_language.

left

The minimum number of characters to the left of a hyphenation point. See #left.

right

The minimum number of characters to the right of a hyphenation point. See #right.

The options can be provided either as hashed parameters or set as methods in an initialization block. The following initializations are all equivalent:

hyp = Text::Hyphen.new(:language => 'en_us')
hyp = Text::Hyphen.new(language: 'en_us') # Ruby 1.9+ hash syntax
hyp = Text::Hyphen.new { |h| h.language = 'en_us' }

Yields:

  • (_self)

Yield Parameters:

  • _self (Text::Hyphen)

    the object that the method was called on



# File 'lib/text/hyphen.rb', line 76

def initialize(options = {}) # :yields self:
  @iso_language = options[:language]
  @left         = options[:left]
  @right        = options[:right]
  @language     = nil

  @cache        = {}
  @vcache       = {}

  @hyphen       = {}
  @begin_hyphen = {}
  @end_hyphen   = {}
  @both_hyphen  = {}
  @exception    = {}

  @first_load = true
  yield self if block_given?
  @first_load = false

  load_language

  @left  ||= DEFAULT_MIN_LEFT
  @right ||= DEFAULT_MIN_RIGHT
end
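
The equivalence of the hash and block forms follows from the order of operations in the listing above: hash values are stored first, the block runs next, and defaults fill in whatever is still unset. A minimal sketch of that pattern, using a hypothetical stand-in class (OptionsDemo is not part of the library and loads no language data):

```ruby
# Hypothetical stand-in for illustration only; this is not Text::Hyphen.
class OptionsDemo
  DEFAULT_MIN_LEFT  = 2
  DEFAULT_MIN_RIGHT = 2

  attr_accessor :language, :left, :right

  def initialize(options = {})
    @language = options[:language]
    @left     = options[:left]
    @right    = options[:right]
    yield self if block_given?    # assignments in the block override the hash
    @left  ||= DEFAULT_MIN_LEFT   # defaults fill anything still unset
    @right ||= DEFAULT_MIN_RIGHT
  end

  def settings
    [language, left, right]
  end
end

from_hash  = OptionsDemo.new(:language => 'en_us')
from_block = OptionsDemo.new { |h| h.language = 'en_us' }
overridden = OptionsDemo.new(:left => 3) { |h| h.left = 4 }
```

Here from_hash and from_block end up with identical settings, and overridden.left is 4 because the block runs after the hash has been read.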

Instance Attribute Details

#iso_language ⇒ Object (readonly)

Returns the language's ISO 639 ID, e.g., “en_us” or “pt”.



# File 'lib/text/hyphen.rb', line 57

def iso_language
  @iso_language
end

#language ⇒ Object

The name of the language to be used in hyphenating words. This will be a two or three character ISO 639 code, with the two character form being the canonical resource name. This will load the language hyphenation definitions from text/hyphen/language/<code> as a Ruby class. The resource 'text/hyphen/language/en_us' defines the language class Text::Hyphen::Language::EN_US. It also defines the secondary forms Text::Hyphen::Language::EN and Text::Hyphen::Language::ENG_US.

Minimal transformations are performed on the language code provided: any dashes are converted to underscores (e.g., 'en-us' becomes 'en_us') and case is normalised. Resource names are downcased and class names are upcased (e.g., 'Pt' for the Portuguese language becomes the resource name 'pt' and the class name 'PT').

The language may also be specified as an instance of Text::Hyphen::Language.
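
The transformation described above can be sketched as a small standalone function. The name normalise_language_code and the shape of its return value are assumptions made for illustration; the library's own implementation is not shown on this page and may differ.

```ruby
# Hypothetical helper illustrating the documented normalisation rules.
# It only computes names; it does not require or load anything.
def normalise_language_code(code)
  resource = code.to_s.tr('-', '_').downcase   # dashes become underscores
  {
    resource:   "text/hyphen/language/#{resource}", # path given to require
    class_name: resource.upcase                     # constant under Language
  }
end

en = normalise_language_code('en-us')
pt = normalise_language_code('Pt')
```

With these inputs, en[:resource] is 'text/hyphen/language/en_us' with class name 'EN_US', and pt yields 'text/hyphen/language/pt' and 'PT'.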



# File 'lib/text/hyphen.rb', line 41

def language
  @language
end

#left ⇒ Object

No fewer than this number of letters will show up to the left of the hyphen. The initial value for this will be specified by the language; setting this value will override the language's defaults.



# File 'lib/text/hyphen.rb', line 19

def left
  @left
end

#right ⇒ Object

No fewer than this number of letters will show up to the right of the hyphen. This overrides the default specified in the language.



# File 'lib/text/hyphen.rb', line 23

def right
  @right
end

Class Method Details

.require_real_hyphenation_file(loader) ⇒ Object

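Requires the Ruby-version-specific copy of a hyphenation file. Given the path of a loader file, it requires the file of the same name from the '1.8' or '1.9' subdirectory beside it, depending on RUBY_VERSION.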



# File 'lib/text/hyphen.rb', line 255

def self.require_real_hyphenation_file(loader) # :nodoc:
  p = File.dirname(loader)
  f = File.basename(loader)
  v = if RUBY_VERSION < "1.9.1"
        "1.8"
      else
        "1.9"
      end
  require File.join(p, v, f)
end

Instance Method Details

#clear_cache! ⇒ Object

Clears the per-instance hyphenation and visualization caches.



# File 'lib/text/hyphen.rb', line 174

def clear_cache!
  @cache.clear
  @vcache.clear
end

#hyphenate(word) ⇒ Object

Returns an array of character positions where a word can be hyphenated.

hyp.hyphenate('representation') #=> [3, 5, 8, 10]

Because hyphenation can be expensive, if the word has been hyphenated previously, it will be returned from a per-instance cache.



# File 'lib/text/hyphen.rb', line 107

def hyphenate(word)
  word = word.downcase
  $stderr.puts "Hyphenating #{word}" if DEBUG
  return @cache[word] if @cache.has_key?(word)
  res = @language.exceptions[word]
  return @cache[word] = make_result_list(res) if res

  letters = word.scan(@language.scan_re)
  $stderr.puts letters.inspect if DEBUG
  word_size = letters.size

  result = [0] * (word_size + 1)
  right_stop = word_size - @right

  updater = Proc.new do |hash, str, pos|
    if hash.has_key?(str)
      $stderr.print "#{pos}: #{str}: #{hash[str]}" if DEBUG
      hash[str].scan(@language.scan_re).each_with_index do |cc, ii|
        cc = cc.to_i
        result[ii + pos] = cc if cc > result[ii + pos]
      end
      $stderr.print ": #{result.inspect}\n" if DEBUG
    end
  end

  # Walk the word
  (0..right_stop).each do |pos|
    rest_length = word_size - pos
    (1..rest_length).each do |length|
      substr = letters[pos, length].join('')
      updater[@language.hyphen, substr, pos]
      updater[@language.start, substr, pos] if pos.zero?
      updater[@language.stop, substr, pos] if (length == rest_length)
    end
  end

  updater[@language.both, word, 0] if @language.both[word]

  (0..@left).each { |i| result[i] = 0 }
  ((-1 - @right)..(-1)).each { |i| result[i] = 0 }
  @cache[word] = make_result_list(result)
end
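
The merging rule applied by the updater proc in the listing above can be shown in isolation. In the sketch below, the patterns and their digit strings are made up for illustration, and the helper names are hypothetical. The final step assumes the usual convention of Liang's TeX algorithm, in which an odd score marks a permitted break; make_result_list is not shown on this page, so that step is an assumption about its behaviour.

```ruby
# Merge one pattern's digit string into the running scores, keeping the
# larger value at each position (the same comparison the updater makes).
def apply_pattern(scores, digits, pos)
  digits.each_char.with_index do |ch, i|
    value = ch.to_i
    scores[pos + i] = value if value > scores[pos + i]
  end
  scores
end

# Assumption: an odd final score marks a permitted hyphenation point.
def break_points(scores)
  scores.each_index.select { |i| scores[i].odd? }
end

scores = [0] * 6                 # a five-letter word has six slots
apply_pattern(scores, '010', 1)  # made-up pattern: proposes a break at 2
apply_pattern(scores, '021', 2)  # made-up pattern: forbids 3, proposes 4
apply_pattern(scores, '02', 1)   # made-up pattern: overrides the break at 2
points = break_points(scores)
```

After the three calls, scores is [0, 0, 2, 2, 1, 0] and points is [4]: the break first proposed at position 2 is suppressed because a later pattern supplied a higher even value there.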

#hyphenate_to(word, size, hyphen = '-') ⇒ Object

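Hyphenates a word so that the first part, including the hyphen, is at most size characters long. Returns a two-element array of the first part (with the hyphen appended) and the remainder, or [nil, word] if no hyphenation point fits within size.

NOTE: if hyphen is set to a multi-character string, it will still be counted as one character (since it represents a hyphen).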



# File 'lib/text/hyphen.rb', line 185

def hyphenate_to(word, size, hyphen = '-')
  point = hyphenate(word).delete_if { |e| e >= size }.max
  if point.nil?
    [nil, word]
  else
    [word[0 ... point] + hyphen, word[point .. -1]]
  end
end
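
The selection logic can be tried without loading any language patterns by passing the break points in directly. The function below is a hypothetical standalone sketch, not part of the library; the points are the ones this page lists for 'representation'.

```ruby
# Hypothetical standalone version of the splitting step. It takes the
# break points as an argument instead of calling hyphenate.
def split_at_last_fit(word, points, size, hyphen = '-')
  # Keep only points that leave room for the hyphen, then take the last.
  point = points.select { |e| e < size }.max
  return [nil, word] if point.nil?
  [word[0...point] + hyphen, word[point..-1]]
end

points = [3, 5, 8, 10]  # from the hyphenate example on this page
fits   = split_at_last_fit('representation', points, 8)
none   = split_at_last_fit('representation', points, 3)
```

Here fits is ['repre-', 'sentation'] and none is [nil, 'representation']. The sketch uses the non-mutating select, so the points array is unchanged afterwards, whereas delete_if in the listing above modifies the array it is called on.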

#stats ⇒ Object

Returns a string describing the structure of the patterns for the language of this hyphenation object.



# File 'lib/text/hyphen.rb', line 196

def stats
  _b = @language.both.size
  _s = @language.start.size
  _e = @language.stop.size
  _h = @language.hyphen.size
  _x = @language.exceptions.size
  _T = _b + _s + _e + _h + _x

  s = "\nThe language '%s' contains %d total hyphenation patterns.\n% 6d patterns are word start patterns.\n% 6d patterns are word stop patterns.\n% 6d patterns are word start/stop patterns.\n% 6d patterns are normal patterns.\n% 6d patterns are exceptions.\n\n"
  s % [ @iso_language, _T, _s, _e, _b, _h, _x ]
end

#visualise(word, hyphen = '-') ⇒ Object Also known as: visualize

Returns a visualization of the hyphenation points.

hyp.visualize('representation') #=> rep-re-sen-ta-tion

Any string can be set instead of the default hyphen:

hyp.visualize('example', '&shy;') #=> exam&shy;ple

Because hyphenation can be expensive, if the word has been visualised previously, it will be returned from a per-instance cache.



# File 'lib/text/hyphen.rb', line 160

def visualise(word, hyphen = '-')
  return @vcache[word] if @vcache.has_key?(word)
  w = word.dup
  s = hyphen.size
  hyphenate(w).each_with_index do |pos, n|
    # Insert the hyphen string at the reported position plus the combined
    # length of the hyphen strings already inserted.
    w[pos.to_i + (n * s), 0] = hyphen unless pos.zero?
  end
  @vcache[word] = w
end
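
The offset arithmetic in the listing above (each earlier insertion shifts later positions by the length of the hyphen string) can be checked with a standalone sketch. The function name insert_hyphens is hypothetical, and the break points are passed in rather than computed; [3, 5, 8, 10] is the list this page gives for 'representation', while [4] for 'example' is inferred from the exam&shy;ple output shown above.

```ruby
# Hypothetical standalone version of the insertion loop.
def insert_hyphens(word, points, hyphen = '-')
  out = word.dup
  points.each_with_index do |pos, n|
    # n hyphen strings have already been inserted before this position.
    out[pos + n * hyphen.size, 0] = hyphen unless pos.zero?
  end
  out
end

plain = insert_hyphens('representation', [3, 5, 8, 10])
soft  = insert_hyphens('example', [4], '&shy;')
```

With these inputs, plain is 'rep-re-sen-ta-tion' and soft is 'exam&shy;ple'.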