Class: Text::Hyphen

Inherits: Object
Defined in: lib/text/hyphen.rb

Overview

An object that knows how to perform hyphenation based on the TeX hyphenation algorithm with pattern files. Each object is constructed with a specific language’s hyphenation patterns.

Defined Under Namespace

Classes: Language

Constant Summary

DEBUG = false
VERSION = '1.2'
DEFAULT_MIN_LEFT = 2
DEFAULT_MIN_RIGHT = 2


Constructor Details

#initialize(options = {}) {|_self| ... } ⇒ Hyphen

Creates a hyphenation object with the options requested. The options available are:

language

The language to perform hyphenation with. See #language and #iso_language.

left

The minimum number of characters to the left of a hyphenation point. See #left.

right

The minimum number of characters to the right of a hyphenation point. See #right.

The options can be provided either as hash parameters or set through accessor methods in an initialization block. The following initializations are all equivalent:

hyp = Text::Hyphen.new(:language => 'en_us')
hyp = Text::Hyphen.new(language: 'en_us') # Ruby 1.9+ hash syntax
hyp = Text::Hyphen.new { |h| h.language = 'en_us' }

Yields:

  • (_self)

Yield Parameters:

  • _self (Text::Hyphen)

    the object that the method was called on



# File 'lib/text/hyphen.rb', line 75

def initialize(options = {}) # :yields self:
  @iso_language = options[:language]
  @left         = options[:left]
  @right        = options[:right]
  @language     = nil

  @cache        = {}
  @vcache       = {}

  @hyphen       = {}
  @begin_hyphen = {}
  @end_hyphen   = {}
  @both_hyphen  = {}
  @exception    = {}

  @first_load = true
  yield self if block_given?
  @first_load = false

  load_language

  @left  ||= DEFAULT_MIN_LEFT
  @right ||= DEFAULT_MIN_RIGHT
end
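
The hash-or-block option handling above can be sketched in isolation. The class below is a hypothetical stand-in (OptionsDemo is not part of Text::Hyphen) and it leaves out language loading entirely; it only illustrates the order of operations visible in the constructor: read the hash, yield to the block, then fill in defaults.

```ruby
# Hypothetical stand-in class, not part of Text::Hyphen. It mirrors only the
# option handling: hash values first, then an optional block, then defaults.
class OptionsDemo
  DEFAULT_MIN_LEFT  = 2
  DEFAULT_MIN_RIGHT = 2

  attr_accessor :language, :left, :right

  def initialize(options = {})
    @language = options[:language]
    @left     = options[:left]
    @right    = options[:right]

    # The block runs after the hash is read, so it can override hash values.
    yield self if block_given?

    # Defaults apply only to values that are still unset.
    @left  ||= DEFAULT_MIN_LEFT
    @right ||= DEFAULT_MIN_RIGHT
  end
end

a = OptionsDemo.new(:language => 'en_us')
b = OptionsDemo.new { |h| h.language = 'en_us' }
c = OptionsDemo.new(:language => 'en_us', :left => 3)
```

Here a and b end up with the same settings, while c keeps its explicit left value and receives only the default for right.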

Instance Attribute Details

#iso_language ⇒ Object (readonly)

Returns the language’s ISO 639 ID, e.g., “en_us” or “pt”.



# File 'lib/text/hyphen.rb', line 56

def iso_language
  @iso_language
end

#language ⇒ Object

The name of the language to be used in hyphenating words. This will be a two or three character ISO 639 code, with the two character form being the canonical resource name. This will load the language hyphenation definitions from text/hyphen/language/<code> as a Ruby class. The resource ‘text/hyphen/language/en_us’ defines the language class Text::Hyphen::Language::EN_US. It also defines the secondary forms Text::Hyphen::Language::EN and Text::Hyphen::Language::ENG_US.

Minimal transformations are performed on the language code provided: dashes are converted to underscores (e.g., ‘en-us’ becomes ‘en_us’) and letter case is normalised. Resource names are downcased and class names are upcased (e.g., ‘Pt’ for the Portuguese language becomes ‘pt’ and ‘PT’, respectively).

The language may also be specified as an instance of Text::Hyphen::Language.



# File 'lib/text/hyphen.rb', line 40

def language
  @language
end
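
As a rough illustration of the code transformations described for #language, the sketch below maps a language code to a resource name and a class name. The helper normalise_language_code is hypothetical and is not the library's implementation; it only follows the rules stated in the text (dashes to underscores, downcased resource, upcased class name).

```ruby
# Hypothetical helper, not part of Text::Hyphen. It applies only the
# transformations described in the documentation text.
def normalise_language_code(code)
  base = code.to_s.tr('-', '_')   # 'en-us' -> 'en_us'
  {
    :resource   => "text/hyphen/language/#{base.downcase}",
    :class_name => "Text::Hyphen::Language::#{base.upcase}"
  }
end

en = normalise_language_code('en-us')
pt = normalise_language_code('Pt')
```

With these inputs, en[:resource] is 'text/hyphen/language/en_us' and pt[:class_name] is 'Text::Hyphen::Language::PT'.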

#left ⇒ Object

No fewer than this number of letters will show up to the left of the hyphen. The initial value for this will be specified by the language; setting this value will override the language’s defaults.



# File 'lib/text/hyphen.rb', line 18

def left
  @left
end

#right ⇒ Object

No fewer than this number of letters will show up to the right of the hyphen. This overrides the default specified in the language.



# File 'lib/text/hyphen.rb', line 22

def right
  @right
end

Class Method Details

.require_real_hyphenation_file(loader) ⇒ Object

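Requires the Ruby-version-specific copy of a hyphenation file on behalf of a loader file. The real file has the same base name as the loader and lives in a 1.8 or 1.9 subdirectory next to it; the subdirectory is chosen by comparing RUBY_VERSION against "1.9.1".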



# File 'lib/text/hyphen.rb', line 243

def self.require_real_hyphenation_file(loader) # :nodoc:
  p = File.dirname(loader)
  f = File.basename(loader)
  v = if RUBY_VERSION < "1.9.1"
        "1.8"
      else
        "1.9"
      end
  require File.join(p, v, f)
end
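
To make the path resolution easier to see, the sketch below computes the path that would be required without requiring anything. The helper real_hyphenation_path, its ruby_version parameter, and the example loader path are all assumptions made for illustration.

```ruby
# Hypothetical helper, not part of Text::Hyphen. It returns the resolved path
# instead of requiring it; ruby_version is a parameter only so that both
# branches can be exercised.
def real_hyphenation_path(loader, ruby_version = RUBY_VERSION)
  dir  = File.dirname(loader)
  file = File.basename(loader)
  # Plain string comparison, as in the method above.
  version_dir = ruby_version < "1.9.1" ? "1.8" : "1.9"
  File.join(dir, version_dir, file)
end

old_path = real_hyphenation_path('text/hyphen/language/en_us.rb', '1.8.7')
new_path = real_hyphenation_path('text/hyphen/language/en_us.rb', '2.7.0')
```

Under these assumptions, old_path points into the 1.8 subdirectory and new_path into the 1.9 subdirectory.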

Instance Method Details

#clear_cache! ⇒ Object

Clears the per-instance hyphenation and visualization caches.



# File 'lib/text/hyphen.rb', line 166

def clear_cache!
  @cache.clear
  @vcache.clear
end

#hyphenate(word) ⇒ Object

Returns an array of character positions where a word can be hyphenated.

hyp.hyphenate('representation') #=> [3, 5, 8, 10]

Because hyphenation can be expensive, if the word has been hyphenated previously, it will be returned from a per-instance cache.



# File 'lib/text/hyphen.rb', line 106

def hyphenate(word)
  word = word.downcase
  $stderr.puts "Hyphenating #{word}" if DEBUG
  return @cache[word] if @cache.has_key?(word)
  res = @language.exceptions[word]
  return @cache[word] = make_result_list(res) if res

  letters = word.scan(@language.scan_re)
  $stderr.puts letters.inspect if DEBUG
  word_size = letters.size

  result = [0] * (word_size + 1)
  right_stop = word_size - @right

  updater = Proc.new do |hash, str, pos|
    if hash.has_key?(str)
      $stderr.print "#{pos}: #{str}: #{hash[str]}" if DEBUG
      hash[str].scan(@language.scan_re).each_with_index do |cc, ii|
        cc = cc.to_i
        result[ii + pos] = cc if cc > result[ii + pos]
      end
      $stderr.print ": #{result.inspect}\n" if DEBUG
    end
  end

  # Walk the word
  (0..right_stop).each do |pos|
    rest_length = word_size - pos
    (1..rest_length).each do |length|
      substr = letters[pos, length].join('')
      updater[@language.hyphen, substr, pos]
      updater[@language.start, substr, pos] if pos.zero?
      updater[@language.stop, substr, pos] if (length == rest_length)
    end
  end

  updater[@language.both, word, 0] if @language.both[word]

  (0..@left).each { |i| result[i] = 0 }
  ((-1 - @right)..(-1)).each { |i| result[i] = 0 }
  @cache[word] = make_result_list(result)
end
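
The core of the pattern walk can be sketched with a toy pattern table. This is a simplified illustration of the TeX-style approach, not the library's code: the patterns are made up, there are no separate word-start or word-stop tables, no exceptions, and no cache, and the handling of the left and right minimums is this sketch's own assumption. It shows the two ideas the method relies on: every substring is looked up, the highest digit seen at each position wins, and odd digits mark permitted breaks.

```ruby
# Toy pattern table, made up for illustration (not real TeX patterns).
# Each value holds one digit per inter-letter position, so it is always one
# character longer than its key.
TOY_PATTERNS = {
  'ab' => '010',  # odd digit: allow a break between "a" and "b"
  'b'  => '01',   # odd digit: allow a break after "b"
  'bc' => '020',  # even digit, higher: forbid the break after "b" before "c"
  'cd' => '030'   # odd digit: allow a break between "c" and "d"
}

# Hypothetical helper; min_left and min_right here mean "at least this many
# letters on each side", which is an assumption of this sketch.
def toy_hyphenation_points(word, patterns, min_left = 2, min_right = 2)
  letters = word.downcase.chars.to_a
  scores  = [0] * (letters.size + 1)

  # Look up every substring and keep the highest digit per position.
  (0...letters.size).each do |pos|
    (1..(letters.size - pos)).each do |len|
      digits = patterns[letters[pos, len].join]
      next unless digits
      digits.chars.each_with_index do |d, i|
        scores[pos + i] = d.to_i if d.to_i > scores[pos + i]
      end
    end
  end

  # Odd scores are permitted breaks, subject to the left/right minimums.
  (0..letters.size).select do |i|
    scores[i].odd? && i >= min_left && i <= letters.size - min_right
  end
end

points = toy_hyphenation_points('xabcdx', TOY_PATTERNS)  # => [2, 4]
```

In this run the 'b' pattern proposes a break at position 3, but the 'bc' pattern overrides it with a higher even digit, so only positions 2 and 4 remain.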

#hyphenate_to(word, size) ⇒ Object

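Splits a word at the last hyphenation point that falls before size characters, so that the first part, including its trailing hyphen, is at most size characters long. Returns a two-element array containing the first part (with the hyphen appended) and the remainder of the word. If no hyphenation point falls before size, returns [nil, word].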



# File 'lib/text/hyphen.rb', line 173

def hyphenate_to(word, size)
  point = hyphenate(word).delete_if { |e| e >= size }.max
  if point.nil?
    [nil, word]
  else
    [word[0 ... point] + "-", word[point .. -1]]
  end
end
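
The splitting step can be tried without any pattern data by passing the hyphenation points in directly. The helper split_at_width below is hypothetical; it takes the points quoted for 'representation' in the #hyphenate example as input rather than computing them.

```ruby
# Hypothetical helper, not part of Text::Hyphen. The points argument stands
# in for the result of a hyphenation lookup.
def split_at_width(word, points, size)
  # select leaves the list that was passed in unmodified.
  point = points.select { |e| e < size }.max
  return [nil, word] if point.nil?
  [word[0...point] + '-', word[point..-1]]
end

points = [3, 5, 8, 10]
fits   = split_at_width('representation', points, 6)  # => ["repre-", "sentation"]
none   = split_at_width('representation', points, 3)  # => [nil, "representation"]
```

The sketch uses select rather than an in-place filter so that the caller's list of points can be reused for later calls.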

#stats ⇒ Object

Returns a string describing the structure of the patterns for the language of this hyphenation object.



# File 'lib/text/hyphen.rb', line 184

def stats
  _b = @language.both.size
  _s = @language.start.size
  _e = @language.stop.size
  _h = @language.hyphen.size
  _x = @language.exceptions.size
  _T = _b + _s + _e + _h + _x

  s = <<-EOS

The language '%s' contains %d total hyphenation patterns.
  % 6d patterns are word start patterns.
  % 6d patterns are word stop patterns.
  % 6d patterns are word start/stop patterns.
  % 6d patterns are normal patterns.
  % 6d patterns are exceptions.

EOS
  s % [ @iso_language, _T, _s, _e, _b, _h, _x ]
end

#visualise(word) ⇒ Object Also known as: visualize

Returns a visualization of the hyphenation points.

hyp.visualize('representation') #=> "rep-re-sen-ta-tion"

Because hyphenation can be expensive, if the word has been visualised previously, it will be returned from a per-instance cache.



# File 'lib/text/hyphen.rb', line 155

def visualise(word)
  return @vcache[word] if @vcache.has_key?(word)
  w = word.dup
  hyphenate(w).each_with_index do |pos, n|
    w[pos.to_i + n, 0] = '-' if pos != 0
  end
  @vcache[word] = w
end
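
The offset arithmetic in the loop above (pos + n) is easier to follow in a standalone sketch. The helper insert_hyphens is hypothetical and takes the hyphenation points as an argument, using the points quoted for 'representation' in the #hyphenate example.

```ruby
# Hypothetical helper, not part of Text::Hyphen. Each hyphen already inserted
# shifts later positions right by one, hence the "+ n" offset.
def insert_hyphens(word, points)
  out = word.dup
  points.each_with_index do |pos, n|
    out[pos + n, 0] = '-'   # zero-length slice assignment inserts text
  end
  out
end

shown = insert_hyphens('representation', [3, 5, 8, 10])  # => "rep-re-sen-ta-tion"
```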