Class: Pascoale::SyllableSeparator

Inherits:
Object
Includes:
Constants
Defined in:
lib/pascoale/syllable_separator.rb

Constant Summary

ONSET =
"(?:ch|lh|nh|gu|qu|[pbtdcgfv][lr]|[#{CONSONANTS}])"
CODA =
'[bcdfghjklmnpqrstvwxz]'
NUCLEUS_RULES =

The biggest problems are “sinéreses” and “diéreses” (synaeresis and diaeresis). It seems that some consonants, such as “n” and “m”, in the next syllable can cause them.

['ãe',
'ão',
'õe',
'au',
'ou',
'iu(?!m$)',
'[áâàãéêíóôú][iu]',
'[aieou][iu](?=[aeo])',
"ai(?!m$|ns$|r$|ç[ãõ]|[nm]#{ONSET}|nh)",
"eu(?![nm]#{ONSET})",
"ei(?![nm]#{ONSET})",
"ui(?!m$|ns$|ç[ãõ]|r$|dade$|z|[nm]#{ONSET}|nar$|d[ao]$|dora?$)",
"oi(?!m$|ns$|ç[ãõ]|r$|dade$|z|[nm]#{ONSET}|nar$|dora?$)",
'[aáâàãeéêiíoóôuúy]']
NUCLEUS =
"(?:#{NUCLEUS_RULES.join('|')})"
KERNEL =

The concept of “rhyme” does not help in this algorithm. It seems the concept makes no sense for syllable separation in Portuguese (by an algorithm, at least).

"#{ONSET}?#{NUCLEUS}"

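The way these constants compose can be sketched in isolation. The block below is a minimal illustration, not the library's code: the `KernelSketch` module and `first_kernel` helper are hypothetical, the `CONSONANTS` value is an assumed stand-in for `Constants::CONSONANTS` (its real value is not shown on this page), and `NUCLEUS_RULES` is cut down to four entries. It shows that rule order matters (diphthongs are tried before single vowels) and that a negative lookahead lets a rule such as `ai` step aside when the two vowels belong to different syllables.

```ruby
module KernelSketch
  # Assumption: stand-in for Pascoale::Constants::CONSONANTS.
  CONSONANTS = 'bcdfghjklmnpqrstvwxzç'
  ONSET = "(?:ch|lh|nh|gu|qu|[pbtdcgfv][lr]|[#{CONSONANTS}])"

  # Simplified subset of the rules: diphthongs first, single vowels last.
  NUCLEUS_RULES = ['ão', 'ou', 'ai(?!m$|ns$|r$)', '[aeiou]']
  NUCLEUS = "(?:#{NUCLEUS_RULES.join('|')})"
  KERNEL = "#{ONSET}?#{NUCLEUS}"

  # Returns the optional onset plus nucleus found at the start of +word+,
  # or nil when the word does not start with one.
  def self.first_kernel(word)
    word[/\A#{KERNEL}/]
  end
end

KernelSketch.first_kernel('chave') # => "cha" (digraph onset "ch")
KernelSketch.first_kernel('caixa') # => "cai" ("ai" kept as a diphthong)
KernelSketch.first_kernel('sair')  # => "sa"  (lookahead rejects "ai" before final "r")
KernelSketch.first_kernel('ouro')  # => "ou"  (no onset)
```

Because the alternation is ordered, moving `'[aeiou]'` to the front of the list would make every diphthong rule unreachable.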
Constants included from Constants

Constants::ACCENTED, Constants::AS, Constants::CONSONANTS, Constants::ES, Constants::IS, Constants::LETTERS, Constants::NOT_ACCENTED, Constants::OS, Constants::SEMIVOWELS, Constants::US, Constants::VOWELS, Constants::YS

Instance Method Summary

Constructor Details

#initialize(word) ⇒ SyllableSeparator

Returns a new instance of SyllableSeparator.



# File 'lib/pascoale/syllable_separator.rb', line 32

def initialize(word)
  @word = word
end

Instance Method Details

#separate ⇒ Object
Also known as: separated



# File 'lib/pascoale/syllable_separator.rb', line 36

def separate
  rest = @word
  result = []
  while rest && rest.size > 0
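    # $1 is the onset + nucleus of the current syllable. The optional group
    # decides which following consonants close this syllable as its coda
    # ($3, $5, $7, $8) and which kernel opens the next one ($2, $4, $6).
    # $9 is whatever is left of the word.
    if rest =~ /^(#{KERNEL})(?:(#{KERNEL})|(#{CODA})(#{KERNEL})|(#{CODA}#{CODA})(#{KERNEL})|(#{CODA}#{CODA})|(#{CODA}))?(.*)$/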
      result << $1 + $3.to_s + $5.to_s + $7.to_s + $8.to_s
      rest = $2.to_s + $4.to_s + $6.to_s + $9.to_s
      # Special case! Hate them :(
      # Pneu, Gnomo, Mnemônica, Pseudônimo
    elsif result.size == 0
      if rest =~ /^([#{CONSONANTS}]#{KERNEL})(?:(#{KERNEL})|(#{CODA})(#{KERNEL})|(#{CODA}#{CODA})(#{KERNEL})|(#{CODA}#{CODA})|(#{CODA}))?(.*)$/
        result << $1 + $3.to_s + $5.to_s + $7.to_s + $8.to_s
        rest = $2.to_s + $4.to_s + $6.to_s + $9.to_s
      else
        raise %(Cannot separate "#{@word}". No rule match next syllable at "#{result.join('')}|>#{rest}")
      end
    else
      raise %(Cannot separate "#{@word}". No rule match next syllable at "#{result.join('')}|>#{rest}")
    end
  end
  result
end
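
The loop above peels one syllable off the front of the word per iteration. The following is a self-contained sketch of that idea under stated assumptions, not a copy of the library: the `SeparatorSketch` module and `split_syllables` method are hypothetical names, `CONSONANTS` is an assumed stand-in for `Constants::CONSONANTS`, the nucleus is reduced to three rules, and the word-initial special case (Pneu, Gnomo, ...) is left out.

```ruby
module SeparatorSketch
  # Assumption: stand-in for Pascoale::Constants::CONSONANTS.
  CONSONANTS = 'bcdfghjklmnpqrstvwxzç'
  ONSET   = "(?:ch|lh|nh|gu|qu|[pbtdcgfv][lr]|[#{CONSONANTS}])"
  CODA    = '[bcdfghjklmnpqrstvwxz]'
  NUCLEUS = '(?:ão|ou|[aeiou])' # simplified: no lookahead rules
  KERNEL  = "#{ONSET}?#{NUCLEUS}"

  # Same capture layout as in #separate: groups 1, 3, 5, 7 and 8 belong to
  # the current syllable; groups 2, 4, 6 and 9 are handed to the next step.
  STEP = /\A(#{KERNEL})(?:(#{KERNEL})|(#{CODA})(#{KERNEL})|(#{CODA}#{CODA})(#{KERNEL})|(#{CODA}#{CODA})|(#{CODA}))?(.*)\z/

  def self.split_syllables(word)
    rest = word
    result = []
    until rest.empty?
      m = STEP.match(rest)
      raise ArgumentError, %(No rule matches the next syllable at "#{rest}") unless m
      result << [m[1], m[3], m[5], m[7], m[8]].join
      rest = [m[2], m[4], m[6], m[9]].join
    end
    result
  end
end

SeparatorSketch.split_syllables('prato')      # => ["pra", "to"]
SeparatorSketch.split_syllables('carta')      # => ["car", "ta"]
SeparatorSketch.split_syllables('transporte') # => ["trans", "por", "te"]
```

Each step consumes at least the leading kernel, so the remainder shrinks on every iteration and the loop terminates; input with no matching kernel raises instead of looping forever.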