Class: Pascoale::SyllableSeparator
- Inherits:
- Object
- Includes:
- Constants
- Defined in:
- lib/pascoale/syllable_separator.rb
Constant Summary
- ONSET =
"(?:ch|lh|nh|gu|qu|[pbtdcgfv][lr]|[#{CONSONANTS}])"
- CODA =
'[bcdfghjklmnpqrstvwxz]'
- NUCLEUS_RULES =
The biggest problems are “sinéreses” and “diéreses” (synaeresis and diaeresis). It seems that some consonants, such as “n” and “m”, in the next syllable can cause them.
['ãe', 'ão', 'õe', 'au', 'ou', 'iu(?!m$)', '[áâàãéêíóôú][iu]', '[aieou][iu](?=[aeo])', "ai(?!m$|ns$|r$|ç[ãõ]|[nm]#{ONSET}|nh)", "eu(?![nm]#{ONSET})", "ei(?![nm]#{ONSET})", "ui(?!m$|ns$|ç[ãõ]|r$|dade$|z|[nm]#{ONSET}|nar$|d[ao]$|dora?$)", "oi(?!m$|ns$|ç[ãõ]|r$|dade$|z|[nm]#{ONSET}|nar$|dora?$)", '[aáâàãeéêiíoóôuúy]']
- NUCLEUS =
"(?:#{NUCLEUS_RULES.join('|')})"
- KERNEL =
The concept of “rhyme” does not help in this algorithm. It seems to make no sense for syllable separation in Portuguese (by an algorithm, at least).
"#{ONSET}?#{NUCLEUS}"
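The constants above are plain strings that get interpolated into one another, so KERNEL ends up as "optional onset, then nucleus". The following is a minimal sketch of that composition, not the gem's source. The module name `KernelSketch`, the helper `leading_kernel`, and the value of `CONSONANTS` are assumptions made for illustration (the real `Constants::CONSONANTS` is not shown on this page), and the nucleus rule list is cut down to two diphthongs plus the single-vowel fallback.

```ruby
# Illustrative sketch only; names and the CONSONANTS value are assumptions.
module KernelSketch
  # Stand-in for Constants::CONSONANTS, whose real value is not shown here.
  CONSONANTS = 'bcdfghjklmnpqrstvwxz'

  # Digraphs and consonant clusters are listed before the single consonant,
  # so "ch" or "pr" is tried before falling back to "c" or "p" alone.
  ONSET = "(?:ch|lh|nh|gu|qu|[pbtdcgfv][lr]|[#{CONSONANTS}])"

  # Reduced rule list: diphthong rules come first, single vowels last.
  NUCLEUS_RULES = ['ão', 'ou', '[aáâàãeéêiíoóôuúy]']
  NUCLEUS = "(?:#{NUCLEUS_RULES.join('|')})"

  # A kernel is an optional onset followed by a nucleus.
  KERNEL = "#{ONSET}?#{NUCLEUS}"
  KERNEL_RE = /\A#{KERNEL}/

  # Returns the kernel at the start of +word+, or nil if none matches.
  def self.leading_kernel(word)
    word[KERNEL_RE]
  end
end

KernelSketch.leading_kernel('chave') # => "cha"
KernelSketch.leading_kernel('pão')   # => "pão"
KernelSketch.leading_kernel('ouro')  # => "ou"
```

Because regex alternation takes the first alternative that lets the match succeed, the order of entries in NUCLEUS_RULES matters: the single-vowel class has to stay last, or it would pre-empt every diphthong rule.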
Constants included from Constants
Constants::ACCENTED, Constants::AS, Constants::CONSONANTS, Constants::ES, Constants::IS, Constants::LETTERS, Constants::NOT_ACCENTED, Constants::OS, Constants::SEMIVOWELS, Constants::US, Constants::VOWELS, Constants::YS
Instance Method Summary
- #initialize(word) ⇒ SyllableSeparator (constructor): a new instance of SyllableSeparator.
- #separate ⇒ Object (also: #separated)
Constructor Details
#initialize(word) ⇒ SyllableSeparator
Returns a new instance of SyllableSeparator.
# File 'lib/pascoale/syllable_separator.rb', line 32

    def initialize(word)
      @word = word
    end
Instance Method Details
#separate ⇒ Object Also known as: separated
# File 'lib/pascoale/syllable_separator.rb', line 36

    def separate
      rest = @word
      result = []
      while rest && rest.size > 0
        if rest =~ /^(#{KERNEL})(?:(#{KERNEL})|(#{CODA})(#{KERNEL})|(#{CODA}#{CODA})(#{KERNEL})|(#{CODA}#{CODA})|(#{CODA}))?(.*)$/
          result << $1 + $3.to_s + $5.to_s + $7.to_s + $8.to_s
          rest = $2.to_s + $4.to_s + $6.to_s + $9.to_s
        # Special case! Hate them :(
        # Pneu, Gnomo, Mnemônica, Pseudônimo
        elsif result.size == 0
          if rest =~ /^([#{CONSONANTS}]#{KERNEL})(?:(#{KERNEL})|(#{CODA})(#{KERNEL})|(#{CODA}#{CODA})(#{KERNEL})|(#{CODA}#{CODA})|(#{CODA}))?(.*)$/
            result << $1 + $3.to_s + $5.to_s + $7.to_s + $8.to_s
            rest = $2.to_s + $4.to_s + $6.to_s + $9.to_s
          else
            raise %(Cannot separate "#{@word}". No rule match next syllable at "#{result.join('')}|>#{rest}")
          end
        else
          raise %(Cannot separate "#{@word}". No rule match next syllable at "#{result.join('')}|>#{rest}")
        end
      end
      result
    end
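The capture-group bookkeeping in #separate is easier to follow in a reduced form. Group 1 is always the current kernel. Groups 3, 5, 7 and 8 are coda consonants that stay with the current syllable. Groups 2, 4 and 6 are the next kernel, which is pushed back onto the remaining input together with group 9. The sketch below reproduces only the main branch of the loop. The module name `SeparatorSketch`, the `CONSONANTS` value, and the shortened nucleus pattern are assumptions for illustration, and the first-syllable special case (Pneu, Gnomo, ...) is deliberately left out.

```ruby
# Illustrative sketch only; not the gem's implementation.
module SeparatorSketch
  # Assumption: stand-in for Constants::CONSONANTS (real value not shown here).
  CONSONANTS = 'bcdfghjklmnpqrstvwxz'
  ONSET   = "(?:ch|lh|nh|gu|qu|[pbtdcgfv][lr]|[#{CONSONANTS}])"
  CODA    = '[bcdfghjklmnpqrstvwxz]'
  # Assumption: heavily reduced nucleus, unaccented vowels and two diphthongs.
  NUCLEUS = '(?:ão|ou|[aeiou])'
  KERNEL  = "#{ONSET}?#{NUCLEUS}"

  # Same shape as the main pattern in #separate: a kernel, then optionally
  # a look at what follows to decide which consonants stay as coda.
  STEP = /^(#{KERNEL})(?:(#{KERNEL})|(#{CODA})(#{KERNEL})|(#{CODA}#{CODA})(#{KERNEL})|(#{CODA}#{CODA})|(#{CODA}))?(.*)$/

  def self.separate(word)
    rest = word
    result = []
    until rest.empty?
      m = STEP.match(rest)
      raise ArgumentError, %(no rule matches at "#{rest}") unless m

      # Kernel plus any coda stays in the current syllable
      # (Array#join treats unmatched, nil groups as empty strings).
      result << [m[1], m[3], m[5], m[7], m[8]].join
      # The next kernel and the untouched tail go back into the input.
      rest = [m[2], m[4], m[6], m[9]].join
    end
    result
  end
end

SeparatorSketch.separate('casa')       # => ["ca", "sa"]
SeparatorSketch.separate('porta')      # => ["por", "ta"]
SeparatorSketch.separate('transporte') # => ["trans", "por", "te"]
```

With this reduced pattern, a word such as "pneu" raises, because no kernel matches at the start of the word. That is the situation the second branch of #separate covers by allowing one extra leading consonant on the first syllable.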