Pascoale

Minor utilities for text processing in Brazilian Portuguese.

I'm going to add new functions as I need them.

Currently it has:

  • Pluralization and singularization (>= v0.3.0);
  • Proparoxytone, paroxytone and oxytone detection (>= v0.3.0);
  • Simple formatting considering accents in portuguese (upcase, downcase, capitalize);
  • Title formatting, considering prepositions;
  • Variations of a word at one and two edit distances (Reference: http://norvig.com/spell-correct.html);
  • Heuristic syllabic separation. My tests against a corpus of ~170K words shows 99.36% correctness (improved to ~99.56% on v0.3.0).

The name of the gem is a homage to "Prof. Pasquale Cipro Neto" (http://pt.wikipedia.org/wiki/Pasquale_Cipro_Neto), a great teacher! And yes, the name of the gem is wrong spelled as a joke ^_^

Installation

Add this line to your application's Gemfile:

gem 'pascoale'

And then execute:

$ bundle

Or install it yourself as:

$ gem install pascoale

Usage

Pluralization and singularization

*(Lowercase only)

require 'pascoale'

capt = Pascoale::Inflector.new('capitão')
puts capt.pluralize # => capitães

capts = Pascoale::Inflector.new('capitães')
puts capts.singularize # => capitão

captn = Pascoale::Inflector.new('capitãozinho')
puts captn.singularize # => capitãezinhos

qq = Pascoale::Inflector.new('qualquer')
puts qq.singularize # => quaisquer

Proparoxytones, Paroxytones and Oxytones

*(Lowercase only)

require 'pascoale'

diox = Pascoale::Reflector.new('dióxido')
puts diox.proparoxytone? # => true
puts diox.paroxytone?    # => false
puts diox.oxytone?       # => false

ideia = Pascoale::Reflector.new('ideia')
puts ideia.proparoxytone? # => false
puts ideia.paroxytone?    # => true
puts ideia.oxytone?       # => false

parati = Pascoale::Reflector.new('parati')
puts parati.proparoxytone? # => false
puts parati.paroxytone?    # => false
puts parati.oxytone?       # => true

Text formatter

require 'pascoale'

text = Pascoale::Formatter.new('Isso é um teste de formatação')

# Basic formatting
puts text.upcase # => ISSO É UM TESTE DE FORMATAÇÃO
puts text.downcase # => isso é um teste de formatação
puts text.capitalize # => Isso é um teste de formatação

# Fancy formatting (good for titles)
puts text.as_title # => Isso É um Teste de Formatação

# Predicates
puts text.upcase.upcase? # => true
puts text.upcase.downcase? # => false
puts text.capitalize? # => true

Variations of a word (typos and misspelling)

require 'pascoale'

edits = Pascoale::Edits.new('você')

# 1 edit distance
puts edits.editions.inspect

# 2 edits distance
puts edits.editions2.inspect # LOTS of output, beware.

Syllabic separation *(Lowercase only)

require 'pascoale'

separator = Pascoale::SyllableSeparator.new('exceção')
puts separator.separated.inspect # => ["ex", "ce", "ção"]

separator = Pascoale::SyllableSeparator.new('aéreo')
puts separator.separated.inspect # => ["a", "é", "re", "o"]

separator = Pascoale::SyllableSeparator.new('apneia')
puts separator.separated.inspect # => ["ap", "nei", "a"]

separator = Pascoale::SyllableSeparator.new('construir')
puts separator.separated.inspect # => ["cons", "tru", "ir"]

Contributing

  1. Fork it ( http://github.com//pascoale/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request