Pedantic cleans strings of text – stripping out unimportant words and URLs, fixing typos, replacing symbols (like emoticons) with real words, and running the results through a stemmer.
In short – it gives you reliable text to process (but not read).
And if the name didn’t give it away: yes, this library is opinionated.
Grab the gem.
gem install pedantic
Pedantic.fix('my messy string ;)') #=> 'messi string joke'
Note that the stemmer generates imperfect words (“messi” rather than “messy”), but its output is reasonably reliable and consistent: the same input always produces the same stems, so you can compare and match on them safely.
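To make the steps above concrete, here is a minimal sketch of that kind of pipeline in plain Ruby. It is not Pedantic’s actual code: the TextCleaner module, its lookup tables and its naive suffix-stripping stem method are all hypothetical stand-ins (a real stemmer handles far more cases), chosen only so the example runs without any gems.

```ruby
# Hypothetical illustration of a Pedantic-style cleaning pipeline.
# None of these names or word lists come from the gem itself.
module TextCleaner
  URL_PATTERN = %r{https?://\S+}

  # Illustrative lookup tables; extend them by adding more pairs.
  SYMBOLS   = { ';)' => 'joke', ':)' => 'happy' }.freeze
  TYPOS     = { 'teh' => 'the', 'recieve' => 'receive' }.freeze
  STOPWORDS = %w[a an and my of the to].freeze

  # Naive stand-in for a real stemmer: strips a few common suffixes and
  # turns a trailing "y" into "i", but only when the remaining stem
  # still contains a vowel (so "string" is left alone).
  def self.stem(word)
    word
      .sub(/\A(.*[aeiou].*)(?:ing|ed|(?<!s)s)\z/, '\1')
      .sub(/\A(.*[aeiou].*)y\z/, '\1i')
  end

  def self.fix(text)
    words = text.downcase.gsub(URL_PATTERN, ' ').split
    words = words.map { |w| SYMBOLS.fetch(w, w) }           # emoticons to words
    words = words.map { |w| w.gsub(/[^a-z0-9']/, '') }      # drop leftover punctuation
    words = words.reject(&:empty?)
    words = words.map { |w| TYPOS.fetch(w, w) }             # fix known typos
    words = words.reject { |w| STOPWORDS.include?(w) }      # strip unimportant words
    words.map { |w| stem(w) }.join(' ')                     # stem what is left
  end
end

puts TextCleaner.fix('my messy string ;)')
puts TextCleaner.fix('Teh cats loved http://example.com :)')
```

Because every step is a deterministic lookup or substitution, the same input always yields the same stems, which is the property that makes the output safe to compare even though it is not pleasant to read.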
Also – this library is a work in progress. So far I’ve aimed for a useful but extremely basic implementation: if you look through the code, you’ll see that only a few typos and emoticons are handled. It’s easy enough to extend, though.
Fork and patch as you see fit – and please send me a pull request if you think it’s useful for others. Don’t forget to write specs first, and don’t mess with the version numbers please (or at least: only do so in a different branch).
Copyright © 2010 Pat Allan, but released under an open licence. Go for your life.