GIGO (Garbage In, Garbage Out)

Or better yet, Garbage In, Gold Out! - The GIGO gem aims to fix Ruby string encodings at all costs!

The GIGO gem is likely not the proper solution. If you have bad encodings in your database, you should fix them and write consistent encodings going forward. That said, if you have no other choice, GIGO can help.

This gem depends on a series of transcoders, including ActiveSupport::Multibyte#tidy_bytes and one of the many public forks of CharDet for Ruby. Since CharDet is not published as a gem that follows proper semantic versioning, we have decided to vendor the kirillrdy/rchardet repo. We have also made sure that our vendored version stays in our namespace by using GIGO::CharDet. So if you have another version bundled, feel confident that the two will not conflict.

Usage

Simply pass a string to GIGO.load. Nil values and properly encoded strings are returned as-is. Otherwise, GIGO will do its best to convert the string and force your default internal (or UTF-8) encoding.

  GIGO.load "€20 – “Woohoo”"

Let's say you have a comments column on an ActiveRecord model that is not guaranteed to come back in your default external encoding.

def comments
  GIGO.load read_attribute(:comments)
end

GIGO's encoding can be configured using the GIGO.encoding accessor. By default this is Encoding.default_internal, with a fallback to Encoding::UTF_8.
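To make that default concrete, here is a minimal plain-Ruby sketch that does not load the gem. The default_target_encoding helper is a hypothetical name used only for illustration and is not part of GIGO's API; the sample bytes are likewise assumed.

```ruby
# Hypothetical helper mirroring the documented default: prefer
# Encoding.default_internal, otherwise fall back to UTF-8.
def default_target_encoding
  Encoding.default_internal || Encoding::UTF_8
end

# The bytes of "café" in ISO-8859-1, mislabeled as UTF-8.
# This is a typical "garbage in" string.
raw = "caf\xE9".dup.force_encoding(Encoding::UTF_8)
needs_fixing = !raw.valid_encoding?

# Reinterpret the bytes with the encoding they were really written in,
# then convert them to the target encoding.
fixed = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(default_target_encoding)
```

In this sketch the source encoding is known up front. GIGO exists for the harder case where it is not, which is why it works through several transcoders.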

Transcoders

GIGO transcoders can be any module or class that implements the transcode method. This method takes one argument, the string to transcode, and can hook into GIGO.encoding if needed. The default list of transcoders is:

  • GIGO::Transcoders::ActiveSupport
  • GIGO::Transcoders::CharDet
  • GIGO::Transcoders::Blind

GIGO attempts to use each in that order. Upon successful transcoding, we use the EnsureValidEncoding gem to force the result to match GIGO.encoding while removing any non-convertible characters.
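The ordered fall-through described above can be sketched in plain Ruby without the gem. Only the transcode method name comes from this README. The module names, the TARGET constant (standing in for GIGO.encoding), and the first_successful_transcode helper are hypothetical, and this is not GIGO's actual implementation.

```ruby
# Hypothetical transcoder: assumes the incoming bytes are Windows-1252.
module Windows1252Transcoder
  # Stand-in for GIGO.encoding in this sketch.
  TARGET = Encoding.default_internal || Encoding::UTF_8

  def self.transcode(string)
    string.dup.force_encoding(Encoding::Windows_1252).encode(TARGET)
  end
end

# Hypothetical transcoder that always fails, to show the fall-through.
module AlwaysFailingTranscoder
  def self.transcode(_string)
    raise EncodingError, "cannot handle this input"
  end
end

# Illustrative helper: try each transcoder in order and return the first
# result that is validly encoded, or nil if every transcoder fails.
def first_successful_transcode(string, transcoders)
  transcoders.each do |transcoder|
    begin
      result = transcoder.transcode(string)
      return result if result.valid_encoding?
    rescue EncodingError
      next
    end
  end
  nil
end

# "€20" as raw Windows-1252 bytes (0x80 is the euro sign in that code page).
raw = "\x8020".b
result = first_successful_transcode(raw, [AlwaysFailingTranscoder, Windows1252Transcoder])
```

Ordering matters in a chain like this: a transcoder that makes strong assumptions about the input belongs after the ones that can detect when they do not apply.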

Toe Dough List

Remove CharDet and look at something like CharlockHolmes. I had install problems with it, and it also failed a few initial tire kicks. See my notes here on the topic.

Contributing

GIGO is fully tested with ActiveSupport 3.0 through 4 and upward. If you detect a problem, open a GitHub issue or fork the repo and help out. After you fork or clone the repository, the following commands will get you up and running on the test suite.

$ bundle install
$ bundle exec rake appraisal:setup
$ bundle exec rake appraisal test

We use the appraisal gem from Thoughtbot to generate the individual gemfiles for each ActiveSupport version and to run the tests locally against each generated Gemfile. The rake appraisal test command runs our test suite against all Rails versions in our Appraisal file. If you want to run the tests for a specific Rails version, use rake -T to list the available tasks. For example, the following command will run the tests for Rails 3.2 only.

$ bundle exec rake appraisal:activesupport32 test