GIGO (Garbage In, Garbage Out)
Or better yet, Garbage In, Gold Out! - The GIGO gem aims to fix Ruby string encodings at all costs!
The GIGO gem is likely not the proper solution. If you have bad encodings in your database, you should fix them and write consistent encodings. That said, if you have no other choice, GIGO can help.
This gem depends on a series of transcoders, including ActiveSupport::Multibyte#tidy_bytes along with one of the many public forks of CharDet for Ruby. Since CharDet is not a published gem that follows proper semantic versioning, we have decided to vendor the kirillrdy/rchardet repo. We have even made sure that our vendored version stays in our namespace by using GIGO::CharDet. So if you have another version bundled, feel confident that the two will not conflict.
Usage
Simple: just pass a string to GIGO.load. Nil values and properly encoded strings are returned as is. Otherwise, GIGO will do its best to convert the string and force your default internal (or UTF-8) encoding.
GIGO.load "€20 – “Woohoo”"
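As a rough illustration of the pass-through rule described above, the following plain-Ruby sketch checks whether a value would be returned untouched. The helper name passes_through? is hypothetical and not part of GIGO, and the exact check GIGO performs internally is an assumption here.

```ruby
# Hypothetical helper, not GIGO's API: true when a value needs no repair.
# Assumes "properly encoded" means the string already carries the target
# encoding and its bytes are valid for that encoding.
def passes_through?(value, encoding = Encoding::UTF_8)
  return true if value.nil?
  value.encoding == encoding && value.valid_encoding?
end

good   = "Woohoo".encode(Encoding::UTF_8)           # valid UTF-8
broken = "\xE2\x82".b.force_encoding(Encoding::UTF_8) # truncated euro sign

passes_through?(nil)    # nil is handed back as is
passes_through?(good)   # already fine, nothing to do
passes_through?(broken) # invalid bytes, so a transcoder would be needed
```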
Let's say you have a comments column on an ActiveRecord model which is not guaranteed to come back in your default external encoding.
def comments
  GIGO.load read_attribute(:comments)
end
GIGO's encoding can be configured using the GIGO.encoding accessor. By default this is Encoding.default_internal, with a fallback to Encoding::UTF_8.
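The fallback order described above can be sketched in plain Ruby as follows. The resolve_encoding helper and its parameters are hypothetical and only illustrate the precedence; they are not GIGO's implementation.

```ruby
# Hypothetical helper, not GIGO's API: pick the target encoding.
# Precedence: an explicitly configured value, then the default internal
# encoding, then UTF-8 as the last resort.
def resolve_encoding(configured = nil, default_internal = Encoding.default_internal)
  configured || default_internal || Encoding::UTF_8
end

resolve_encoding(nil, nil)                       # => Encoding::UTF_8
resolve_encoding(nil, Encoding::ISO_8859_1)      # => Encoding::ISO_8859_1
resolve_encoding(Encoding::UTF_16LE, nil)        # => Encoding::UTF_16LE
```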
Transcoders
GIGO transcoders can be any module or class that implements the transcode method. This method takes one argument, the string to transcode, and can hook into GIGO.encoding if needed. The default list of transcoders is:
- GIGO::Transcoders::ActiveSupport
- GIGO::Transcoders::CharDet
- GIGO::Transcoders::Blind
GIGO attempts to use each in that order. Upon successful transcoding, we use the EnsureValidEncoding gem to force the encoding to match GIGO.encoding while removing any non-convertible characters.
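To make the "try each transcoder in order" idea concrete, here is a minimal stand-alone sketch that does not use GIGO itself. Every name in it (TARGET, Utf8Passthrough, Windows1252Transcoder, BlindTranscoder, repair) is hypothetical, and the convention of returning nil on failure is an assumption made for this example, not something GIGO documents. String#scrub stands in for the final clean-up step.

```ruby
TARGET = Encoding::UTF_8 # stand-in for a configured target encoding

# Each module implements a one-argument transcode method, as described above.
module Utf8Passthrough
  # Accept the bytes only if they are already valid in the target encoding.
  def self.transcode(string)
    candidate = string.dup.force_encoding(TARGET)
    candidate if candidate.valid_encoding?
  end
end

module Windows1252Transcoder
  # Reinterpret the bytes as Windows-1252 and convert; give up on undefined bytes.
  def self.transcode(string)
    string.dup.force_encoding(Encoding::Windows_1252).encode(TARGET)
  rescue EncodingError
    nil
  end
end

module BlindTranscoder
  # Last resort: force the target encoding and drop whatever is invalid.
  def self.transcode(string)
    string.dup.force_encoding(TARGET).scrub("")
  end
end

CHAIN = [Utf8Passthrough, Windows1252Transcoder, BlindTranscoder].freeze

# Return the first result that is valid in the target encoding.
def repair(string)
  CHAIN.each do |transcoder|
    result = transcoder.transcode(string)
    return result if result && result.valid_encoding?
  end
  nil
end

repair("caf\xE9".b) # Windows-1252 bytes, converted by the second transcoder
```

Ordering matters in a chain like this: the cheap, lossless checks come first and the lossy fallback comes last, so characters are only dropped when nothing else worked.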
Toe Dough List
Remove CharDet and look at something like CharlockHolmes. I had install problems with this and it also failed a few initial tire kicks. See my notes here on the topic.
Contributing
GIGO is fully tested with ActiveSupport 3.0 to 4 and upward. If you detect a problem, open up a GitHub issue or fork the repo and help out. After you fork or clone the repository, the following commands will get you up and running on the test suite.
$ bundle install
$ bundle exec rake appraisal:setup
$ bundle exec rake appraisal test
We use the appraisal gem from Thoughtbot to help us generate the individual gemfiles for each ActiveSupport version and to run the tests locally against each generated Gemfile. The rake appraisal test command actually runs our test suite against all Rails versions in our Appraisal file. If you want to run the tests for a specific Rails version, use rake -T for a list. For example, the following command will run the tests for Rails 3.2 only.
$ bundle exec rake appraisal:activesupport32 test