HarmonizerRedis

HarmonizerRedis is a Ruby gem that aids the process of relabeling/grouping free text phrases to resolve the many ways people spell or describe something. It uses fuzzy string matching along with inverse term frequencies to score and rank similarities between phrases. The gem uses Redis for performance.

Usage

Configuration

The Redis must be configured first. Refer to the Redis for more information. Redis.current should be set to the Redis connection.

Redis.current = Redis.new

Adding an entry

HarmonizerRedis::Linkage represents the connection between your data structures and the gem. Linkages contain string content, an id (which will be a uniquely generated uuid), and a category_id which identifies the collection this entry belongs to.

my_category_id = 100
linkage = HarmonizerRedis::Linkage.new(content: 'harmonizer redis',
                                       category_id: my_category_id)
linkage.save
my_linkage_id = linkage.id # "520c488b-e9f8-4a6f-aaea-0d5e37b97644"

Retrieving an entry

my_linkage = HarmonizerRedis::Linkage.find(my_linkage_id)

Calculating and Retrieving Similarities

Calculate similarities for all the linkages in a category in a batch. New calculations will need to be performed if new linkages are added.

HarmonizerRedis.calculate_similarities(my_category_id)

To get an Array of similar phrases. The default is to return the top 20 phrases. If new linkages have been added or if the similarities have not yet been computed for this linkage, it will be computed automatically with this call.

my_linkage.get_similarities

Each entry in this array is an array in the following format [text_label, group_label, similarity_score, phrase_id]

After deciding which phrase the linkage should be combined with - use the accompanying phrase_id data to merge the phrases into a group

my_linkage.merge_with_phrase(phrase_id)

To label everything in the same group:

my_linkage.set_corrected_label('HarmonizerRedis')

To suggest labels for this group (this works better the more HarmonizerRedis is used)

my_linkage.recommend_labels

Lastly to get the final corrected label of a linkage:

my_linkage.corrected

Contributing

Feel free to fork this repo and change it as you wish. We prefer pull requests on github, but you can send us emails. All attributions need to be tested as well.

License

The gem is available as open source under the terms of the MIT License.