Rambling Trie

The Rambling Trie is a Ruby implementation of the trie data structure, which includes compression abilities and is designed to be very fast to traverse.

Installing the Rambling Trie

Requirements

You will need:

  • Ruby 2.1.0 or up
  • RubyGems

See RVM, rbenv or chruby for more information on how to manage Ruby versions.

Installation

You can either install it manually with:

gem install rambling-trie

Or, include it in your Gemfile and bundle it:

gem 'rambling-trie'

Using the Rambling Trie

Creation

To create a new trie, initialize it like this:

trie = Rambling::Trie.create

You can also provide a block and the created trie instance will be yielded for you to perform any operation on it:

Rambling::Trie.create do |trie|
  trie << 'word'
end

Additionally, you can provide the path to a file containing the words to add, and the trie will be populated from that file:

trie = Rambling::Trie.create '/path/to/file'

By default, a plain text file with one word per line is expected:

some
words
to
populate
the
trie

If you want to use a custom file format, you will need to provide a custom file reader that defines an #each_word method that yields each word contained in the file. Look at the PlainText reader class for an example, and at the Configuration section to see how to add your own custom file readers.
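As a rough illustration of the #each_word contract described above, here is a minimal sketch of a reader for comma-separated files. The class name CommaSeparatedReader is hypothetical, and the sketch assumes that #each_word receives the file path and yields one word at a time; check the PlainText reader for the exact signature.

```ruby
require 'tempfile'

# Hypothetical reader for files holding several comma-separated words per line.
# Assumption: the trie calls #each_word with the file path and a block.
class CommaSeparatedReader
  def each_word(filepath)
    return enum_for(:each_word, filepath) unless block_given?

    File.foreach(filepath) do |line|
      line.chomp.split(',').each do |entry|
        word = entry.strip
        yield word unless word.empty?
      end
    end
  end
end

# Try the reader on its own, without a trie.
Tempfile.create(['words', '.csv']) do |file|
  file.write("some,words,to\npopulate, the ,trie\n")
  file.flush
  CommaSeparatedReader.new.each_word(file.path) { |word| puts word }
end
```

A reader like this would still need to be registered through the Configuration section before the trie could use it.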

Operations

To add new words to the trie, use #add or its alias #<<:

trie.add 'word'
trie << 'word'

And to find out if a word already exists in the trie, use #word? or its alias #include?:

trie.word? 'word'
trie.include? 'word'

To find out whether any word in the trie starts with a given string, use #partial_word? or its alias #match?:

trie.partial_word? 'partial_word'
trie.match? 'partial_word'

To get all the words that start with a particular string, you can use #scan or its alias #words:

trie.scan 'hi' # => ['hi', 'high', 'highlight', ...]
trie.words 'hi' # => ['hi', 'high', 'highlight', ...]
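To make the difference between #word?, #partial_word? and #scan concrete, here is a minimal sketch of a trie with one character per node. SketchTrie is a hypothetical class written for illustration only; it is not the gem's implementation and ignores compression.

```ruby
# Illustrative only: a tiny trie with one character per node.
# SketchTrie is a hypothetical name, not part of the rambling-trie gem.
class SketchTrie
  Node = Struct.new(:children, :terminal)

  def initialize
    @root = Node.new({}, false)
  end

  # Walk (and create) one node per character, then mark the last as a word end.
  def add(word)
    node = @root
    word.each_char do |char|
      node = (node.children[char] ||= Node.new({}, false))
    end
    node.terminal = true
    self
  end
  alias_method :<<, :add

  # True only if the path exists AND ends on a terminal node.
  def word?(chars)
    node = find(chars)
    !node.nil? && node.terminal
  end

  # True if the path exists at all, terminal or not.
  def partial_word?(chars)
    !find(chars).nil?
  end

  # Every word stored at or below the node reached by the prefix.
  def scan(prefix)
    node = find(prefix)
    node.nil? ? [] : collect(node, prefix)
  end

  private

  def find(chars)
    node = @root
    chars.each_char do |char|
      node = node.children[char]
      return nil if node.nil?
    end
    node
  end

  def collect(node, prefix)
    words = node.terminal ? [prefix] : []
    node.children.each do |char, child|
      words.concat(collect(child, prefix + char))
    end
    words
  end
end

trie = SketchTrie.new
trie << 'hi' << 'high' << 'highlight' << 'hello'

p trie.word?('high')         # => true
p trie.word?('hig')          # => false
p trie.partial_word?('hig')  # => true
p trie.scan('hi')            # => ["hi", "high", "highlight"]
```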

To get all the words within a given string, you can use #words_within:

trie.words_within 'ifdxawesome45someword3' # => ['if', 'aw', 'awe', ...]
trie.words_within 'tktktktk' # => []

Or, if you're just interested in knowing whether a given string contains any valid words or not, you can use #words_within?:

trie.words_within? 'ifdxawesome45someword3' # => true
trie.words_within? 'tktktktk' # => false
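The semantics of #words_within and #words_within? can be sketched without a trie at all. The helpers below are hypothetical and use a plain Set as the dictionary, trying every substring in turn; a trie can do better by abandoning a starting position as soon as no stored word continues with the characters seen so far. Removing duplicates with uniq is a choice made for this sketch, not a statement about the gem's behavior.

```ruby
require 'set'

# Hypothetical helper: every dictionary word that appears inside +phrase+,
# ordered by where it starts. A Set stands in for the trie here.
def sketch_words_within(dictionary, phrase)
  found = []
  (0...phrase.length).each do |start|
    (start...phrase.length).each do |finish|
      candidate = phrase[start..finish]
      found << candidate if dictionary.include?(candidate)
    end
  end
  found.uniq
end

# Same search, but stops at the first match.
def sketch_words_within?(dictionary, phrase)
  (0...phrase.length).any? do |start|
    (start...phrase.length).any? do |finish|
      dictionary.include?(phrase[start..finish])
    end
  end
end

dictionary = Set.new(%w[if aw awe awesome word])

p sketch_words_within(dictionary, 'ifdxawesome45someword3')
# => ["if", "aw", "awe", "awesome", "word"]
p sketch_words_within?(dictionary, 'tktktktk')
# => false
```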

Compression

By default, the Rambling Trie works as a standard trie. Starting from version 0.1.0, you can obtain a compressed trie from the standard one, by using the compression feature. Just call the #compress! method on the trie instance:

trie.compress!

This will reduce the size of the trie by using redundant node elimination (redundant nodes are the only-child non-terminal nodes).

Note: The #compress! method acts over the trie instance it belongs to and is destructive. Also, adding words after compression (with #add or #<<) is not supported.
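The following sketch shows one way redundant node elimination can work, assuming it means merging each non-terminal node that has exactly one child into that child. SketchNode, build_sketch_trie, compress_sketch and count_sketch_nodes are hypothetical names for illustration; the gem's own Compressor may differ in its details.

```ruby
# Illustrative node: +letters+ is the (possibly merged) key for this node.
SketchNode = Struct.new(:letters, :terminal, :children)

# Build an uncompressed trie: one character per node.
def build_sketch_trie(words)
  root = SketchNode.new('', false, {})
  words.each do |word|
    node = root
    word.each_char do |char|
      node = (node.children[char] ||= SketchNode.new(char, false, {}))
    end
    node.terminal = true
  end
  root
end

# Merge every non-terminal node that has exactly one child into that child,
# then repeat the process for the remaining children. Mutates the nodes.
def compress_sketch(node, is_root = true)
  until is_root || node.terminal || node.children.size != 1
    only_child = node.children.values.first
    node.letters += only_child.letters
    node.terminal = only_child.terminal
    node.children = only_child.children
  end

  merged = node.children.values.map { |child| compress_sketch(child, false) }
  node.children = merged.each_with_object({}) do |child, by_letters|
    by_letters[child.letters] = child
  end
  node
end

def count_sketch_nodes(node)
  node.children.values.reduce(1) { |total, child| total + count_sketch_nodes(child) }
end

root = build_sketch_trie(%w[hi high highlight])
puts count_sketch_nodes(root)  # => 10
compress_sketch(root)
puts count_sketch_nodes(root)  # => 4
p root.children.keys           # => ["hi"]
```

Because keys become multi-character strings after merging, inserting a new word would require splitting nodes again, which hints at why adding words to a compressed trie is harder to support.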

You can find out if a trie instance is compressed by calling the #compressed? method:

trie.compressed?

Enumeration

Starting from version 0.4.2, you can use any Enumerable method over a trie instance, and it will iterate over each word contained in the trie. You can now do things like:

trie.each { |word| puts word }
trie.any? { |word| word.include? 'x' }
trie.all? { |word| word.include? 'x' }
# etc.
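For readers curious how a single #each method unlocks the whole Enumerable API, here is a minimal sketch using a nested-Hash trie. EnumerableSketchTrie and its WORD_END marker are hypothetical and are not how the gem stores its nodes.

```ruby
# Illustrative only: a nested-Hash trie that mixes in Enumerable.
class EnumerableSketchTrie
  include Enumerable

  WORD_END = :word_end # marker key stored in a node where a word finishes

  def initialize(*words)
    @root = {}
    words.each { |word| add(word) }
  end

  def add(word)
    node = @root
    word.each_char { |char| node = (node[char] ||= {}) }
    node[WORD_END] = true
    self
  end

  # Enumerable only needs #each; any?, all?, count, map, etc. build on it.
  def each(&block)
    return enum_for(:each) unless block_given?

    walk(@root, '', &block)
    self
  end

  private

  # Depth-first walk that yields the accumulated prefix at every word end.
  def walk(node, prefix, &block)
    node.each do |key, child|
      if key == WORD_END
        yield prefix
      else
        walk(child, prefix + key, &block)
      end
    end
  end
end

trie = EnumerableSketchTrie.new('hi', 'high', 'hex')

trie.each { |word| puts word }
p trie.any? { |word| word.include? 'x' }  # => true
p trie.all? { |word| word.include? 'x' }  # => false
p trie.count                              # => 3
```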

Serialization

Starting from version 1.0.0, you can store a full trie instance on disk and retrieve and use it later. Loading a trie from disk takes less time, less CPU and less memory than adding every word to a new trie each time. This is particularly useful for production applications with word lists that are static or that change infrequently.

To store a trie on disk, you can use .dump like this:

Rambling::Trie.dump trie, '/path/to/file'

Then, when you need to use a trie next time, you don't have to create a new one with all the necessary words. Rather, you can retrieve a previously stored one with .load like this:

trie = Rambling::Trie.load '/path/to/file'

Supported formats

Currently, these formats are supported to store tries on disk:

  • Ruby's binary (Marshal) format
  • YAML

When dumping into or loading from disk, the format is determined automatically based on the file extension, so .yml or .yaml files will be handled through YAML and .marshal files through Marshal.
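Choosing a format from the file extension can be sketched with Ruby's standard library alone. The names SKETCH_FORMATS, sketch_format_for, sketch_dump and sketch_load are hypothetical, and a plain Array of words stands in for the trie; this is not the gem's serializer code.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical lookup table from file extension to a stdlib serializer.
SKETCH_FORMATS = {
  '.marshal' => Marshal,
  '.yml' => YAML,
  '.yaml' => YAML
}.freeze

def sketch_format_for(filepath)
  SKETCH_FORMATS.fetch(File.extname(filepath)) do |extension|
    raise ArgumentError, "Unsupported format: #{extension.inspect}"
  end
end

# Both Marshal and YAML accept an IO as the second argument to .dump.
def sketch_dump(contents, filepath)
  serializer = sketch_format_for(filepath)
  File.open(filepath, 'wb') { |file| serializer.dump(contents, file) }
  filepath
end

# Both Marshal and YAML accept an IO as the argument to .load.
def sketch_load(filepath)
  serializer = sketch_format_for(filepath)
  File.open(filepath, 'rb') { |file| serializer.load(file) }
end

# A plain Array of words stands in for the trie being stored.
words = %w[some words to populate the trie]

Dir.mktmpdir do |dir|
  %w[words.marshal words.yml].each do |name|
    path = sketch_dump(words, File.join(dir, name))
    puts "#{name}: round trip ok? #{sketch_load(path) == words}"
  end
end
```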

Optionally, you can use a .zip version of the supported formats. In order to do so, you'll have to install the rubyzip gem:

gem install rubyzip

Or, include it in your Gemfile and bundle it:

gem 'rubyzip'

Then, you can load contents from a .zip file like this:

require 'zip'
trie = Rambling::Trie.load '/path/to/file.zip'

For .zip files, the format is also determined automatically based on the file extension, so .yml.zip or .yaml.zip files will be handled through YAML after decompression and .marshal.zip files through Marshal.

Configuration

Starting from version 1.0.0, you can change the configuration values used by Rambling Trie. You can now:

  • Supply a Compressor object
  • Supply a root Node builder
  • Add more Readers (implement #each_word)
  • Change the default reader
  • Add more Serializers (implement #dump and #load)
  • Change the default serializer

You can configure those values by using .config like this:

require 'rambling-trie'

Rambling::Trie.config do |config|
  config.compressor = MyCompressor.new
  config.root_builder = lambda { MyNode.new }

  config.readers.add :html, MyHtmlReader.new
  config.readers.default = config.readers[:html]

  config.serializers.add :json, MyJsonSerializer.new
  config.serializers.default = config.serializers[:yml]
end

# Create a trie or load one from disk and do things with it...
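As a sketch of the #dump and #load duck type that a custom serializer needs, the class below writes and reads JSON. SketchJsonSerializer is hypothetical, and the argument order (contents, then file path, for #dump; file path only for #load) is an assumption that should be checked against the gem's built-in serializers. A real serializer would also have to convert trie nodes to and from JSON-friendly data, which is not shown here.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical serializer; only the #dump/#load duck type comes from the README.
# Assumptions: #dump receives (contents, filepath) and #load receives (filepath).
class SketchJsonSerializer
  def dump(contents, filepath)
    File.write(filepath, JSON.generate(contents))
  end

  def load(filepath)
    JSON.parse(File.read(filepath))
  end
end

# Round-trip a plain Hash standing in for serialized trie data.
serializer = SketchJsonSerializer.new
Dir.mktmpdir do |dir|
  path = File.join(dir, 'words.json')
  serializer.dump({ 'words' => %w[hi high] }, path)
  p serializer.load(path)
end
```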

Further Documentation

You can find further API documentation on the autogenerated rambling-trie gem RubyDoc.info page or, if you want edge documentation, on the GitHub project RubyDoc.info page.

Compatible Ruby versions

The Rambling Trie has been tested with the following Ruby versions:

  • 2.4.x
  • 2.3.x
  • 2.2.x
  • 2.1.x

No longer supported:

  • 2.0.x (might still work, but is not officially supported)
  • 1.9.x
  • 1.8.x

Contributing to Rambling Trie

Take a look at the contributing guide to get started, or fire a question to @gonzedge.

License and copyright

Copyright (c) 2012-2017 Edgar Gonzalez

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.