MarkovWords

At EXPLO, we often have a need for specific vocabulary-generators. For example, we might want to make a password generator, or a Harry Potter house-generator, or some such thing.

As it turns out, Markov chains are an excellent way to create specific vocabularies: you "train" a model against a set of words to determine which letter combinations commonly follow one another.
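To make the idea concrete, here is a minimal sketch of a letter-level Markov chain. It is an illustration only and not how MarkovWords is implemented internally; the `train` and `generate` helpers, the `^` padding character, and the `max_length` limit are all assumptions made for this example.

```ruby
# Illustrative sketch only; not MarkovWords' actual implementation.

# Map every n-gram to the letters seen right after it in the training words.
# A nil entry records that a word ended after that n-gram.
def train(words, gram_size = 2)
  table = Hash.new { |hash, gram| hash[gram] = [] }
  words.each do |word|
    # '^' is a hypothetical padding character marking the start of a word.
    padded = ('^' * gram_size) + word.downcase
    (0..word.length).each do |i|
      table[padded[i, gram_size]] << padded[i + gram_size]
    end
  end
  table
end

# Walk the table from the start padding, picking a random follower each step.
def generate(table, gram_size = 2, max_length: 20)
  gram = '^' * gram_size
  word = +''
  while word.length < max_length
    letter = table.fetch(gram).sample
    break if letter.nil? # the chain chose "end of word"

    word << letter
    gram = (gram + letter)[-gram_size, gram_size]
  end
  word
end

houses = %w[gryffindor slytherin ravenclaw hufflepuff]
puts generate(train(houses))
```

Because followers are stored with repeats, `sample` picks common combinations more often than rare ones, which is what makes the output resemble the training words.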

While there are quite a few wonderful Ruby libraries that use Markov chains, they all focus either on actual English words or on creating random sentences rather than new words. We created this library to apply the same technique to the letters within words, hence the name MarkovWords.

Installation

Add this line to your application's Gemfile:

gem 'markov_words'

And then execute:

$ bundle

Or install it yourself as:

$ gem install markov_words

Usage

Basic usage is as follows:

require 'markov_words'

generator = MarkovWords::Generator.new
# returns a random word
puts generator.word

You might prefer an n-gram size (the number of letters in each combination being tracked) higher than the default, which is 2. We've found that the higher you go, the more real the words tend to sound, because each letter is chosen from a longer stretch of context; at large sizes an n-gram can span the entire length of a word from your dictionary. The increased "real-sounding-ness" comes at the expense of having to generate a much larger database of n-gram => letter correspondences, and accordingly slower access times.

To set gram_size:

generator = MarkovWords::Generator.new(gram_size: 7)
# Will take a while the first time, while the database is created.
puts generator.word
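To get a rough feel for how the database grows with gram size, the sketch below counts the distinct n-grams in a small word list. The `distinct_grams` helper is hypothetical and only counts keys; it does not reflect how MarkovWords actually stores its data, and a real dictionary will grow far faster than this toy list.

```ruby
# Illustrative sketch only: count the distinct n-grams found in a word list.
def distinct_grams(words, gram_size)
  words.flat_map { |word| word.chars.each_cons(gram_size).map(&:join) }.uniq.size
end

sample_words = %w[banana bandana]
(1..3).each do |size|
  puts "gram_size #{size}: #{distinct_grams(sample_words, size)} distinct n-grams"
end
```

Pointing the same helper at your own corpus file (read with `File.readlines(path, chomp: true)`) gives a quick estimate before you commit to a large `gram_size`.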

Dictionary

By default, MarkovWords will use the system dictionary located (on Macs) in /usr/share/dict/words. You can change this setting:

# e.g. to generate random proper names instead of English-sounding words.
generator = MarkovWords::Generator.new(corpus_file: '/usr/share/dict/propernames')

This is pretty great, because it means that if you have a dictionary to emulate, you can make words that sound like anything!

Data Storage

MarkovWords stores its database of n-gram => letter correspondences on disk and loads it into memory when necessary. You can control the location of the data file with:

# e.g. to store the data in /tmp/markov.data
generator = MarkovWords::Generator.new(data_file: '/tmp/markov.data')

Because MarkovWords re-uses an existing data file by default, you can also clear out its contents by passing flush_data: true:

# e.g. to clear out any existing data in /tmp/markov.data before using it
generator = MarkovWords::Generator.new(data_file: '/tmp/markov.data', flush_data: true)

Caching

Because calculation can get slow, especially at high n-gram sizes, MarkovWords will cache 100 words by default. If you want to control caching, you can adjust the caching parameters, e.g.:

# For no caching whatsoever
generator = MarkovWords::Generator.new(perform_caching: false)

# To change the number of pre-computed/stored words to 1000:
generator = MarkovWords::Generator.new(cache_size: 1000)

You can "top off" the cache to make sure it's full with:

generator = MarkovWords::Generator.new
generator.refresh_cache
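For readers curious about what "topping off" means in practice, here is a minimal sketch of a pre-computed word cache. The `WordCache` class and its `refresh`, `next_word`, and `length` methods are hypothetical names invented for this example; they are not part of the MarkovWords API and say nothing about how the gem implements caching.

```ruby
# Illustrative sketch only; WordCache is a hypothetical class, not MarkovWords code.
class WordCache
  def initialize(size:, &producer)
    @size = size
    @producer = producer # block that computes one new word
    @words = []
  end

  # Top the cache back up to its configured size.
  def refresh
    @words << @producer.call while @words.length < @size
    self
  end

  # Hand out a cached word, refilling first if the cache has run dry.
  def next_word
    refresh if @words.empty?
    @words.shift
  end

  def length
    @words.length
  end
end

counter = 0
cache = WordCache.new(size: 3) { "word#{counter += 1}" }
puts cache.next_word # pays the computation cost for three words up front
cache.refresh        # replaces the word that was just handed out
puts cache.length
```

The trade-off is the usual one: the slow computation happens in batches, so most individual requests return immediately.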

Change Log

1.0.0 introduced a couple of breaking changes:
- Words class renamed to Generator.
- Generator:
  - cache: [boolean] parameter was re-named to perform_caching: [boolean].
  - Removed a lot of attr_accessor variables such as data_store, min_length, max_length etc., in favor of a leaner + cleaner API.
  - The cache file is no longer persisted to disk separately (because FileStore is using SQLite instead of direct-disk storage).
0.2.x was all about Rubocop compliance, so it was a few method refactors but nothing major.
0.1.0 initial commit

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/exploration/markov_words. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the MarkovWords project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.