README for bloomfilter

This gem contains two Ruby Bloom filter implementations. A discussion of Bloom filters in general is beyond the scope of this document, but from a high level, the point of a Bloom filter is to compactly represent a set of items so that you can avoid doing a costly full-set search if you're confident that the item isn't in the set.

This library includes two implementations: BloomFilter, which is a fast, in-memory implementation; and ExternalBloomFilter, which is a file-based implementation. When you will be doing many searches of the filter, you should use the BloomFilter. If you are memory constrained or will be using the filter sparingly, use the ExternalBloomFilter.

To create a new filter, simply use

filter =

and then you can add to it by

filter.add "item"

and query it by

filter.include? "item" #=> true

If it is non-trivial to build your filter in the first place, you'll likely want to persist it. Use the method to write the bit field out to a file. To load a saved filter, use BloomFilter.load(path). This file format is compatible with ExternalBloomFilter as well.

ExternalBloomFilter works by keeping the bit array on disk and only reading 5 bytes per lookup and reading and writing 5 bytes each per addition. This makes it very memory efficient (read: nearly zero).

To use ExternalBloomFilter, the only difference is that instead of specifying a size, you specify a path. Here's how you create a new filter:

filter = ExternalBloomFilter.create(path, size_of_bit_array)

Here's how you load an existing filter:

filter =

After you have a valid filter, you add to it and query it the same as the BloomFilter class. Note that any changes you make to the filter by adding to it will persist immediately to disk. There is no undo with the External implementation.