Ferret

Ferret is a Ruby port of the Java Lucene search engine (jakarta.apache.org/lucene/). Like Lucene, it is not a standalone application but a library that you can use to index documents and search them later.

Requirements

  • Ruby 1.8

  • A C compiler (optional: needed only to build the C extension, not to use Ferret)

Installation

If you have gems installed you can simply do:

gem install ferret

Otherwise, decompress the archive and enter its top directory:

tar zxpvf ferret-0.1.tar.gz
cd ferret-0.1

Then configure, build, and install. The # prompt on the last command means it usually has to be run as root:

$ ruby setup.rb config
$ ruby setup.rb setup
# ruby setup.rb install

These steps install Ferret in the default location for Ruby libraries. You can also install the files into a directory of your choice by passing options to setup.rb. Try:

$ ruby setup.rb --help
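
After either installation route, a quick way to check that Ruby can find the library is to try loading it. The snippet below is only a minimal sketch: it assumes the library is loaded with require 'ferret', and the ferret_available variable is a name made up for this example.

```ruby
# On Ruby 1.8, gem-installed libraries are only visible after loading
# rubygems; on later Rubies this line is harmless.
begin
  require 'rubygems'
rescue LoadError
  # No RubyGems available; a setup.rb install does not need it.
end

# ferret_available is a hypothetical name used only in this sketch.
begin
  require 'ferret'
  ferret_available = true
rescue LoadError
  ferret_available = false
end

puts(ferret_available ? "Ferret loaded" : "Ferret not found on the load path")
```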

Usage

You can read the TUTORIAL which you’ll find in the same directory as this README. You can also check the following modules for more specific documentation.

  • Ferret::Analysis: for more information on how the data is processed when it is tokenized. There are a number of things you can do with your data, such as adding stop lists or perhaps a Porter stemmer. There are also a number of analyzers already available, and it is almost trivial to create a new one with a simple regular expression.

  • Ferret::Search: for more information on querying the index. There are a number of already available queries and it’s unlikely you’ll need to create your own. You may however want to take advantage of the sorting or filtering abilities of Ferret to present your data the best way you see fit.

  • Ferret::Document: to find out how to create documents. This part of Ferret is relatively straightforward. The main thing that we haven’t gone into here is the use of term vectors. These allow you to store and retrieve the positions and offsets of the data, which can be very useful in document comparison among other things.

More information

  • Ferret::QueryParser: if you want to find out more about what you can do with Ferret’s Query Parser, this is the place to look. The query parser is one area that could use a bit of work so please send your suggestions.

  • Ferret::Index: for more advanced access to the index you’ll probably want to use the Ferret::Index::IndexWriter and Ferret::Index::IndexReader. This is the place to look for more information on them.

  • Ferret::Store: This is the module used to access the actual index storage and won’t be of much interest to most people.
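
To make the ideas behind Ferret::Analysis and term vectors a little more concrete, here is a plain-Ruby sketch of regular-expression tokenizing with a stop list, recording each token's position and character offsets. It does not use Ferret at all: Token, STOP_WORDS and analyze are hypothetical names invented for this illustration, and Ferret's own analyzers may work quite differently.

```ruby
# Plain-Ruby illustration only; none of these names are part of Ferret's API.
Token = Struct.new(:text, :position, :start_offset, :end_offset)

# A tiny stop list, assumed for the example.
STOP_WORDS = %w[a an and in of the to].freeze

# Split text into lowercase tokens with a regular expression, recording the
# character offsets of each match, then drop stop words. Positions are
# assigned before filtering, so a gap in the positions shows where a stop
# word was removed.
def analyze(text, pattern = /[A-Za-z0-9]+/, stop_words = STOP_WORDS)
  tokens = []
  position = 0
  text.scan(pattern) do
    match = Regexp.last_match
    tokens << Token.new(match[0].downcase, position, match.begin(0), match.end(0))
    position += 1
  end
  tokens.reject { |token| stop_words.include?(token.text) }
end

analyze("The quick brown fox").each do |t|
  puts "#{t.text} position=#{t.position} offsets=#{t.start_offset},#{t.end_offset}"
end
```

In this sketch the end offset is exclusive, so "quick" is reported at position 1 with offsets 4 and 9. Keeping positions and offsets alongside each term is roughly the kind of information the term vectors mentioned above let you store and retrieve.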

Performance

Currently Ferret is an order of magnitude slower than Java Lucene, which can be quite a pain at times. I have written some basic C extensions, which may or may not have been built when you installed Ferret. These double the speed but still leave it a lot slower than the Java version. I have, however, ported the indexing part of Java Lucene to C, and it is an order of magnitude faster than the Java version. Once I’m pretty certain that the API of Ferret has settled and won’t be changing much, I’ll integrate my C version, so expect to see Ferret running faster than Java Lucene some time in the future. If you’d like to try cferret and test my claims, let me know (if you haven’t already found it in my Subversion repository). It’s not currently portable and will probably only run on Linux.

Contact

For bug reports and patches I have set up Trac here:

http://ferret.davebalmain.com/trac

Questions and discussion should be addressed to the forum or mailing lists hosted at:

http://rubyforge.org/projects/ferret/

Alternatively you could create a new page for discussion on the wiki at my Trac page above. Or, if you’re shy, please feel free to email me directly at [email protected]

Of course, since Ferret is almost a straight port of Java Lucene, everything said about Lucene at jakarta.apache.org/lucene/ should be true of Ferret, apart from the bits about it being in Java.

Authors

  • David Balmain: port to Ruby

  • The Apache Software Foundation (Doug Cutting and friends): original Apache Lucene

License

Ferret is available under an MIT-style license. The full text is in the MIT-LICENSE file.