Class: Ferret::Index::FieldInfo

Inherits:
Object
Defined in:
ext/r_index.c

Overview

Summary

The FieldInfo class is the field descriptor for the index. It specifies whether a field is stored (and, if so, whether it is compressed), whether it is indexed and tokenized, and whether term-vectors are kept for it. Every field has a name, which must be a symbol. There are three properties that you can set: :store, :index and :term_vector. You can also set the default :boost for a field.

Properties

:store

The :store property allows you to specify how a field is stored. You can leave a field unstored (:no), store it in its original format (:yes) or store it in compressed format (:compressed). By default the field is stored in its original format. If the field is large and it is stored elsewhere where it is easily accessible, you might want to leave it unstored. This keeps the index smaller and makes indexing faster. For example, you should probably leave the :content field unstored when indexing all the documents in your file-system.

:index

The :index property allows you to specify how a field is indexed. A field must be indexed to be searchable. However, a field doesn’t need to be indexed to be stored in the Ferret index. You may want to use the index as a simple database and store things like images or MP3s in the index. By default each field is indexed and tokenized, that is, split into tokens (:yes). If you don’t want to index the field, use :no. If you want the field indexed but not tokenized, use :untokenized. Do this for the fields you wish to sort by. There are two other values for :index: :omit_norms and :untokenized_omit_norms. These values correspond to :yes and :untokenized respectively, except that the norms file is omitted. The norms file is the file which contains the boost values for each document for a particular field, so these values are useful if you are not boosting any fields, don’t need scoring based on field length, and would like to speed up the index.
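
The five :index values can be read as combinations of three yes/no choices: is the field searchable, is it tokenized, and are norms kept. The plain-Ruby sketch below tabulates them as described above. It does not call Ferret; the names INDEX_MODES and sortable_index? are hypothetical and exist only for illustration, and the assumption that an unindexed field keeps no norms is mine rather than stated in the text.

```ruby
# Hypothetical lookup table (not part of Ferret's API): what each :index
# value implies, following the description above.
# Assumption: a field that is not indexed keeps no norms.
INDEX_MODES = {
  :no                     => { :searchable => false, :tokenized => false, :norms => false },
  :yes                    => { :searchable => true,  :tokenized => true,  :norms => true  },
  :untokenized            => { :searchable => true,  :tokenized => false, :norms => true  },
  :omit_norms             => { :searchable => true,  :tokenized => true,  :norms => false },
  :untokenized_omit_norms => { :searchable => true,  :tokenized => false, :norms => false }
}.freeze

# Returns true when the mode matches the sorting recommendation above:
# the field is indexed but its contents are not tokenized.
def sortable_index?(mode)
  flags = INDEX_MODES.fetch(mode) do
    raise ArgumentError, "unknown :index value: #{mode.inspect}"
  end
  flags[:searchable] && !flags[:tokenized]
end
```

With this table, sortable_index?(:untokenized) and sortable_index?(:untokenized_omit_norms) are true, while the tokenized and unindexed modes are not.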

:term_vector

See TermVector for a description of term-vectors. You can specify whether or not you would like to store term-vectors. The available options are :no, :yes, :with_positions, :with_offsets and :with_positions_offsets. Note that you need to store the positions to associate offsets with individual terms in the term_vector.

Property Table

Property       Value                     Description
------------------------------------------------------------------------
 :store       | :no                     | Don't store field
              |                         |
              | :yes (default)          | Store field in its original
              |                         | format. Use this value if you
              |                         | want to highlight matches
              |                         | or print match excerpts a la
              |                         | Google search.
              |                         |
              | :compressed             | Store field in compressed
              |                         | format.
 -------------|-------------------------|------------------------------
 :index       | :no                     | Do not make this field
              |                         | searchable.
              |                         |
              | :yes (default)          | Make this field searchable and
              |                         | tokenize its contents.
              |                         |
              | :untokenized            | Make this field searchable but
              |                         | do not tokenize its contents.
              |                         | Use this value for fields you
              |                         | wish to sort by.
              |                         |
              | :omit_norms             | Same as :yes except omit the
              |                         | norms file. The norms file can
              |                         | be omitted if you don't boost
              |                         | any fields and you don't need
              |                         | scoring based on field length.
              |                         |
              | :untokenized_omit_norms | Same as :untokenized except omit
              |                         | the norms file. Norms files can
              |                         | be omitted if you don't boost
              |                         | any fields and you don't need
              |                         | scoring based on field length.
              |                         |
 -------------|-------------------------|------------------------------
 :term_vector | :no                     | Don't store term-vectors
              |                         |
              | :yes                    | Store term-vectors without
              |                         | storing positions or offsets.
              |                         |
              | :with_positions         | Store term-vectors with
              |                         | positions.
              |                         |
              | :with_offsets           | Store term-vectors with
              |                         | offsets.
              |                         |
              | :with_positions_offsets | Store term-vectors with
              | (default)               | positions and offsets.
 -------------|-------------------------|------------------------------
 :boost       | Float                   | The boost property is used to
              |                         | set the default boost for a
              |                         | field. This boost value will be
              |                         | used for all instances of the
              |                         | field in the index unless
              |                         | otherwise specified when you
              |                         | create the field. All values
              |                         | should be positive.
              |                         |
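
As a rough illustration of how the table reads, the following plain-Ruby sketch fills in the defaults marked above and rejects values the table does not list. It does not call Ferret and is not how Ferret itself validates options; ALLOWED, DEFAULTS and normalize_field_options are hypothetical names, and accepting any positive Numeric for :boost (rather than only a Float) is an assumption.

```ruby
# Hypothetical helper (not part of Ferret's API): apply the defaults from
# the property table and reject values the table does not list.
ALLOWED = {
  :store       => [:no, :yes, :compressed],
  :index       => [:no, :yes, :untokenized, :omit_norms, :untokenized_omit_norms],
  :term_vector => [:no, :yes, :with_positions, :with_offsets, :with_positions_offsets]
}.freeze

# Defaults as marked "(default)" in the table.
DEFAULTS = {
  :store       => :yes,
  :index       => :yes,
  :term_vector => :with_positions_offsets
}.freeze

def normalize_field_options(options = {})
  result = DEFAULTS.merge(options)

  # Each of the three properties must hold one of its listed values.
  ALLOWED.each do |property, values|
    next if values.include?(result[property])
    raise ArgumentError,
          "invalid #{property.inspect} value: #{result[property].inspect}"
  end

  # :boost is optional; when given it should be positive.
  # Assumption: any positive Numeric is accepted here, not only a Float.
  if result.key?(:boost)
    boost = result[:boost]
    unless boost.is_a?(Numeric) && boost > 0
      raise ArgumentError, ":boost should be positive, got #{boost.inspect}"
    end
  end

  result
end
```

For example, normalize_field_options(:index => :untokenized, :boost => 10.0) keeps the two given values and fills in :store => :yes and :term_vector => :with_positions_offsets.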

Examples

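require 'ferret'
include Ferret::Index

# Untokenized title with a higher default boost and no term-vectors
fi = FieldInfo.new(:title, :index => :untokenized, :term_vector => :no,
                   :boost => 10.0)

# All defaults: stored, indexed and tokenized, term-vectors with positions and offsets
fi = FieldInfo.new(:content)

# Untokenized field suitable for sorting, without norms or term-vectors
fi = FieldInfo.new(:created_on, :index => :untokenized_omit_norms,
                   :term_vector => :no)

# Stored in compressed format only; not searchable
fi = FieldInfo.new(:image, :store => :compressed, :index => :no,
                   :term_vector => :no)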