Class: Ferret::Analysis::Token

Inherits:
Object
Defined in:
ext/r_analysis.c

Overview

Summary

A Token is an occurrence of a term from the text of a field. It consists of the term’s text and the start and end offsets of the term in the text of the field.

The start and end offsets permit applications to re-associate a token with its source text, e.g., to display highlighted query terms in a document browser, or to show matching text fragments in a KWIC (KeyWord In Context) display, etc.
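As a rough illustration of re-associating tokens with their source text, the sketch below wraps each token's span in highlight markers. It assumes the offsets index directly into an ASCII source string. `SpanToken` and `highlight` are hypothetical names used only for this example; `SpanToken` is a plain Struct standing in for a Token so the code runs without Ferret installed.

```ruby
# Hypothetical stand-in for a token: term text plus start and end offsets.
# Not part of Ferret.
SpanToken = Struct.new(:text, :start, :end)

# Wrap the source span of each token in open/close markers.
# Tokens are processed from right to left so that inserting markers
# does not shift the offsets of tokens still to be processed.
def highlight(source, tokens, open: "<b>", close: "</b>")
  result = source.dup
  tokens.sort_by(&:start).reverse_each do |tok|
    result.insert(tok.end, close)   # end is one past the last character
    result.insert(tok.start, open)
  end
  result
end

source  = "The quick brown fox"
matches = [SpanToken.new("quick", 4, 9), SpanToken.new("fox", 16, 19)]
puts highlight(source, matches)
# => The <b>quick</b> brown <b>fox</b>
```

Because the markers are placed by offset rather than by searching for the term text, the same approach works when a filter has changed the term text.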

Attributes

text

the term’s text, which may have been modified by a Token Filter or Tokenizer from the text originally found in the document

start

is the position of the first character corresponding to this token in the source text

end

is equal to one greater than the position of the last character corresponding to this token. Note that the difference between @end_offset and @start_offset may not be equal to @text.length(), as the term text may have been altered by a stemmer or some other filter.
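
The note about offsets and text length can be made concrete with a small sketch. It assumes a stemmer has reduced "running" to "run" while the offsets still cover the original word. `StemmedToken`, `source_span`, and `span_length` are hypothetical names for this example only, and the Struct is a stand-in rather than Ferret's own class.

```ruby
# Hypothetical stand-in for a token whose text was altered by a filter.
# Not part of Ferret.
StemmedToken = Struct.new(:text, :start, :end)

# Recover the original text the token was produced from.
# The range is half-open because end is one past the last character.
def source_span(source, token)
  source[token.start...token.end]
end

# Number of source characters the token covers.
def span_length(token)
  token.end - token.start
end

source = "running shoes"
tok    = StemmedToken.new("run", 0, 7)  # assumed stemmer output for "running"

puts source_span(source, tok)  # => running
puts span_length(tok)          # => 7
puts tok.text.length           # => 3
```

Code that needs the original wording should therefore slice the source by offsets instead of relying on the length of the term text.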