Class: Ferret::Document::Field

Inherits: Object

Defined in: lib/ferret/document/field.rb

Overview

A field is a section of a Document. Each field has two parts, a name and a value. Values may be free text, provided as a String or as a Reader, or they may be atomic keywords, which are not further processed. Such keywords may be used to represent dates, URLs, etc. Fields are optionally stored in the index, so that they may be returned with hits on the document.

Defined Under Namespace

Classes: Index, Store, TermVector

Instance Attribute Summary collapse

#boost ⇒ Object

This value will be multiplied into the score of all hits on this field of this document.
#data ⇒ Object

The one and only data object for all the different kinds of field value.
#name ⇒ Object readonly

Returns the value of attribute name.

Class Method Summary collapse

.new_binary_field(name, value, stored) ⇒ Object

Create a stored field with binary value.

Instance Method Summary collapse

#binary? ⇒ Boolean

True if the field is to be stored as a binary value.
#binary_value ⇒ Object

If the data is stored as binary, just return it.
#compressed? ⇒ Boolean

True if you want to compress the data that you store.
#index=(index) ⇒ Object
#indexed? ⇒ Boolean

True iff the value of the field is to be indexed, so that it may be searched on.
#initialize(name, value, stored = Store::YES, index = Index::UNTOKENIZED, store_term_vector = TermVector::NO, binary = false, boost = 1.0) ⇒ Field constructor

Create a field by specifying its name, value and how it will be saved in the index.
#omit_norms? ⇒ Boolean

True if the norms are not stored for this field.
#reader_value ⇒ Object

Returns the data that is stored in this field as a Reader.
#store_offsets? ⇒ Boolean

True if the offsets of this field are stored.
#store_positions? ⇒ Boolean

True if the positions of the indexed terms in this field are stored.
#store_term_vector=(store_term_vector) ⇒ Object
#store_term_vector? ⇒ Boolean

True iff the term or terms used to index this field are stored as a term vector, available from IndexReader#term_freq_vector().
#stored=(stored) ⇒ Object
#stored? ⇒ Boolean

True iff the value of the field is to be stored in the index for return with search hits.
#string_value ⇒ Object

Returns the string value of the data that is stored in this field.
#to_s ⇒ Object

Prints a Field for human consumption.
#tokenized? ⇒ Boolean

True iff the value of the field should be tokenized as text prior to indexing.

Constructor Details

#initialize(name, value, stored = Store::YES, index = Index::UNTOKENIZED, store_term_vector = TermVector::NO, binary = false, boost = 1.0) ⇒ `Field`

Create a field by specifying its name, value and how it will be saved in the index.

name: The name of the field
value: The string to process
stored: Whether value should be stored in the index
index: Whether the field should be indexed, and if so, whether it should be tokenized before indexing
store_term_vector: Whether term vector should be stored

An ArgumentError is raised if:

* the field is neither stored nor indexed
* the field is not indexed but store_term_vector is anything other than _TermVector::NO_

binary: Whether you want to store binary data in this field. Default is false.
boost: The boost for this field. Default is 1.0. A larger number makes this field more important.

# File 'lib/ferret/document/field.rb', line 161

def initialize(name,
               value,
               stored = Store::YES,
               index = Index::UNTOKENIZED,
               store_term_vector = TermVector::NO,
               binary = false,
               boost = 1.0)
  if (index == Index::NO and stored == Store::NO)
    raise ArgumentError, "it doesn't make sense to have a field that " +
      "is neither indexed nor stored"
  end
  if (index == Index::NO && store_term_vector != TermVector::NO)
    raise ArgumentError, "cannot store term vector information for a " +
      "field that is not indexed"
  end

  # The name of the field (e.g., "date", "subject", "title", or "body")
  @name = name

  # the one and only data object for all different kind of field values
  @data = value
  self.stored = stored
  self.index = index
  self.store_term_vector = store_term_vector
  @binary = binary
  @boost = boost
end
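
The two argument checks at the top of the constructor can be tried in isolation. The sketch below is a stand-in, not Ferret itself: `check_field_options` and `rejected?` are hypothetical helpers, and plain symbols replace the `Store`, `Index` and `TermVector` constants so that the snippet runs without the library.

```ruby
# Hypothetical helper mirroring the two guards in Field#initialize.
# Symbols stand in for Ferret's Store/Index/TermVector constants.
def check_field_options(stored, index, store_term_vector)
  if index == :no && stored == :no
    raise ArgumentError, "it doesn't make sense to have a field that " \
      "is neither indexed nor stored"
  end
  if index == :no && store_term_vector != :no
    raise ArgumentError, "cannot store term vector information for a " \
      "field that is not indexed"
  end
  true
end

# True when the combination would be refused by the guards above.
def rejected?(stored, index, store_term_vector)
  check_field_options(stored, index, store_term_vector)
  false
rescue ArgumentError
  true
end

puts rejected?(:yes, :untokenized, :no)  # the constructor defaults: false
puts rejected?(:no, :no, :no)            # neither stored nor indexed: true
puts rejected?(:yes, :no, :yes)          # term vector without indexing: true
```

Note that the second guard fires for any term vector mode other than "no", not only for the plain "yes" mode.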

Instance Attribute Details

#boost ⇒ `Object`

This value will be multiplied into the score of all hits on this field of this document.

The boost is multiplied by Document#boost of the document containing this field. If a document has multiple fields with the same name, all such values are multiplied together. This product is then multiplied by the value Similarity#length_norm(String,int), and rounded by Similarity#encode_norm(float) before it is stored in the index. One should attempt to ensure that this product does not overflow the range of that encoding.

See also: Document#set_boost(float), Similarity#length_norm(String, int), Similarity#encode_norm(float)

Note: this value is not stored directly with the document in the index. Documents returned from IndexReader#document(int) and Hits#doc(int) may thus not have the same value present as when this field was indexed.
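
As a worked example of the product described above, the sketch below multiplies a document boost, the boosts of two same-named fields, and a length norm. It is only an illustration under stated assumptions: `length_norm` here is 1/sqrt(number of terms), which is the default in Lucene's Similarity and may not match Ferret's implementation, and the rounding done by `encode_norm` is left out. Both helper names are hypothetical.

```ruby
# Assumption: length norm is 1/sqrt(number of terms), as in Lucene's
# default Similarity. Ferret's own Similarity may compute it differently.
def length_norm(num_terms)
  1.0 / Math.sqrt(num_terms)
end

# Unrounded norm: document boost times every same-named field boost,
# times the length norm. The encode_norm rounding step is not modelled.
def raw_norm(document_boost, field_boosts, num_terms)
  field_boosts.inject(document_boost) { |product, boost| product * boost } *
    length_norm(num_terms)
end

# Document boost 1.5, two "title" fields boosted 2.0 each, 16 terms in total:
# 1.5 * 2.0 * 2.0 = 6.0, and 6.0 * (1 / sqrt(16)) = 1.5
puts raw_norm(1.5, [2.0, 2.0], 16)
```

Because every boost enters as a factor, a handful of large field boosts can quickly push the product beyond what the encoded norm can represent.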



# File 'lib/ferret/document/field.rb', line 30

def boost
  @boost
end

#data ⇒ `Object`

The one and only data object for all the different kinds of field value, such as a String, a Reader or binary data.



# File 'lib/ferret/document/field.rb', line 30

def data
  @data
end

#name ⇒ `Object` (readonly)

Returns the value of attribute name.



# File 'lib/ferret/document/field.rb', line 32

def name
  @name
end

Class Method Details

.new_binary_field(name, value, stored) ⇒ `Object`

Create a stored field with a binary value. The value may optionally be compressed, but it is never indexed, tokenized or given a term vector.

name: The name of the field
value: The binary value
stored: How value should be stored (compressed or not). Passing Store::NO raises an ArgumentError.

# File 'lib/ferret/document/field.rb', line 289

def Field.new_binary_field(name, value, stored)
  if (stored == Store::NO)
    raise ArgumentError, "binary values can't be unstored"
  end
  Field.new(name, value, stored, Index::NO, TermVector::NO, true)
end

Instance Method Details

#binary? ⇒ `Boolean`

True if the field is to be stored as a binary value. This can be used to store images or other binary data in the index.

Returns:

(Boolean)

# File 'lib/ferret/document/field.rb', line 50

def binary?() return @binary end

#binary_value ⇒ `Object`

If the data is stored as binary, just return it.



# File 'lib/ferret/document/field.rb', line 266

def binary_value
  return @data
end

#compressed? ⇒ `Boolean`

True if you want to compress the data that you store. This is a good idea for really large text fields. Ruby's Zlib library is used to do the compression.

Returns:

(Boolean)

# File 'lib/ferret/document/field.rb', line 55

def compressed?() return @compressed end

#index=(index) ⇒ `Object`

# File 'lib/ferret/document/field.rb', line 205

def index=(index)
  @omit_norms = false
  case index
  when Index::NO
    @indexed = false
    @tokenized = false
  when Index::TOKENIZED
    @indexed = true
    @tokenized = true
  when Index::UNTOKENIZED
    @indexed = true
    @tokenized = false
  when Index::NO_NORMS
    @indexed = true
    @tokenized = false
    @omit_norms = true
  else
    raise "unknown index parameter " + index.to_s
  end
end
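
The effect of each index mode on the three flags is easier to see as a lookup table. The sketch below restates the case statement above outside the class; `index_flags` is a hypothetical helper and the symbols are stand-ins for the `Index` constants, so nothing here is Ferret's own API.

```ruby
# Flags set by each index mode, restated from Field#index=.
# Symbols stand in for the Index::* constants.
def index_flags(mode)
  table = {
    no:          { indexed: false, tokenized: false, omit_norms: false },
    tokenized:   { indexed: true,  tokenized: true,  omit_norms: false },
    untokenized: { indexed: true,  tokenized: false, omit_norms: false },
    no_norms:    { indexed: true,  tokenized: false, omit_norms: true }
  }
  # Like the setter, refuse anything that is not a known mode.
  table.fetch(mode) { raise "unknown index parameter #{mode}" }
end

puts index_flags(:untokenized)[:tokenized]  # false
puts index_flags(:no_norms)[:omit_norms]    # true
```

Read this way, the "no norms" mode is simply the untokenized mode with norms switched off; there is no mode that combines tokenizing with omitted norms.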

#indexed? ⇒ `Boolean`

True iff the value of the field is to be indexed, so that it may be searched on.

Returns:

(Boolean)

# File 'lib/ferret/document/field.rb', line 41

def indexed?() return @indexed end

#omit_norms? ⇒ `Boolean`

True if the norms are not stored for this field. No norms means that index-time boosting and field length normalization will be disabled. The benefit is less memory usage as norms take up one byte per indexed field for every document in the index.

Returns:

(Boolean)

# File 'lib/ferret/document/field.rb', line 78

def omit_norms?() return @omit_norms end

#reader_value ⇒ `Object`

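Returns the data that is stored in this field as a Reader. Data that already responds to read is returned unchanged; anything else is converted to a String if necessary and wrapped in a StringReader.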

# File 'lib/ferret/document/field.rb', line 271

def reader_value
  if @data.respond_to? :read
    return @data
  elsif @data.instance_of? String
    return Ferret::Utils::StringHelper::StringReader.new(@data)
  else
    # if it is binary object try to return a string representation
    return Ferret::Utils::StringHelper::StringReader.new(@data.to_s)
  end
end

#store_offsets? ⇒ `Boolean`

True if the offsets of this field are stored. The offsets are the positions of the start and end characters of each token in the whole field string.

Returns:

(Boolean)

# File 'lib/ferret/document/field.rb', line 72

def store_offsets?() return @store_offset end

#store_positions? ⇒ `Boolean`

True if the positions of the indexed terms in this field are stored.

Returns:

(Boolean)

# File 'lib/ferret/document/field.rb', line 67

def store_positions?() return @store_position end

#store_term_vector=(store_term_vector) ⇒ `Object`

# File 'lib/ferret/document/field.rb', line 226

def store_term_vector=(store_term_vector)
  case store_term_vector
  when TermVector::NO
    @store_term_vector = false
    @store_position = false
    @store_offset = false
  when TermVector::YES
    @store_term_vector = true
    @store_position = false
    @store_offset = false
  when TermVector::WITH_POSITIONS
    @store_term_vector = true
    @store_position = true
    @store_offset = false
  when TermVector::WITH_OFFSETS
    @store_term_vector = true
    @store_position = false
    @store_offset = true
  when TermVector::WITH_POSITIONS_OFFSETS
    @store_term_vector = true
    @store_position = true
    @store_offset = true
  else
    raise "unknown term_vector parameter " + store_term_vector.to_s
  end
end

#store_term_vector? ⇒ `Boolean`

True iff the term or terms used to index this field are stored as a term vector, available from IndexReader#term_freq_vector(). These methods do not provide access to the original content of the field, only to terms used to index it. If the original content must be preserved, use the stored attribute instead.

See IndexReader#term_freq_vector()

Returns:

(Boolean)

# File 'lib/ferret/document/field.rb', line 64

def store_term_vector?() return @store_term_vector end

#stored=(stored) ⇒ `Object`

# File 'lib/ferret/document/field.rb', line 189

def stored=(stored)
  case stored
  when Store::YES
    @stored = true
    @compressed = false
  when Store::COMPRESS
    @stored = true
    @compressed = true
  when Store::NO
    @stored = false
    @compressed = false
  else
    raise "unknown stored parameter " + stored.to_s
  end
end

#stored? ⇒ `Boolean`

True iff the value of the field is to be stored in the index for return with search hits. It is an error for this to be true if a field is Reader-valued.

Returns:

(Boolean)

# File 'lib/ferret/document/field.rb', line 37

def stored?() return @stored end

#string_value ⇒ `Object`

Returns the string value of the data that is stored in this field. A Reader-valued field is read to obtain the string, and any other object is converted with to_s.

# File 'lib/ferret/document/field.rb', line 254

def string_value
  if @data.instance_of? String
    return @data
  elsif @data.respond_to? :read
    return @data.read()
  else
    # if it is binary object try to return a string representation
    return @data.to_s
  end
end
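
The three branches can be tried outside the class. The sketch below is a stand-in with the same branch order as the method above; it takes the data as an argument instead of reading `@data`, and uses `StringIO` from Ruby's standard library as an example of an object that responds to `read`.

```ruby
require 'stringio'

# Stand-in for Field#string_value, with the same branch order.
def string_value(data)
  if data.instance_of? String
    data
  elsif data.respond_to? :read
    data.read
  else
    # any other object: fall back to its string representation
    data.to_s
  end
end

puts string_value("plain text")                   # plain text
puts string_value(StringIO.new("from a reader"))  # from a reader
puts string_value(42)                             # 42

# A reader is consumed by the first call, so a second call on the
# same StringIO yields an empty string.
reader = StringIO.new("once")
string_value(reader)
puts string_value(reader).inspect                 # ""
```

The last lines show a consequence worth keeping in mind for Reader-valued data: at least with a `StringIO`, the value can only be read once unless the reader is rewound.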

#to_s ⇒ `Object`

Prints a Field for human consumption.

# File 'lib/ferret/document/field.rb', line 297

def to_s()
  str = ""
  if (@stored)
    str << "stored"
    str << (@compressed ? "/compressed," : "/uncompressed,")
  end
  str << "indexed," if (@indexed)
  str << "tokenized," if (@tokenized)
  str << "store_term_vector," if (@store_term_vector)
  str << "tv_offset," if (@store_offset)
  str << "tv_position," if (@store_position)
  str << "omit_norms," if (@omit_norms)
  str << "binary," if (@binary)
  str << "<#{@name}:#{data}>"
end
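
To show what the resulting string looks like, the sketch below rebuilds it from a hash of flags in the same order as the method above. `describe_field` and its hash keys are hypothetical; a real Field derives these flags from its constructor arguments.

```ruby
# Stand-in for Field#to_s that takes the flags as a hash.
# Flags are appended in the same order as in the method above.
def describe_field(name, data, flags)
  str = String.new
  if flags[:stored]
    str << "stored"
    str << (flags[:compressed] ? "/compressed," : "/uncompressed,")
  end
  str << "indexed," if flags[:indexed]
  str << "tokenized," if flags[:tokenized]
  str << "store_term_vector," if flags[:store_term_vector]
  str << "tv_offset," if flags[:store_offset]
  str << "tv_position," if flags[:store_position]
  str << "omit_norms," if flags[:omit_norms]
  str << "binary," if flags[:binary]
  str << "<#{name}:#{data}>"
end

puts describe_field("title", "Ferret", stored: true, indexed: true, tokenized: true)
# prints: stored/uncompressed,indexed,tokenized,<title:Ferret>
```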

#tokenized? ⇒ `Boolean`

True iff the value of the field should be tokenized as text prior to indexing. Un-tokenized fields are indexed as a single word and may not be Reader-valued.

Returns:

(Boolean)

# File 'lib/ferret/document/field.rb', line 46

def tokenized?() return @tokenized end
