Class: Ferret::Document::Field

Inherits:
Object
Defined in:
lib/ferret/document/field.rb

Overview

A field is a section of a Document. Each field has two parts, a name and a value. Values may be free text, provided as a String or as a Reader, or they may be atomic keywords, which are not further processed. Such keywords may be used to represent dates, urls, etc. Fields are optionally stored in the index, so that they may be returned with hits on the document.

Defined Under Namespace

Classes: Index, Store, TermVector

Constructor Details

#initialize(name, value, stored = Store::YES, index = Index::UNTOKENIZED, store_term_vector = TermVector::NO, binary = false, boost = 1.0) ⇒ Field

Create a field by specifying its name, value and how it will be saved in the index.

name

The name of the field.

value

The string to process (a String or a Reader).

stored

Whether the value should be stored in the index.

index

Whether the field should be indexed and, if so, whether it should be tokenized before indexing.

store_term_vector

Whether a term vector should be stored.

binary

Whether to store binary data in this field. Default is false.

boost

The boost for this field. Default is 1.0. A larger number makes this field more important.

An ArgumentError is raised if:

* the field is neither stored nor indexed
* the field is not indexed but store_term_vector is anything other than TermVector::NO



# File 'lib/ferret/document/field.rb', line 161

def initialize(name,
               value,
               stored = Store::YES,
               index = Index::UNTOKENIZED,
               store_term_vector = TermVector::NO,
               binary = false,
               boost = 1.0)
  if (index == Index::NO and stored == Store::NO)
    raise ArgumentError, "it doesn't make sense to have a field that " +
      "is neither indexed nor stored"
  end
  if (index == Index::NO && store_term_vector != TermVector::NO)
    raise ArgumentError, "cannot store term vector information for a " +
      "field that is not indexed"
  end

  # The name of the field (e.g., "date", "subject", "title", or "body")
  @name = name

  # the one and only data object for all different kind of field values
  @data = value
  self.stored = stored
  self.index = index
  self.store_term_vector = store_term_vector
  @binary = binary
  @boost = boost
end
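
The two argument checks in the constructor can be sketched in isolation. This is a minimal stand-in rather than Ferret's own code: `check_field_options` and `rejected?` are hypothetical helpers, and the symbols `:yes`, `:no` and `:untokenized` are assumed placeholders for the `Store`, `Index` and `TermVector` constants so that the snippet runs without the library.

```ruby
# Hypothetical stand-in for the validation in Field#initialize.
# Symbols replace Ferret's Store/Index/TermVector constants (an assumption
# made so the sketch runs without the library).
def check_field_options(stored: :yes, index: :untokenized, term_vector: :no)
  if index == :no && stored == :no
    raise ArgumentError, "a field must be indexed, stored, or both"
  end
  if index == :no && term_vector != :no
    raise ArgumentError, "term vectors require an indexed field"
  end
  true
end

# Returns true when the given option combination is refused.
def rejected?(**options)
  check_field_options(**options)
  false
rescue ArgumentError
  true
end

puts rejected?(stored: :yes, index: :no)                     # stored only: accepted
puts rejected?(stored: :no, index: :no)                      # neither: rejected
puts rejected?(stored: :yes, index: :no, term_vector: :yes)  # vector without index: rejected
```

A field that is only stored is accepted, which is the combination `Field.new_binary_field` relies on.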

Instance Attribute Details

#boost ⇒ Object

This value will be multiplied into the score of all hits on this field of this document.

The boost is multiplied by Document#boost of the document containing this field. If a document has multiple fields with the same name, all such values are multiplied together. This product is then multiplied by the value of Similarity#length_norm(String, int) and rounded by Similarity#encode_norm(float) before it is stored in the index. Take care that this product does not overflow the range of that encoding.

See Document#set_boost(float), Similarity#length_norm(String, int) and Similarity#encode_norm(float).

Note: this value is not stored directly with the document in the index. Documents returned from IndexReader#document(int) and Hits#doc(int) may thus not have the same value present as when this field was indexed.



# File 'lib/ferret/document/field.rb', line 30

def boost
  @boost
end
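
As a worked illustration of the product described for the boost, the sketch below multiplies a document boost, the boosts of all same-named fields, and a length norm. The helper names are hypothetical, and the formula used for `length_norm` (one over the square root of the term count) is an assumption; the real value comes from Similarity#length_norm and may differ. The final rounding by Similarity#encode_norm is not modelled.

```ruby
# Sketch of the norm product described for Field#boost.
# Assumption: length_norm is 1/sqrt(number of terms); the real value comes
# from Similarity#length_norm and may differ.
def length_norm(num_terms)
  1.0 / Math.sqrt(num_terms)
end

# field_boosts holds the boost of every field sharing the same name.
def unencoded_norm(document_boost, field_boosts, num_terms)
  combined_field_boost = field_boosts.reduce(1.0) { |product, boost| product * boost }
  document_boost * combined_field_boost * length_norm(num_terms)
end

# 2.0 * (1.5 * 2.0) * (1 / sqrt(16))
norm = unencoded_norm(2.0, [1.5, 2.0], 16)
puts norm
```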

#data ⇒ Object

The one and only data object for all the different kinds of field values.



# File 'lib/ferret/document/field.rb', line 30

def data
  @data
end

#name ⇒ Object (readonly)

Returns the value of attribute name.



# File 'lib/ferret/document/field.rb', line 32

def name
  @name
end

Class Method Details

.new_binary_field(name, value, stored) ⇒ Object

Create a stored field with a binary value. Optionally, the value may be compressed. A binary field is never indexed or tokenized, and no term vector is stored for it.

name

The name of the field

value

The binary value

stored

How the value should be stored (compressed or not). Passing Store::NO raises an ArgumentError.



# File 'lib/ferret/document/field.rb', line 289

def Field.new_binary_field(name, value, stored)
  if (stored == Store::NO)
    raise ArgumentError, "binary values can't be unstored"
  end
  Field.new(name, value, stored, Index::NO, TermVector::NO, true)
end

Instance Method Details

#binary? ⇒ Boolean

True if the field is to be stored as a binary value. This can be used to store images or other binary data in the index if you wish.

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 50

def binary?() return @binary end

#binary_value ⇒ Object

Returns the field's data unchanged. Intended for fields whose data is stored as binary.



# File 'lib/ferret/document/field.rb', line 266

def binary_value
  return @data
end

#compressed? ⇒ Boolean

True if you want to compress the data that you store. This is a good idea for really large text fields. Ruby's Zlib library is used to do the compression.

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 55

def compressed?() return @compressed end

#index=(index) ⇒ Object



# File 'lib/ferret/document/field.rb', line 205

def index=(index)
  @omit_norms = false
  case index
  when Index::NO
    @indexed = false
    @tokenized = false
  when Index::TOKENIZED
    @indexed = true
    @tokenized = true
  when Index::UNTOKENIZED
    @indexed = true
    @tokenized = false
  when Index::NO_NORMS
    @indexed = true
    @tokenized = false
    @omit_norms = true
  else
    raise "unknown index parameter " + index.to_s
  end
end
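
The effect of each index mode can be summarized as a lookup table. The sketch below transcribes the flags from the case statement in `index=`; the symbols are assumed placeholders for the `Index` constants, `index_flags` is a hypothetical helper, and it raises ArgumentError where the original raises a plain RuntimeError.

```ruby
# Flags set by each index mode, transcribed from Field#index=.
# Symbols stand in for the Index constants (an assumption for this sketch).
INDEX_FLAGS = {
  no:          { indexed: false, tokenized: false, omit_norms: false },
  tokenized:   { indexed: true,  tokenized: true,  omit_norms: false },
  untokenized: { indexed: true,  tokenized: false, omit_norms: false },
  no_norms:    { indexed: true,  tokenized: false, omit_norms: true  }
}.freeze

# Looks up the flags for a mode and refuses anything not in the table.
def index_flags(mode)
  INDEX_FLAGS.fetch(mode) { raise ArgumentError, "unknown index parameter #{mode}" }
end

p index_flags(:no_norms)
```

Note that NO_NORMS implies an indexed, untokenized field; only TOKENIZED turns tokenization on.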

#indexed? ⇒ Boolean

True iff the value of the field is to be indexed, so that it may be searched on.

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 41

def indexed?() return @indexed end

#omit_norms? ⇒ Boolean

True if the norms are not stored for this field. No norms means that index-time boosting and field length normalization will be disabled. The benefit is less memory usage as norms take up one byte per indexed field for every document in the index.

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 78

def omit_norms?() return @omit_norms end

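#reader_value ⇒ Object

Returns the data stored in this field as a reader. Data that already responds to read is returned as-is; anything else is converted to a string and wrapped in a StringReader.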



# File 'lib/ferret/document/field.rb', line 271

def reader_value
  if @data.respond_to? :read
    return @data
  elsif @data.instance_of? String
    return Ferret::Utils::StringHelper::StringReader.new(@data)
  else
    # if it is binary object try to return a string representation
    return Ferret::Utils::StringHelper::StringReader.new(@data.to_s)
  end
end
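
The dispatch in `reader_value` can be tried without Ferret. In this sketch `reader_value_of` is a hypothetical stand-in that takes the data as an argument, and the standard library's `StringIO` is swapped in for `Ferret::Utils::StringHelper::StringReader` (an assumption; the two classes are not claimed to be interchangeable beyond responding to `read`).

```ruby
require "stringio"

# Stand-in for Field#reader_value. StringIO replaces Ferret's StringReader
# here (an assumption so the sketch runs with only the standard library).
def reader_value_of(data)
  # Anything that can already be read is handed back untouched.
  return data if data.respond_to?(:read)
  # Strings and all other objects are wrapped via their string form.
  StringIO.new(data.to_s)
end

existing = StringIO.new("already a reader")
puts reader_value_of(existing).equal?(existing)
puts reader_value_of("plain text").read
puts reader_value_of(42).read
```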

#store_offsets? ⇒ Boolean

True if the offsets of this field are stored. The offsets are the positions of the start and end characters of each token in the whole field string.

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 72

def store_offsets?() return @store_offset end

#store_positions? ⇒ Boolean

True if the positions of the indexed terms in this field are stored.

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 67

def store_positions?() return @store_position end

#store_term_vector=(store_term_vector) ⇒ Object



# File 'lib/ferret/document/field.rb', line 226

def store_term_vector=(store_term_vector)
  case store_term_vector
  when TermVector::NO
    @store_term_vector = false
    @store_position = false
    @store_offset = false
  when TermVector::YES
    @store_term_vector = true
    @store_position = false
    @store_offset = false
  when TermVector::WITH_POSITIONS
    @store_term_vector = true
    @store_position = true
    @store_offset = false
  when TermVector::WITH_OFFSETS
    @store_term_vector = true
    @store_position = false
    @store_offset = true
  when TermVector::WITH_POSITIONS_OFFSETS
    @store_term_vector = true
    @store_position = true
    @store_offset = true
  else
    raise "unknown term_vector parameter " + store_term_vector.to_s
  end
end
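
The five term vector modes reduce to three independent flags. The table below is a sketch transcribed from the case statement in `store_term_vector=`; the symbols are assumed placeholders for the `TermVector` constants, and `TermVectorFlags` and `term_vector_flags` are hypothetical names.

```ruby
# Flags set by each term vector mode, transcribed from
# Field#store_term_vector=. Symbols stand in for the TermVector constants.
TermVectorFlags = Struct.new(:store_term_vector, :store_position, :store_offset)

TERM_VECTOR_MODES = {
  no:                     TermVectorFlags.new(false, false, false),
  yes:                    TermVectorFlags.new(true,  false, false),
  with_positions:         TermVectorFlags.new(true,  true,  false),
  with_offsets:           TermVectorFlags.new(true,  false, true),
  with_positions_offsets: TermVectorFlags.new(true,  true,  true)
}.freeze

# Looks up the flags for a mode and refuses anything not in the table.
def term_vector_flags(mode)
  TERM_VECTOR_MODES.fetch(mode) { raise ArgumentError, "unknown term_vector parameter #{mode}" }
end

flags = term_vector_flags(:with_offsets)
puts flags.store_offset
puts flags.store_position
```

Positions and offsets are only ever set together with the term vector itself, so `store_positions?` or `store_offsets?` being true implies `store_term_vector?` is true.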

#store_term_vector? ⇒ Boolean

True iff the term or terms used to index this field are stored as a term vector, available from IndexReader#term_freq_vector(). The term vector does not provide access to the original content of the field, only to the terms used to index it. If the original content must be preserved, use the stored attribute instead.

See IndexReader#term_freq_vector()

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 64

def store_term_vector?() return @store_term_vector end

#stored=(stored) ⇒ Object



# File 'lib/ferret/document/field.rb', line 189

def stored=(stored)
  case stored
  when Store::YES
    @stored = true
    @compressed = false
  when Store::COMPRESS
    @stored = true
    @compressed = true
  when Store::NO
    @stored = false
    @compressed = false
  else
    raise "unknown stored parameter " + stored.to_s
  end
end

#stored? ⇒ Boolean

True iff the value of the field is to be stored in the index for return with search hits. It is an error for this to be true if a field is Reader-valued.

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 37

def stored?() return @stored end

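#string_value ⇒ Object

Returns the string value of the data that is stored in this field. Reader-valued data is read to obtain the string; any other non-String object is converted with to_s.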



# File 'lib/ferret/document/field.rb', line 254

def string_value
  if @data.instance_of? String
    return @data
  elsif @data.respond_to? :read
    return @data.read()
  else
    # if it is binary object try to return a string representation
    return @data.to_s
  end
end
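
The three branches of `string_value` can be exercised with plain Ruby objects. In this sketch `string_value_of` is a hypothetical stand-in that takes the data as an argument, and `StringIO` plays the role of a Reader (an assumption; any object responding to `read` would take the same branch).

```ruby
require "stringio"

# Stand-in for Field#string_value, taking the data object as an argument.
def string_value_of(data)
  if data.instance_of?(String)
    data
  elsif data.respond_to?(:read)
    data.read
  else
    # Other objects fall back to their string representation.
    data.to_s
  end
end

reader = StringIO.new("streamed body")
puts string_value_of("plain text")
puts string_value_of(reader)
puts string_value_of(42)
# The reader was consumed by the call above, so reading again yields "".
puts string_value_of(reader).empty?
```

Because reading consumes the stream, calling `string_value` twice on a Reader-valued field should not be expected to return the same text both times.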

#to_s ⇒ Object

Returns a description of the Field for human consumption.



# File 'lib/ferret/document/field.rb', line 297

def to_s()
  str = ""
  if (@stored)
    str << "stored"
    str << (@compressed ? "/compressed," : "/uncompressed,")
  end
  str << "indexed," if (@indexed)
  str << "tokenized," if (@tokenized)
  str << "store_term_vector," if (@store_term_vector)
  str << "tv_offset," if (@store_offset)
  str << "tv_position," if (@store_position)
  str << "omit_norms," if (@omit_norms)
  str << "binary," if (@binary)
  str << "<#{@name}:#{data}>"
end
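
To show the shape of the string that `to_s` builds, the sketch below reproduces part of its logic over a plain hash of flags. `describe_field` is a hypothetical helper, not part of Ferret, and it covers only the stored, indexed and tokenized flags.

```ruby
# Partial stand-in for Field#to_s, driven by a hash of flags instead of
# instance variables (an assumption so the sketch runs without Ferret).
def describe_field(name, data, flags)
  str = +""
  if flags[:stored]
    str << "stored"
    str << (flags[:compressed] ? "/compressed," : "/uncompressed,")
  end
  str << "indexed," if flags[:indexed]
  str << "tokenized," if flags[:tokenized]
  # The name and data always close the description.
  str << "<#{name}:#{data}>"
end

puts describe_field("title", "Hello", stored: true, indexed: true, tokenized: true)
# prints "stored/uncompressed,indexed,tokenized,<title:Hello>"
```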

#tokenized? ⇒ Boolean

True iff the value of the field should be tokenized as text prior to indexing. Un-tokenized fields are indexed as a single word and may not be Reader-valued.

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 46

def tokenized?() return @tokenized end