Class: Ferret::Document::Field

Inherits:
Object
Defined in:
lib/ferret/document/field.rb

Overview

A field is a section of a Document. Each field has two parts, a name and a value. Values may be free text, provided as a String or as a Reader, or they may be atomic keywords, which are not further processed. Such keywords may be used to represent dates, URLs, etc. Fields are optionally stored in the index, so that they may be returned with hits on the document.

Defined Under Namespace

Classes: Index, Store, TermVector

Constructor Details

#initialize(name, value, stored = Store::YES, index = Index::UNTOKENIZED, store_term_vector = TermVector::NO, binary = false, boost = 1.0) ⇒ Field

Create a field by specifying its name, value and how it will be saved in the index.

name

The name of the field

value

The string to process

stored

Whether value should be stored in the index

index

Whether the field should be indexed, and if so, if it should be tokenized before indexing

store_term_vector

Whether term vector should be stored

Raises an ArgumentError if:

* the field is neither stored nor indexed
* the field is not indexed but store_term_vector is anything other than _TermVector::NO_

binary

Whether you want to store binary data in this field. Default is false.

boost

The boost for this field. Default is 1.0. A larger number makes this field more important.



# File 'lib/ferret/document/field.rb', line 148

def initialize(name,
               value,
               stored = Store::YES,
               index = Index::UNTOKENIZED,
               store_term_vector = TermVector::NO,
               binary = false,
               boost = 1.0)
  if (index == Index::NO and stored == Store::NO)
    raise ArgumentError, "it doesn't make sense to have a field that " +
      "is neither indexed nor stored"
  end
  if (index == Index::NO && store_term_vector != TermVector::NO)
    raise ArgumentError, "cannot store term vector information for a " +
      "field that is not indexed"
  end

  # The name of the field (e.g., "date", "subject", "title", or "body")
  @name = name

  # the one and only data object for all different kind of field values
  @data = value
  self.stored = stored
  self.index = index
  self.store_term_vector = store_term_vector
  @binary = binary
  @boost = boost
end
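
The two argument checks at the top of the constructor can be tried out in isolation. The following is a minimal sketch, not the Ferret implementation: the FieldSketch module, its check_options method and the symbol values standing in for the Store, Index and TermVector parameters are all assumptions made for illustration.

```ruby
# Stand-in sketch, NOT the Ferret classes: plain symbols replace the
# Store, Index and TermVector parameter objects.
module FieldSketch
  module Store;      YES = :yes; COMPRESS = :compress; NO = :no; end
  module Index;      NO = :no; TOKENIZED = :tokenized; UNTOKENIZED = :untokenized; end
  module TermVector; NO = :no; YES = :yes; end

  # Mirrors the two checks that Field#initialize performs before
  # assigning any instance variables.
  def self.check_options(stored, index, store_term_vector)
    if index == Index::NO && stored == Store::NO
      raise ArgumentError, "field is neither indexed nor stored"
    end
    if index == Index::NO && store_term_vector != TermVector::NO
      raise ArgumentError, "cannot store term vectors for an unindexed field"
    end
    true
  end
end

F = FieldSketch

# A stored, untokenized field without term vectors is accepted.
F.check_options(F::Store::YES, F::Index::UNTOKENIZED, F::TermVector::NO)

# A field that is neither stored nor indexed is rejected.
begin
  F.check_options(F::Store::NO, F::Index::NO, F::TermVector::NO)
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Note that the second check fires for any term vector setting other than TermVector::NO, not only for TermVector::YES.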

Instance Attribute Details

#boost ⇒ Object

This value will be multiplied into the score of all hits on this field of this document.

The boost is multiplied by Document#boost of the document containing this field. If a document has multiple fields with the same name, all such values are multiplied together. This product is then multiplied by the value Similarity#length_norm(String,int), and rounded by Similarity#encode_norm(float) before it is stored in the index. One should attempt to ensure that this product does not overflow the range of that encoding.

See Document#set_boost(float), Similarity#length_norm(String, int) and Similarity#encode_norm(float).

Note: this value is not stored directly with the document in the index. Documents returned from IndexReader#document(int) and Hits#doc(int) may thus not have the same value present as when this field was indexed.
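
As a rough worked example of the product described above, the sketch below multiplies a document boost by the boosts of two same-named fields and by a length norm. The helper names and the length norm formula (1 / sqrt(number of terms)) are assumptions chosen for illustration; the formula actually used by Similarity#length_norm may differ, and the final rounding step of Similarity#encode_norm is left out.

```ruby
# ASSUMED length norm: shorter fields get a larger value.
# This is a placeholder, not necessarily what Similarity#length_norm does.
def length_norm(num_terms)
  1.0 / Math.sqrt(num_terms)
end

# document_boost: the boost of the containing document
# field_boosts:   the boosts of every field sharing one name
# num_terms:      the number of terms indexed for that field name
def unencoded_norm(document_boost, field_boosts, num_terms)
  product = field_boosts.inject(document_boost) { |acc, boost| acc * boost }
  product * length_norm(num_terms)
end

# 2.0 * (1.5 * 2.0) * (1 / sqrt(16)) = 6.0 * 0.25
puts unencoded_norm(2.0, [1.5, 2.0], 16)  # prints 1.5
```

Because the stored value is rounded afterwards, a boost read back from the index can differ from the one supplied at indexing time, as the note above points out.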



# File 'lib/ferret/document/field.rb', line 30

def boost
  @boost
end

#data ⇒ Object

The one and only data object for all the different kinds of field values.



# File 'lib/ferret/document/field.rb', line 30

def data
  @data
end

#name ⇒ Object (readonly)

Returns the value of attribute name.



# File 'lib/ferret/document/field.rb', line 32

def name
  @name
end

Class Method Details

.new_binary_field(name, value, stored) ⇒ Object

Create a stored field with a binary value. Optionally the value may be compressed. A binary field is never indexed or tokenized, and no term vector is stored for it.

name

The name of the field

value

The binary value

stored

How the value should be stored (compressed or not). Passing Store::NO raises an ArgumentError.



# File 'lib/ferret/document/field.rb', line 268

def Field.new_binary_field(name, value, stored)
  if (stored == Store::NO)
    raise ArgumentError, "binary values can't be unstored"
  end
  Field.new(name, value, stored, Index::NO, TermVector::NO, true)
end

Instance Method Details

#binary? ⇒ Boolean

True if the field is to be stored as a binary value. This can be used to store images or other binary data in the index.

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 50

def binary?() return @binary end

#binary_value ⇒ Object

Returns the field's data unchanged. Intended for fields whose data is stored as binary.



# File 'lib/ferret/document/field.rb', line 245

def binary_value
  return @data
end

#compressed? ⇒ Boolean

True if you want to compress the data that you store. This is a good idea for really large text fields. The Ruby Zlib library is used to do the compression.

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 55

def compressed?() return @compressed end
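
To get a feel for what compression buys on a large, repetitive text value, here is a small round trip through Ruby's standard Zlib library. It only illustrates the library named above; how Ferret itself invokes Zlib when writing a compressed field is not shown in this page, so the calls below are an assumption about typical usage.

```ruby
require 'zlib'

# A deliberately repetitive value, standing in for a large text field.
text = "a really large text field " * 200

compressed = Zlib::Deflate.deflate(text)
restored   = Zlib::Inflate.inflate(compressed)

puts "original:   #{text.bytesize} bytes"
puts "compressed: #{compressed.bytesize} bytes"
puts "round trip intact: #{restored == text}"
```

Short values gain little from compression, which is why the option is suggested only for really large text fields.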

#index=(index) ⇒ Object



# File 'lib/ferret/document/field.rb', line 191

def index=(index)
  if (index == Index::NO)
    @indexed = false
    @tokenized = false
  elsif (index == Index::TOKENIZED)
    @indexed = true
    @tokenized = true
  elsif (index == Index::UNTOKENIZED)
    @indexed = true
    @tokenized = false
  else
    raise "unknown index parameter " + index.to_s
  end
end

#indexed? ⇒ Boolean

True iff the value of the field is to be indexed, so that it may be searched on.

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 41

def indexed?() return @indexed end

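#reader_value ⇒ Object

Returns the data that is stored in this field as a Reader, that is, an object that responds to read. A String, or the string representation of any other object, is wrapped in a StringReader.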



# File 'lib/ferret/document/field.rb', line 250

def reader_value
  if @data.respond_to? :read
    return @data
  elsif @data.instance_of? String
    return Ferret::Utils::StringHelper::StringReader.new(@data)
  else
    # if it is binary object try to return a string representation
    return Ferret::Utils::StringHelper::StringReader.new(@data.to_s)
  end
end

#store_offsets? ⇒ Boolean

True if the offsets of this field are stored. The offsets are the positions of the start and end characters of each token in the whole field string.

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 72

def store_offsets?() return @store_offset end

#store_positions? ⇒ Boolean

True if the positions of the indexed terms in this field are stored.

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 67

def store_positions?() return @store_position end

#store_term_vector=(store_term_vector) ⇒ Object



# File 'lib/ferret/document/field.rb', line 206

def store_term_vector=(store_term_vector)
  if (store_term_vector == TermVector::NO)
    @store_term_vector = false
    @store_position = false
    @store_offset = false
  elsif (store_term_vector == TermVector::YES)
    @store_term_vector = true
    @store_position = false
    @store_offset = false
  elsif (store_term_vector == TermVector::WITH_POSITIONS)
    @store_term_vector = true
    @store_position = true
    @store_offset = false
  elsif (store_term_vector == TermVector::WITH_OFFSETS)
    @store_term_vector = true
    @store_position = false
    @store_offset = true
  elsif (store_term_vector == TermVector::WITH_POSITIONS_OFFSETS)
    @store_term_vector = true
    @store_position = true
    @store_offset = true
  else
    raise "unknown term_vector parameter " + store_term_vector.to_s
  end
end
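
The five term vector options map onto three boolean flags, which is easier to see as a table. The sketch below expresses the same mapping as a lookup; the TERM_VECTOR_FLAGS constant, the term_vector_flags helper and the symbol keys are hypothetical stand-ins for the TermVector parameter objects, not part of Ferret.

```ruby
# Symbols stand in for the TermVector parameters (an assumption).
# Each value lists [store_term_vector, store_position, store_offset].
TERM_VECTOR_FLAGS = {
  :no                     => [false, false, false],
  :yes                    => [true,  false, false],
  :with_positions         => [true,  true,  false],
  :with_offsets           => [true,  false, true],
  :with_positions_offsets => [true,  true,  true]
}.freeze

# Returns the three flags for an option, or raises for an unknown one,
# in the same spirit as the final else branch of store_term_vector=.
def term_vector_flags(option)
  TERM_VECTOR_FLAGS.fetch(option) do
    raise ArgumentError, "unknown term_vector parameter #{option}"
  end
end

p term_vector_flags(:with_offsets)  # prints [true, false, true]
```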

#store_term_vector? ⇒ Boolean

True iff the term or terms used to index this field are stored as a term vector, available from IndexReader#term_freq_vector(). These methods do not provide access to the original content of the field, only to terms used to index it. If the original content must be preserved, use the stored attribute instead.

See IndexReader#term_freq_vector()

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 64

def store_term_vector?() return @store_term_vector end

#stored=(stored) ⇒ Object



# File 'lib/ferret/document/field.rb', line 176

def stored=(stored)
  if (stored == Store::YES)
    @stored = true
    @compressed = false
  elsif (stored == Store::COMPRESS)
    @stored = true
    @compressed = true
  elsif (stored == Store::NO)
    @stored = false
    @compressed = false
  else
    raise "unknown stored parameter " + stored.to_s
  end
end

#stored? ⇒ Boolean

True iff the value of the field is to be stored in the index for return with search hits. It is an error for this to be true if a field is Reader-valued.

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 37

def stored?() return @stored end

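#string_value ⇒ Object

Returns the string value of the data that is stored in this field. If the data is a Reader, it is read to obtain the string; any other object is converted with to_s.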



# File 'lib/ferret/document/field.rb', line 233

def string_value
  if @data.instance_of? String
    return @data
  elsif @data.respond_to? :read
    return @data.read()
  else
    # if it is binary object try to return a string representation
    return @data.to_s
  end
end
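
The branching in string_value relies on duck typing: anything that responds to read is treated as a Reader. The sketch below reproduces that branching as a free-standing helper (string_value_of is a hypothetical name, and StringIO from the standard library plays the part of a Reader) to show one consequence worth knowing: a Reader is consumed by the first call.

```ruby
require 'stringio'

# Same three branches as string_value, written as a standalone function.
def string_value_of(data)
  if data.instance_of?(String)
    data
  elsif data.respond_to?(:read)
    data.read
  else
    # Anything else falls back to its string representation.
    data.to_s
  end
end

puts string_value_of("plain text")    # prints plain text

reader = StringIO.new("text from a reader")
puts string_value_of(reader)          # prints text from a reader
p string_value_of(reader)             # prints "" since the reader is exhausted

puts string_value_of(42)              # prints 42
```

If the value of a Reader-valued field is needed more than once, it is safer to read it into a String up front.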

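#to_s ⇒ Object

Returns a description of the Field for human consumption: a comma-separated list of its flags followed by the name and data as <name:data>.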



# File 'lib/ferret/document/field.rb', line 276

def to_s()
  str = ""
  if (@stored)
    str << "stored"
    # Append to the local str; the parentheses make << receive the
    # result of the ternary rather than @compressed itself
    str << (@compressed ? "/compressed," : "/uncompressed,")
  end
  if (@indexed) then str << "indexed," end
  if (@tokenized) then str << "tokenized," end
  if (@store_term_vector) then str << "store_term_vector," end
  if (@store_offset)
    str << "term_vector_offsets,"
  end
  if (@store_position)
    str << "term_vector_position,"
  end
  if (@binary) then str << "binary," end

  str << '<'
  str << @name
  str << ':'

  # Ruby has no null; test against nil instead
  if (!@data.nil?)
    str << @data.to_s
  end

  str << '>'
end

#tokenized? ⇒ Boolean

True iff the value of the field should be tokenized as text prior to indexing. Un-tokenized fields are indexed as a single word and may not be Reader-valued.

Returns:

  • (Boolean)


# File 'lib/ferret/document/field.rb', line 46

def tokenized?() return @tokenized end
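
The practical difference between a tokenized and an untokenized field is the set of terms that ends up in the index. The sketch below makes that visible with a deliberately naive stand-in for an analyzer: terms_for is a hypothetical helper that lowercases and splits on non-word characters, which is an assumption and not how Ferret's analyzers are implemented.

```ruby
# Hypothetical stand-in for analysis: an untokenized value becomes a
# single term, a tokenized value is split into lowercase words.
def terms_for(value, tokenized)
  return [value] unless tokenized
  value.downcase.scan(/\w+/)
end

p terms_for("Ruby Search Library", true)   # prints ["ruby", "search", "library"]
p terms_for("2006-01-15", false)           # prints ["2006-01-15"]
```

Keyword-like values such as dates or URLs are usually left untokenized so that they match only as a whole.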