Class: Ferret::Document::Field
- Inherits:
- Object
- Class hierarchy:
- Object > Ferret::Document::Field
- Defined in:
- lib/ferret/document/field.rb
Overview
A field is a section of a Document. Each field has two parts, a name and a value. Values may be free text, provided as a String or as a Reader, or they may be atomic keywords, which are not further processed. Such keywords may be used to represent dates, urls, etc. Fields are optionally stored in the index, so that they may be returned with hits on the document.
Defined Under Namespace
Classes: Index, Store, TermVector
Instance Attribute Summary
-
#boost ⇒ Object
This value will be multiplied into the score of all hits on this field of this document.
-
#data ⇒ Object
The value of this field: the single data object that holds every kind of field value.
-
#name ⇒ Object
readonly
Returns the value of attribute name.
Class Method Summary
-
.new_binary_field(name, value, stored) ⇒ Object
Create a stored field with binary value.
Instance Method Summary
-
#binary? ⇒ Boolean
True if the field is to be stored as a binary value.
-
#binary_value ⇒ Object
Returns the field data as is; intended for fields stored as binary.
-
#compressed? ⇒ Boolean
True if you want to compress the data that you store.
- #index=(index) ⇒ Object
-
#indexed? ⇒ Boolean
True iff the value of the field is to be indexed, so that it may be searched on.
-
#initialize(name, value, stored = Store::YES, index = Index::UNTOKENIZED, store_term_vector = TermVector::NO, binary = false, boost = 1.0) ⇒ Field
constructor
Create a field by specifying its name, value and how it will be saved in the index.
-
#omit_norms? ⇒ Boolean
True if the norms are not stored for this field.
-
#reader_value ⇒ Object
Returns the data that is stored in this field as a Reader.
-
#store_offsets? ⇒ Boolean
True if the offsets of this field are stored.
-
#store_positions? ⇒ Boolean
True if the positions of the indexed terms in this field are stored.
- #store_term_vector=(store_term_vector) ⇒ Object
-
#store_term_vector? ⇒ Boolean
True iff the term or terms used to index this field are stored as a term vector, available from IndexReader#term_freq_vector().
- #stored=(stored) ⇒ Object
-
#stored? ⇒ Boolean
True iff the value of the field is to be stored in the index for return with search hits.
-
#string_value ⇒ Object
Returns the string value of the data that is stored in this field.
-
#to_s ⇒ Object
Prints a Field for human consumption.
-
#tokenized? ⇒ Boolean
True iff the value of the field should be tokenized as text prior to indexing.
Constructor Details
#initialize(name, value, stored = Store::YES, index = Index::UNTOKENIZED, store_term_vector = TermVector::NO, binary = false, boost = 1.0) ⇒ Field
Create a field by specifying its name, value and how it will be saved in the index.
- name
-
The name of the field
- value
-
The string to process
- stored
-
Whether value should be stored in the index
- index
-
Whether the field should be indexed, and if so, if it should be tokenized before indexing
- store_term_vector
-
Whether the term vector should be stored
Raises ArgumentError if:
* the field is neither stored nor indexed
* the field is not indexed but store_term_vector is anything other than _TermVector::NO_
- binary
-
Whether you want to store binary data in this field. Default is false.
- boost
-
The boost for this field. Default is 1.0. A larger number makes this field more important.
# File 'lib/ferret/document/field.rb', line 161
def initialize(name,
               value,
               stored = Store::YES,
               index = Index::UNTOKENIZED,
               store_term_vector = TermVector::NO,
               binary = false,
               boost = 1.0)
  if (index == Index::NO and stored == Store::NO)
    raise ArgumentError, "it doesn't make sense to have a field that " +
      "is neither indexed nor stored"
  end
  if (index == Index::NO && store_term_vector != TermVector::NO)
    raise ArgumentError, "cannot store term vector information for a " +
      "field that is not indexed"
  end

  # The name of the field (e.g., "date", "subject", "title", or "body")
  @name = name

  # the one and only data object for all different kind of field values
  @data = value
  self.stored = stored
  self.index = index
  self.store_term_vector = store_term_vector
  @binary = binary
  @boost = boost
end
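The two guard clauses at the top of the constructor can be tried in isolation. What follows is a minimal sketch, not Ferret code: the symbols `:yes`, `:no`, and `:untokenized` are stand-ins for the `Store`, `Index`, and `TermVector` constants so that it runs without Ferret installed, and `check_field_options` is a hypothetical helper name.

```ruby
# Stand-in for the constructor's argument checks. Symbols replace the
# Store::*, Index::* and TermVector::* constants (an assumption made so
# the sketch runs without Ferret).
def check_field_options(stored, index, term_vector)
  if index == :no && stored == :no
    raise ArgumentError, "it doesn't make sense to have a field that " \
                         "is neither indexed nor stored"
  end
  if index == :no && term_vector != :no
    raise ArgumentError, "cannot store term vector information for a " \
                         "field that is not indexed"
  end
  true
end

# The constructor defaults (stored, untokenized, no term vector) pass.
check_field_options(:yes, :untokenized, :no)

# A field that is neither stored nor indexed is rejected.
begin
  check_field_options(:no, :no, :no)
rescue ArgumentError => e
  puts e.message
end
```

A stored but unindexed field that still asks for a term vector fails the second check in the same way.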
Instance Attribute Details
#boost ⇒ Object
This value will be multiplied into the score of all hits on this field of this document.
The boost is multiplied by Document#boost of the document containing this field. If a document has multiple fields with the same name, all such values are multiplied together. This product is then multiplied by the value Similarity#length_norm(String,int), and rounded by Similarity#encode_norm(float) before it is stored in the index. One should attempt to ensure that this product does not overflow the range of that encoding.
See Document#set_boost(float)
See Similarity#length_norm(String, int)
See Similarity#encode_norm(float)
Note: this value is not stored directly with the document in the index. Documents returned from IndexReader#document(int) and Hits#doc(int) may thus not have the same value present as when this field was indexed.
# File 'lib/ferret/document/field.rb', line 30
def boost
  @boost
end
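The product described above can be worked through with plain arithmetic. This is only an illustration under stated assumptions: every number is hypothetical, `length_norm` is an assumed return value of Similarity#length_norm, and the final rounding by Similarity#encode_norm is left out.

```ruby
# Hypothetical numbers, chosen so the arithmetic is exact in floating point.
document_boost = 2.0          # stand-in for Document#boost
field_boosts   = [1.5, 2.0]   # boosts of two fields sharing the same name
length_norm    = 0.5          # assumed result of Similarity#length_norm

# Multiply the document boost by every same-named field boost, then by
# the length norm. The rounding step (Similarity#encode_norm) is omitted.
product = field_boosts.reduce(document_boost) { |acc, b| acc * b }
norm = product * length_norm

puts norm   # 2.0 * 1.5 * 2.0 * 0.5
```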
#data ⇒ Object
The value of this field: the one and only data object for all the different kinds of field values. It may be a String, an object that responds to read (a Reader), or binary data.
# File 'lib/ferret/document/field.rb', line 30
def data
  @data
end
#name ⇒ Object (readonly)
Returns the value of attribute name.
# File 'lib/ferret/document/field.rb', line 32
def name
  @name
end
Class Method Details
.new_binary_field(name, value, stored) ⇒ Object
Creates a stored field with a binary value. The value may optionally be compressed, but it is never indexed, tokenized, or term-vectored.
- name
-
The name of the field
- value
-
The binary value
- stored
-
How the value should be stored (compressed or not). Raises ArgumentError if this is Store::NO.
# File 'lib/ferret/document/field.rb', line 289
def Field.new_binary_field(name, value, stored)
  if (stored == Store::NO)
    raise ArgumentError, "binary values can't be unstored"
  end
  Field.new(name, value, stored, Index::NO, TermVector::NO, true)
end
Instance Method Details
#binary? ⇒ Boolean
True if the field is to be stored as a binary value. This can be used to store images or other binary data in the index if you wish.
# File 'lib/ferret/document/field.rb', line 50
def binary?() return @binary end
#binary_value ⇒ Object
Returns the field data as is; intended for fields stored as binary.
# File 'lib/ferret/document/field.rb', line 266
def binary_value
  return @data
end
#compressed? ⇒ Boolean
True if you want to compress the data that you store. This is a good idea for really large text fields. The Ruby Zlib library is used to do the compression.
# File 'lib/ferret/document/field.rb', line 55
def compressed?() return @compressed end
#index=(index) ⇒ Object
# File 'lib/ferret/document/field.rb', line 205
def index=(index)
  @omit_norms = false
  case index
  when Index::NO
    @indexed = false
    @tokenized = false
  when Index::TOKENIZED
    @indexed = true
    @tokenized = true
  when Index::UNTOKENIZED
    @indexed = true
    @tokenized = false
  when Index::NO_NORMS
    @indexed = true
    @tokenized = false
    @omit_norms = true
  else
    raise "unknown stored parameter " + index.to_s
  end
end
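The setter above maps one index mode onto three boolean flags. The sketch below restates that mapping as a lookup, assuming symbols in place of the `Index` constants; `index_flags` is a hypothetical name and is not part of Ferret.

```ruby
# Returns the flags that each index mode sets.
# Symbols stand in for the Index::* constants.
def index_flags(mode)
  case mode
  when :no
    { indexed: false, tokenized: false, omit_norms: false }
  when :tokenized
    { indexed: true, tokenized: true, omit_norms: false }
  when :untokenized
    { indexed: true, tokenized: false, omit_norms: false }
  when :no_norms
    { indexed: true, tokenized: false, omit_norms: true }
  else
    raise "unknown index parameter #{mode}"
  end
end

flags = index_flags(:no_norms)
puts "indexed=#{flags[:indexed]} tokenized=#{flags[:tokenized]} omit_norms=#{flags[:omit_norms]}"
```

Only the no-norms mode sets `omit_norms`, and it indexes the value without tokenizing it.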
#indexed? ⇒ Boolean
True iff the value of the field is to be indexed, so that it may be searched on.
# File 'lib/ferret/document/field.rb', line 41
def indexed?() return @indexed end
#omit_norms? ⇒ Boolean
True if the norms are not stored for this field. No norms means that index-time boosting and field length normalization will be disabled. The benefit is less memory usage as norms take up one byte per indexed field for every document in the index.
# File 'lib/ferret/document/field.rb', line 78
def omit_norms?() return @omit_norms end
#reader_value ⇒ Object
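Returns the data that is stored in this field as a Reader. Data that already responds to read is returned as is; anything else is converted to a string and wrapped in a StringReader.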
# File 'lib/ferret/document/field.rb', line 271
def reader_value
  if @data.respond_to? :read
    return @data
  elsif @data.instance_of? String
    return Ferret::Utils::StringHelper::StringReader.new(@data)
  else
    # if it is binary object try to return a string representation
    return Ferret::Utils::StringHelper::StringReader.new(@data.to_s)
  end
end
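The dispatch in this method can be sketched without Ferret. In the version below, Ruby's standard `StringIO` is an assumed stand-in for `Ferret::Utils::StringHelper::StringReader`, and `reader_value_of` is a hypothetical name.

```ruby
require "stringio"

# Returns something that responds to read, whatever the input is.
# StringIO stands in for Ferret's StringReader here.
def reader_value_of(data)
  if data.respond_to?(:read)
    data                       # already a reader: hand it back untouched
  elsif data.instance_of?(String)
    StringIO.new(data)         # wrap a plain string
  else
    StringIO.new(data.to_s)    # fall back to the string representation
  end
end

puts reader_value_of("free text").read
puts reader_value_of(42).read
```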
#store_offsets? ⇒ Boolean
True if the offsets of this field are stored. The offsets are the positions of the start and end characters of the token in the whole field string.
# File 'lib/ferret/document/field.rb', line 72
def store_offsets?() return @store_offset end
#store_positions? ⇒ Boolean
True if the positions of the indexed terms in this field are stored.
# File 'lib/ferret/document/field.rb', line 67
def store_positions?() return @store_position end
#store_term_vector=(store_term_vector) ⇒ Object
# File 'lib/ferret/document/field.rb', line 226
def store_term_vector=(store_term_vector)
  case store_term_vector
  when TermVector::NO
    @store_term_vector = false
    @store_position = false
    @store_offset = false
  when TermVector::YES
    @store_term_vector = true
    @store_position = false
    @store_offset = false
  when TermVector::WITH_POSITIONS
    @store_term_vector = true
    @store_position = true
    @store_offset = false
  when TermVector::WITH_OFFSETS
    @store_term_vector = true
    @store_position = false
    @store_offset = true
  when TermVector::WITH_POSITIONS_OFFSETS
    @store_term_vector = true
    @store_position = true
    @store_offset = true
  else
    raise "unknown term_vector parameter " + store_term_vector.to_s
  end
end
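Each term vector mode is a combination of two independent choices, positions and offsets, on top of storing the vector at all. The sketch below makes that explicit, assuming symbols in place of the `TermVector` constants; `term_vector_flags` is a hypothetical name.

```ruby
# Returns the flags that each term vector mode sets.
# Symbols stand in for the TermVector::* constants.
def term_vector_flags(mode)
  case mode
  when :no
    { store_term_vector: false, store_position: false, store_offset: false }
  when :yes
    { store_term_vector: true, store_position: false, store_offset: false }
  when :with_positions
    { store_term_vector: true, store_position: true, store_offset: false }
  when :with_offsets
    { store_term_vector: true, store_position: false, store_offset: true }
  when :with_positions_offsets
    { store_term_vector: true, store_position: true, store_offset: true }
  else
    raise "unknown term_vector parameter #{mode}"
  end
end

flags = term_vector_flags(:with_offsets)
puts "positions=#{flags[:store_position]} offsets=#{flags[:store_offset]}"
```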
#store_term_vector? ⇒ Boolean
True iff the term or terms used to index this field are stored as a term vector, available from IndexReader#term_freq_vector(). These methods do not provide access to the original content of the field, only to terms used to index it. If the original content must be preserved, use the stored attribute instead.
See IndexReader#term_freq_vector()
# File 'lib/ferret/document/field.rb', line 64
def store_term_vector?() return @store_term_vector end
#stored=(stored) ⇒ Object
# File 'lib/ferret/document/field.rb', line 189
def stored=(stored)
  case stored
  when Store::YES
    @stored = true
    @compressed = false
  when Store::COMPRESS
    @stored = true
    @compressed = true
  when Store::NO
    @stored = false
    @compressed = false
  else
    raise "unknown stored parameter " + stored.to_s
  end
end
#stored? ⇒ Boolean
True iff the value of the field is to be stored in the index for return with search hits. It is an error for this to be true if a field is Reader-valued.
# File 'lib/ferret/document/field.rb', line 37
def stored?() return @stored end
#string_value ⇒ Object
Returns the string value of the data that is stored in this field.
# File 'lib/ferret/document/field.rb', line 254
def string_value
  if @data.instance_of? String
    return @data
  elsif @data.respond_to? :read
    return @data.read()
  else
    # if it is binary object try to return a string representation
    return @data.to_s
  end
end
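One consequence of the branch that calls `read` is worth seeing in action: a Reader-valued field is consumed the first time its string value is requested. The sketch below reproduces the same dispatch outside Ferret, using `StringIO` as an assumed example of a Reader; `string_value_of` is a hypothetical name.

```ruby
require "stringio"

# Same three-way dispatch as the method above, on a bare value.
def string_value_of(data)
  if data.instance_of?(String)
    data
  elsif data.respond_to?(:read)
    data.read
  else
    data.to_s
  end
end

reader = StringIO.new("from a reader")
first  = string_value_of(reader)   # reads the whole stream
second = string_value_of(reader)   # the stream is already at its end

puts first
puts second.empty?
```

If the text is needed more than once, keeping the first result is safer than asking the reader again.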
#to_s ⇒ Object
Prints a Field for human consumption.
# File 'lib/ferret/document/field.rb', line 297
def to_s()
  str = ""
  if (@stored)
    str << "stored"
    str << (@compressed ? "/compressed," : "/uncompressed,")
  end
  str << "indexed," if (@indexed)
  str << "tokenized," if (@tokenized)
  str << "store_term_vector," if (@store_term_vector)
  str << "tv_offset," if (@store_offset)
  str << "tv_position," if (@store_position)
  str << "omit_norms," if (@omit_norms)
  str << "binary," if (@binary)
  str << "<#{@name}:#{data}>"
end
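To show the shape of the resulting string, the sketch below rebuilds the same concatenation from a set of flags passed as keyword arguments. It is a stand-in, not the library method: `describe_field` and its keyword names are hypothetical, and the flags shown are the ones a field created with the constructor defaults would carry (stored, indexed, untokenized).

```ruby
# Builds the same comma-separated flag prefix followed by <name:data>.
def describe_field(name, data, **flags)
  str = String.new
  if flags[:stored]
    str << "stored"
    str << (flags[:compressed] ? "/compressed," : "/uncompressed,")
  end
  str << "indexed," if flags[:indexed]
  str << "tokenized," if flags[:tokenized]
  str << "store_term_vector," if flags[:store_term_vector]
  str << "tv_offset," if flags[:store_offset]
  str << "tv_position," if flags[:store_position]
  str << "omit_norms," if flags[:omit_norms]
  str << "binary," if flags[:binary]
  str << "<#{name}:#{data}>"
end

puts describe_field("title", "Ruby", stored: true, indexed: true)
# prints: stored/uncompressed,indexed,<title:Ruby>
```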
#tokenized? ⇒ Boolean
True iff the value of the field should be tokenized as text prior to indexing. Un-tokenized fields are indexed as a single word and may not be Reader-valued.
# File 'lib/ferret/document/field.rb', line 46
def tokenized?() return @tokenized end