Class: Ferret::Document::Field
- Inherits: Object
- Defined in: lib/ferret/document/field.rb
Overview
A field is a section of a Document. Each field has two parts: a name and a value. Values may be free text, provided as a String or as a Reader, or they may be atomic keywords, which are not further processed. Such keywords may be used to represent dates, URLs, etc. Fields are optionally stored in the index, so that they may be returned with hits on the document.
Defined Under Namespace
Classes: Index, Store, TermVector
Instance Attribute Summary
-
#boost ⇒ Object
This value will be multiplied into the score of all hits on this field of this document.
-
#data ⇒ Object
The one and only data object for all the different kinds of field values.
-
#name ⇒ Object
readonly
Returns the value of attribute name.
Class Method Summary
-
.new_binary_field(name, value, stored) ⇒ Object
Create a stored field with binary value.
Instance Method Summary
-
#binary? ⇒ Boolean
True if the field is to be stored as a binary value.
-
#binary_value ⇒ Object
If the data is stored as binary, just return it.
-
#compressed? ⇒ Boolean
True if you want to compress the data that you store.
- #index=(index) ⇒ Object
-
#indexed? ⇒ Boolean
True iff the value of the field is to be indexed, so that it may be searched on.
-
#initialize(name, value, stored = Store::YES, index = Index::UNTOKENIZED, store_term_vector = TermVector::NO, binary = false, boost = 1.0) ⇒ Field
constructor
Create a field by specifying its name, value and how it will be saved in the index.
-
#reader_value ⇒ Object
Returns the data that is stored in this field as a Reader.
-
#store_offsets? ⇒ Boolean
True if the offsets of this field are stored.
-
#store_positions? ⇒ Boolean
True if the positions of the indexed terms in this field are stored.
- #store_term_vector=(store_term_vector) ⇒ Object
-
#store_term_vector? ⇒ Boolean
True iff the term or terms used to index this field are stored as a term vector, available from IndexReader#term_freq_vector().
- #stored=(stored) ⇒ Object
-
#stored? ⇒ Boolean
True iff the value of the field is to be stored in the index for return with search hits.
-
#string_value ⇒ Object
Returns the string value of the data that is stored in this field.
-
#to_s ⇒ Object
Prints a Field for human consumption.
-
#tokenized? ⇒ Boolean
True iff the value of the field should be tokenized as text prior to indexing.
Constructor Details
#initialize(name, value, stored = Store::YES, index = Index::UNTOKENIZED, store_term_vector = TermVector::NO, binary = false, boost = 1.0) ⇒ Field
Create a field by specifying its name, value and how it will be saved in the index.
- name
-
The name of the field
- value
-
The string to process
- stored
-
Whether value should be stored in the index
- index
-
Whether the field should be indexed, and if so, if it should be tokenized before indexing
- store_term_vector
-
Whether the term vector should be stored. An ArgumentError is raised if:
* the field is neither stored nor indexed
* the field is not indexed but store_term_vector is anything other than _TermVector::NO_
- binary
-
Whether you want to store binary data in this field. Default is false.
- boost
-
The boost for this field. Default is 1.0. A larger number makes this field more important.
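The two rejected argument combinations can be sketched without the library. This is a minimal stand-in, not Ferret's code: the method name check_field_options and the symbols :yes, :no and :untokenized are hypothetical placeholders for the Store, Index and TermVector constants.

```ruby
# Stand-in for the constructor's argument checks. The symbols are
# hypothetical placeholders for the Store, Index and TermVector constants.
def check_field_options(stored:, index:, store_term_vector: :no)
  if index == :no && stored == :no
    raise ArgumentError, "field is neither indexed nor stored"
  end
  if index == :no && store_term_vector != :no
    raise ArgumentError, "cannot store term vectors for an unindexed field"
  end
  true
end

check_field_options(stored: :yes, index: :untokenized) # accepted
check_field_options(stored: :yes, index: :no)          # accepted: stored only

begin
  check_field_options(stored: :no, index: :no)
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

A field that is stored but not indexed passes both checks, which is the combination a binary field relies on.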
# File 'lib/ferret/document/field.rb', line 148
def initialize(name, value, stored = Store::YES, index = Index::UNTOKENIZED, store_term_vector = TermVector::NO, binary = false, boost = 1.0)
  if (index == Index::NO and stored == Store::NO)
    raise ArgumentError, "it doesn't make sense to have a field that " +
                         "is neither indexed nor stored"
  end
  if (index == Index::NO && store_term_vector != TermVector::NO)
    raise ArgumentError, "cannot store term vector information for a " +
                         "field that is not indexed"
  end

  # The name of the field (e.g., "date", "subject", "title", or "body")
  @name = name

  # the one and only data object for all different kind of field values
  @data = value
  self.stored = stored
  self.index = index
  self.store_term_vector = store_term_vector
  @binary = binary
  @boost = boost
end
Instance Attribute Details
#boost ⇒ Object
This value will be multiplied into the score of all hits on this field of this document.
The boost is multiplied by Document#boost of the document containing this field. If a document has multiple fields with the same name, all such values are multiplied together. This product is then multiplied by the value Similarity#length_norm(String,int), and rounded by Similarity#encode_norm(float) before it is stored in the index. One should attempt to ensure that this product does not overflow the range of that encoding.
See Document#set_boost(float), Similarity#length_norm(String, int), and Similarity#encode_norm(float).
Note: this value is not stored directly with the document in the index. Documents returned from IndexReader#document(int) and Hits#doc(int) may thus not have the same value present as when this field was indexed.
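As a rough worked example of the product described above, the sketch below multiplies a document boost by the boosts of two same-named fields and by a length norm. The helper names are hypothetical, and the length norm formula (1 / sqrt(number of terms)) is an assumption borrowed from Lucene's default similarity; the real value comes from Similarity#length_norm, and the lossy Similarity#encode_norm step is not modeled.

```ruby
# Hypothetical length norm: 1 / sqrt(number of terms). The library's actual
# value comes from Similarity#length_norm.
def length_norm(num_terms)
  1.0 / Math.sqrt(num_terms)
end

# Product of the document boost, every same-named field's boost and the
# length norm, before any encoding or rounding is applied.
def raw_norm(doc_boost, field_boosts, num_terms)
  field_boosts.reduce(doc_boost) { |acc, boost| acc * boost } * length_norm(num_terms)
end

# 2.0 (document) * 1.5 * 2.0 (fields) * 0.25 (16 terms)
puts raw_norm(2.0, [1.5, 2.0], 16) # prints 1.5
```

Because every same-named field contributes a factor, a few large boosts can push the product well past what the encoded norm can represent.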
# File 'lib/ferret/document/field.rb', line 30
def boost
  @boost
end
#data ⇒ Object
The one and only data object for all the different kinds of field values.
# File 'lib/ferret/document/field.rb', line 30
def data
  @data
end
#name ⇒ Object (readonly)
Returns the value of attribute name.
# File 'lib/ferret/document/field.rb', line 32
def name
  @name
end
Class Method Details
.new_binary_field(name, value, stored) ⇒ Object
Create a stored field with a binary value. Optionally the value may be compressed. A binary field is never indexed, so it is not tokenized and has no term vector.
- name
-
The name of the field
- value
-
The binary value
- stored
-
How the value should be stored (compressed or not).
# File 'lib/ferret/document/field.rb', line 268
def Field.new_binary_field(name, value, stored)
  if (stored == Store::NO)
    raise ArgumentError, "binary values can't be unstored"
  end
  Field.new(name, value, stored, Index::NO, TermVector::NO, true)
end
Instance Method Details
#binary? ⇒ Boolean
True if the field is to be stored as a binary value. This can be used to store images or other binary data in the index.
# File 'lib/ferret/document/field.rb', line 50
def binary?() return @binary end
#binary_value ⇒ Object
If the data is stored as binary, just return it.
# File 'lib/ferret/document/field.rb', line 245
def binary_value
  return @data
end
#compressed? ⇒ Boolean
True if you want to compress the data that you store. This is a good idea for really large text fields. The Ruby Zlib library is used to do the compression.
# File 'lib/ferret/document/field.rb', line 55
def compressed?() return @compressed end
#index=(index) ⇒ Object
# File 'lib/ferret/document/field.rb', line 191
def index=(index)
  if (index == Index::NO)
    @indexed = false
    @tokenized = false
  elsif (index == Index::TOKENIZED)
    @indexed = true
    @tokenized = true
  elsif (index == Index::UNTOKENIZED)
    @indexed = true
    @tokenized = false
  else
    raise "unknown index parameter " + index.to_s
  end
end
#indexed? ⇒ Boolean
True iff the value of the field is to be indexed, so that it may be searched on.
# File 'lib/ferret/document/field.rb', line 41
def indexed?() return @indexed end
#reader_value ⇒ Object
Returns the data that is stored in this field as a Reader. Data that already responds to read is returned as-is; a String, or the to_s of any other object, is wrapped in a StringReader.
# File 'lib/ferret/document/field.rb', line 250
def reader_value
  if @data.respond_to? :read
    return @data
  elsif @data.instance_of? String
    return Ferret::Utils::StringHelper::StringReader.new(@data)
  else
    # if it is binary object try to return a string representation
    return Ferret::Utils::StringHelper::StringReader.new(@data.to_s)
  end
end
#store_offsets? ⇒ Boolean
True if the offsets of this field are stored. The offsets are the positions of the start and end characters of the token in the whole field string.
# File 'lib/ferret/document/field.rb', line 72
def store_offsets?() return @store_offset end
#store_positions? ⇒ Boolean
True if the positions of the indexed terms in this field are stored.
# File 'lib/ferret/document/field.rb', line 67
def store_positions?() return @store_position end
#store_term_vector=(store_term_vector) ⇒ Object
# File 'lib/ferret/document/field.rb', line 206
def store_term_vector=(store_term_vector)
  if (store_term_vector == TermVector::NO)
    @store_term_vector = false
    @store_position = false
    @store_offset = false
  elsif (store_term_vector == TermVector::YES)
    @store_term_vector = true
    @store_position = false
    @store_offset = false
  elsif (store_term_vector == TermVector::WITH_POSITIONS)
    @store_term_vector = true
    @store_position = true
    @store_offset = false
  elsif (store_term_vector == TermVector::WITH_OFFSETS)
    @store_term_vector = true
    @store_position = false
    @store_offset = true
  elsif (store_term_vector == TermVector::WITH_POSITIONS_OFFSETS)
    @store_term_vector = true
    @store_position = true
    @store_offset = true
  else
    raise "unknown term_vector parameter " + store_term_vector.to_s
  end
end
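The five term vector modes reduce to three boolean flags, which can be summarized as a lookup table. The sketch below is an illustration only: the method name term_vector_flags and the symbols are hypothetical stand-ins for the TermVector constants.

```ruby
# Maps a hypothetical mode symbol to the three flags set by the writer:
# [store_term_vector, store_position, store_offset].
def term_vector_flags(mode)
  table = {
    no:                     [false, false, false],
    yes:                    [true,  false, false],
    with_positions:         [true,  true,  false],
    with_offsets:           [true,  false, true],
    with_positions_offsets: [true,  true,  true]
  }
  table.fetch(mode) { raise "unknown term_vector parameter #{mode}" }
end

vector, positions, offsets = term_vector_flags(:with_offsets)
puts "vector=#{vector} positions=#{positions} offsets=#{offsets}"
# prints vector=true positions=false offsets=true
```

Positions and offsets are only ever set together with the term vector itself, so store_positions? or store_offsets? being true implies store_term_vector? is true.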
#store_term_vector? ⇒ Boolean
True iff the term or terms used to index this field are stored as a term vector, available from IndexReader#term_freq_vector(). These methods do not provide access to the original content of the field, only to terms used to index it. If the original content must be preserved, use the stored attribute instead.
See IndexReader#term_freq_vector()
# File 'lib/ferret/document/field.rb', line 64
def store_term_vector?() return @store_term_vector end
#stored=(stored) ⇒ Object
# File 'lib/ferret/document/field.rb', line 176
def stored=(stored)
  if (stored == Store::YES)
    @stored = true
    @compressed = false
  elsif (stored == Store::COMPRESS)
    @stored = true
    @compressed = true
  elsif (stored == Store::NO)
    @stored = false
    @compressed = false
  else
    raise "unknown stored parameter " + stored.to_s
  end
end
#stored? ⇒ Boolean
True iff the value of the field is to be stored in the index for return with search hits. It is an error for this to be true if a field is Reader-valued.
# File 'lib/ferret/document/field.rb', line 37
def stored?() return @stored end
#string_value ⇒ Object
Returns the string value of the data that is stored in this field. A String is returned as-is, a Reader is read in full, and any other object is converted with to_s.
# File 'lib/ferret/document/field.rb', line 233
def string_value
  if @data.instance_of? String
    return @data
  elsif @data.respond_to? :read
    return @data.read()
  else
    # if it is binary object try to return a string representation
    return @data.to_s
  end
end
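One consequence of the String/Reader dispatch is worth seeing in isolation: reading from a Reader consumes it. The sketch below is a stand-in, not the library's code; to_string_value is a hypothetical name, and StringIO from Ruby's standard library plays the role of a Reader.

```ruby
require "stringio"

# Stand-in for the dispatch: String first, then anything that responds to
# read, then a to_s fallback for every other object.
def to_string_value(data)
  if data.instance_of?(String)
    data
  elsif data.respond_to?(:read)
    data.read
  else
    data.to_s
  end
end

reader = StringIO.new("free text")
first  = to_string_value(reader) # drains the reader
second = to_string_value(reader) # nothing left to read

puts first.inspect               # prints "free text"
puts second.inspect              # prints ""
puts to_string_value(42).inspect # prints "42"
```

With a StringIO, the second call returns an empty string, so a caller that needs the text more than once should keep the first result rather than ask again.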
#to_s ⇒ Object
Prints a Field for human consumption.
# File 'lib/ferret/document/field.rb', line 276
def to_s()
  str = ""
  if (@stored)
    str << "stored"
    # parenthesized so that the ternary's result is what gets appended
    str << (@compressed ? "/compressed," : "/uncompressed,")
  end
  if (@indexed) then str << "indexed," end
  if (@tokenized) then str << "tokenized," end
  if (@store_term_vector) then str << "store_term_vector," end
  if (@store_offset)
    str << "term_vector_offsets,"
  end
  if (@store_position)
    str << "term_vector_position,"
  end
  if (@binary) then str << "binary," end

  str << '<'
  str << @name
  str << ':'

  if (@data != nil)
    str << @data.to_s
  end

  str << '>'
end
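When a string is built with << and a conditional, operator precedence matters: << binds more tightly than ?:, so an unparenthesized ternary is parsed as (str << condition) ? a : b. The sketch below uses two hypothetical helper methods to contrast the two forms; it is an illustration, not the library's code.

```ruby
# Appends the ternary's result, which is the intended behaviour.
def flags_with_parens(compressed)
  str = +"stored"
  str << (compressed ? "/compressed," : "/uncompressed,")
  str
end

# Parsed as (str << compressed) ? ... : ..., so Ruby tries to append the
# boolean itself and raises TypeError.
def flags_without_parens(compressed)
  str = +"stored"
  str << compressed ? "/compressed," : "/uncompressed,"
  str
rescue TypeError => e
  "TypeError: #{e.message}"
end

puts flags_with_parens(true)   # prints stored/compressed,
puts flags_with_parens(false)  # prints stored/uncompressed,
puts flags_without_parens(true)
```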
#tokenized? ⇒ Boolean
True iff the value of the field should be tokenized as text prior to indexing. Un-tokenized fields are indexed as a single word and may not be Reader-valued.
# File 'lib/ferret/document/field.rb', line 46
def tokenized?() return @tokenized end