Class: Ferret::Search::FieldCache
- Inherits: Object
  - Object
    - Ferret::Search::FieldCache
- Includes: Index
- Defined in: lib/ferret/search/field_cache.rb
Overview
Expert: The default cache implementation, storing all values in memory. Storage is a WeakKeyHash keyed on the index reader, so cached values can be reclaimed once the reader itself is garbage-collected.
Defined Under Namespace
Classes: Entry, StringIndex
Constant Summary
- INT_PARSER = lambda {|i| i.to_i}
- FLOAT_PARSER = lambda {|i| i.to_f}
- @@cache = Ferret::Utils::WeakKeyHash.new
  The internal cache. Maps each reader to a hash from Entry (field plus sort type) to the interpreted term values cached for it.
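The two parser constants simply call String#to_i and String#to_f on a term's text. The snippet below is a standalone copy of those lambdas (it runs without Ferret) that shows what they return, including the fact that non-numeric text does not raise an error but silently parses to zero.

```ruby
# Standalone copies of the parser constants listed above.
INT_PARSER   = lambda { |i| i.to_i }
FLOAT_PARSER = lambda { |i| i.to_f }

# Each parser turns the text of a single term into a sortable value.
int_values   = %w[7 42 -3].map { |text| INT_PARSER.call(text) }
float_values = %w[1.5 2.25 -0.5].map { |text| FLOAT_PARSER.call(text) }

# String#to_i and String#to_f never raise: non-numeric text becomes 0 or 0.0.
garbage_int   = INT_PARSER.call("abc")
garbage_float = FLOAT_PARSER.call("n/a")
```

Because bad input collapses to zero instead of failing, a field that mixes numbers and text sorts the text entries as if they were 0.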
Class Method Summary
- .get_auto_index(reader, field) ⇒ Object
  Checks the internal cache for an appropriate entry, and if none is found reads field to see if it contains integers, floats or strings, and then calls one of the other methods in this class to get the values.
- .get_index(reader, field, sort_type) ⇒ Object
  Checks the internal cache for an appropriate entry, and if none is found, reads the terms in field, parses them with the provided parser and returns an array of size reader.max_doc holding the value each document has in the given field.
- .get_string_index(reader, field) ⇒ Object
  Checks the internal cache for an appropriate entry, and if none is found reads the term values in field and returns an array of them in natural order, along with an array telling which element in the term array each document uses.
- .lookup(reader, field, sort_type) ⇒ Object
  See if an object is in the cache.
- .store(reader, field, sort_type, value) ⇒ Object
  Put an object into the cache.
Class Method Details
.get_auto_index(reader, field) ⇒ Object
Checks the internal cache for an appropriate entry, and if none is found reads the first term in field to decide whether the field contains integers, floats or strings, then calls one of the other methods in this class to get the values. For string values, a StringIndex is returned. After calling this method, the cache holds an entry for both type AUTO and the type actually found.
- reader: Used to get field values.
- field: Which field contains the values.
- return: Integer Array, Float Array or StringIndex.
```ruby
# File 'lib/ferret/search/field_cache.rb', line 182
def FieldCache.get_auto_index(reader, field)
  index = lookup(reader, field, SortField::SortType::AUTO)
  if (index == nil)
    term_enum = reader.terms_from(Term.new(field, ""))
    begin
      term = term_enum.term
      if (term == nil)
        raise "no terms in field #{field} to sort by"
      end
      if (term.field == field)
        # Guess the type from the text of the first term in the field.
        termtext = term.text.strip
        if (termtext == termtext.to_i.to_s)
          index = get_index(reader, field, SortField::SortType::INTEGER)
        elsif (termtext == termtext.to_f.to_s or termtext == "%f"%termtext.to_f)
          index = get_index(reader, field, SortField::SortType::FLOAT)
        else
          index = get_string_index(reader, field)
        end
        if (index != nil)
          store(reader, field, SortField::SortType::AUTO, index)
        end
      else
        raise "field \"#{field}\" does not appear to be indexed"
      end
    ensure
      term_enum.close()
    end
  end
  return index
end
```
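The type detection in get_auto_index is a round-trip test on the first term's text: the text counts as an integer if converting it to an integer and back reproduces it exactly, otherwise as a float if either Float#to_s or the "%f" format reproduces it, and otherwise as a string. The sketch below isolates that check in a hypothetical helper, guess_sort_type, which returns symbols rather than Ferret's SortField::SortType constants.

```ruby
# Hypothetical helper mirroring the checks in FieldCache.get_auto_index.
# It returns :integer, :float or :string instead of SortField::SortType values.
def guess_sort_type(term_text)
  text = term_text.strip
  if text == text.to_i.to_s
    :integer
  elsif text == text.to_f.to_s || text == format("%f", text.to_f)
    :float # e.g. "3.5" via Float#to_s, or "1.500000" via "%f"
  else
    :string
  end
end

samples = %w[42 3.5 1.500000 007 abc].map { |text| [text, guess_sort_type(text)] }.to_h
```

Zero-padded text such as "007" does not survive the round trip, so a field of padded numbers falls through to string sorting. And because only the first term is inspected, later terms in the field are never checked against the chosen type.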
.get_index(reader, field, sort_type) ⇒ Object
Checks the internal cache for an appropriate entry, and if none is found, reads the terms in field, parses them with the parser held by sort_type and returns an array of size reader.max_doc holding the value each document has in the given field.
- reader: Used to get field values.
- field: Which field contains the values.
- sort_type: The type of sort to run on the field. It also holds the parser applied to each term's text.
- return: The values in the given field for each document.
```ruby
# File 'lib/ferret/search/field_cache.rb', line 72
def FieldCache.get_index(reader, field, sort_type)
  index = lookup(reader, field, sort_type)
  if (index == nil)
    parser = sort_type.parser
    index = Array.new(reader.max_doc)
    if (index.length > 0)
      term_docs = reader.term_docs
      term_enum = reader.terms_from(Term.new(field, ""))
      begin
        if term_enum.term.nil?
          raise "no terms in field '#{field}' to sort by"
        end
        begin
          term = term_enum.term
          break if (term.field != field)
          termval = parser.call(term.text)
          term_docs.seek(term)
          while term_docs.next?
            index[term_docs.doc] = termval
          end
        end while term_enum.next?
      ensure
        term_docs.close()
        term_enum.close()
      end
    end
    store(reader, field, sort_type, index)
  end
  return index
end
```
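The core of get_index fills one array slot per document with the parsed value of the term that document contains. The sketch below reproduces that loop over a plain array of [term_text, doc_ids] pairs, a hypothetical stand-in for Ferret's terms_from enumeration and term_docs, so it runs without an index; build_value_index is likewise a made-up name.

```ruby
# Sketch of the main loop in FieldCache.get_index, without the cache or the reader.
# postings: [term_text, [doc ids containing that term]] pairs in sorted term order
# (a hypothetical stand-in for terms_from + term_docs).
def build_value_index(postings, max_doc, parser)
  index = Array.new(max_doc) # documents with no term in the field stay nil
  postings.each do |text, docs|
    value = parser.call(text) # each term's text is parsed once
    docs.each { |doc| index[doc] = value }
  end
  index
end

int_parser = lambda { |i| i.to_i } # same body as FieldCache::INT_PARSER
postings = [["10", [0, 3]], ["25", [1]], ["7", [4]]] # lexicographic term order
values = build_value_index(postings, 5, int_parser)

# Rank the documents that have a value by that value, breaking ties by doc id.
ranked = (0...5).reject { |doc| values[doc].nil? }.sort_by { |doc| [values[doc], doc] }
```

Terms arrive in lexicographic order ("10", "25", "7"), but because each slot holds the parsed integer, sorting on the array gives numeric order. Documents without a term in the field remain nil and need separate handling.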
.get_string_index(reader, field) ⇒ Object
Checks the internal cache for an appropriate entry, and if none is found reads the term values in field and returns an array of them in natural order, along with an array telling which element in the term array each document uses.
- reader: Used to get field values.
- field: Which field contains the strings.
- return: A StringIndex holding the array of terms and, for each document, its index into that array.
```ruby
# File 'lib/ferret/search/field_cache.rb', line 111
def FieldCache.get_string_index(reader, field)
  index = lookup(reader, field, SortField::SortType::STRING)
  if (index == nil)
    str_index = Array.new(reader.max_doc)
    str_map = Array.new(reader.max_doc+1)
    if (str_index.length > 0)
      term_docs = reader.term_docs
      term_enum = reader.terms_from(Term.new(field,""))
      t = 0 # current term number

      # an entry for documents that have no terms in this field. Should a
      # document with no terms be at the top or the bottom?
      #
      # this puts them at the top - if it is changed, FieldDocSortedHitQueue
      # needs to change as well.
      str_map[t] = nil
      t += 1

      begin
        if (term_enum.term() == nil)
          raise "no terms in field #{field} to sort by"
        end
        begin
          term = term_enum.term
          break if (term.field != field)

          # store term text
          # we expect that there is at most one term per document
          if (t >= str_map.length)
            raise "there are more terms than documents in field \"#{field}\", but it's impossible to sort on tokenized fields"
          end
          str_map[t] = term.text

          term_docs.seek(term)
          while term_docs.next?
            str_index[term_docs.doc] = t
          end

          t += 1
        end while term_enum.next?
      ensure
        term_docs.close()
        term_enum.close()
      end

      if (t == 0)
        # if there are no terms, make the term array
        # have a single nil entry
        # str_map = [nil] <= already set above
      elsif (t < str_map.length)
        # if there are fewer terms than documents, trim off the dead
        # array space. Slice rather than compact!: compact! would also drop
        # the nil placeholder at str_map[0] and shift every term number
        # stored in str_index by one.
        str_map = str_map[0, t]
      end
    end
    index = StringIndex.new(str_index, str_map)
    store(reader, field, SortField::SortType::STRING, index)
  end
  return index
end
```
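get_string_index does not store parsed values. It records, for each document, the number of its term in the sorted term list, so comparing two documents is an integer comparison, and it reserves slot 0 of the term table for documents that have no term. The sketch below rebuilds that layout with a hypothetical SimpleStringIndex struct standing in for Ferret's StringIndex and [term_text, doc_ids] pairs standing in for the reader. Unlike the code above, it initializes every document's ordinal to 0 rather than nil, an assumption made so that documents without a term point at the placeholder and sort first, as the comment in the code describes.

```ruby
# Hypothetical stand-in for Ferret's StringIndex: per-document ordinals plus the term table.
SimpleStringIndex = Struct.new(:order, :lookup)

# postings: [term_text, [doc ids]] pairs in sorted term order.
def build_string_index(postings, max_doc)
  order  = Array.new(max_doc, 0) # ordinal 0 means "no term in this field"
  lookup = [nil]                 # slot 0 is the placeholder for those documents
  postings.each do |text, docs|
    # At most one term per document is expected; more terms than documents
    # is usually a sign of a tokenized field.
    raise ArgumentError, "more terms than documents" if lookup.length > max_doc
    docs.each { |doc| order[doc] = lookup.length } # ordinal of the term appended next
    lookup << text
  end
  SimpleStringIndex.new(order, lookup) # lookup only grew as needed, so no trimming step
end

postings = [["apple", [2]], ["cherry", [0, 3]], ["kiwi", [1]]]
idx = build_string_index(postings, 5) # document 4 has no term

# Sorting compares integer ordinals only; the strings are looked up afterwards.
sorted_docs  = (0...5).sort_by { |doc| [idx.order[doc], doc] }
sorted_terms = sorted_docs.map { |doc| idx.lookup[idx.order[doc]] }
```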
.lookup(reader, field, sort_type) ⇒ Object
Returns the value cached for reader under field and sort_type, or nil if there is none.
```ruby
# File 'lib/ferret/search/field_cache.rb', line 41
def FieldCache.lookup(reader, field, sort_type)
  entry = Entry.new(field, sort_type)
  @@cache.synchronize() do
    reader_cache = @@cache[reader]
    return nil if reader_cache.nil?
    return reader_cache[entry]
  end
end
```
.store(reader, field, sort_type, value) ⇒ Object
Stores value in the cache for reader under field and sort_type, creating the per-reader hash on first use, and returns value.
```ruby
# File 'lib/ferret/search/field_cache.rb', line 51
def FieldCache.store(reader, field, sort_type, value)
  entry = Entry.new(field, sort_type)
  @@cache.synchronize() do
    reader_cache = @@cache[reader]
    if (reader_cache == nil)
      reader_cache = {}
      @@cache[reader] = reader_cache
    end
    return reader_cache[entry] = value
  end
end
```
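lookup and store implement a two-level map: the outer level is keyed by reader, the inner hash by Entry (field plus sort type). The sketch below mirrors that layout with hypothetical names (TinyFieldCache, CacheEntry) and a Mutex in place of the WeakKeyHash's synchronize. One deliberate difference: the outer level is an ordinary identity-keyed Hash, so this sketch keeps readers alive, whereas a weak-keyed hash lets cached values go once a reader is collected. ObjectSpace::WeakMap is not a drop-in replacement here because it also holds its values weakly, so the per-reader hash could vanish; Ruby 3.3 added ObjectSpace::WeakKeyMap, which holds only its keys weakly.

```ruby
# Hypothetical key type: the real FieldCache::Entry is not shown on this page.
# Struct supplies value-based ==, eql? and hash, so equal keys find the same slot.
CacheEntry = Struct.new(:field, :sort_type)

class TinyFieldCache
  def initialize
    # reader => { CacheEntry => cached values }. Unlike a WeakKeyHash this holds
    # readers strongly; compare_by_identity matches readers by object identity.
    @cache = {}.compare_by_identity
    @lock  = Mutex.new
  end

  # Returns the cached value, or nil when nothing is stored for this reader/field/type.
  def lookup(reader, field, sort_type)
    @lock.synchronize do
      reader_cache = @cache[reader]
      reader_cache && reader_cache[CacheEntry.new(field, sort_type)]
    end
  end

  # Stores value, creating the per-reader hash on first use, and returns value.
  def store(reader, field, sort_type, value)
    @lock.synchronize do
      reader_cache = (@cache[reader] ||= {})
      reader_cache[CacheEntry.new(field, sort_type)] = value
    end
  end
end

reader = Object.new # stands in for an IndexReader
cache  = TinyFieldCache.new
before = cache.lookup(reader, "price", :integer) # nothing cached yet
values = [30, 10, 20]
cache.store(reader, "price", :integer, values)
cache.store(reader, "price", :auto, values) # get_auto_index also caches under AUTO
```

Storing the same array under both keys is what lets a later AUTO lookup skip the type detection entirely.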