Class: Ferret::Search::IndexSearcher
- Inherits: Object (hierarchy: Object, Ferret::Search::IndexSearcher)
- Includes: Index
- Defined in: lib/ferret/search/index_searcher.rb
Overview
Implements search over a single IndexReader.
Applications usually need only call #search(query) or, to restrict the results, #search(query, :filter => filter). For performance reasons it is recommended to open only one IndexSearcher and use it for all of your searches.
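One way to follow the single-searcher recommendation is to memoize the instance. The sketch below is only an illustration: StubSearcher stands in for an expensive-to-open IndexSearcher, and both SharedSearcher and the "/path/to/index" placeholder are hypothetical names, not part of Ferret. It is not thread-safe as written.

```ruby
# Hypothetical stand-in for a searcher that is expensive to open.
class StubSearcher
  class << self
    attr_accessor :opened   # counts how many searchers were created
  end
  self.opened = 0

  def initialize(path)
    @path = path
    self.class.opened += 1
  end
end

# Memoizes a single searcher so that every caller shares it.
module SharedSearcher
  def self.instance(path = "/path/to/index")
    @instance ||= StubSearcher.new(path)
  end
end

a = SharedSearcher.instance
b = SharedSearcher.instance   # returns the same object, nothing is reopened
```

With a real index, the block passed to `||=` would construct the IndexSearcher once, and the application would close it on shutdown.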
Instance Attribute Summary
-
#reader ⇒ Object
Returns the value of attribute reader.
-
#similarity ⇒ Object
Returns the value of attribute similarity.
Instance Method Summary
-
#close ⇒ Object
Closes the underlying IndexReader.
-
#create_weight(query) ⇒ Object
Creates a weight for query and returns the new weight.
-
#doc(i) ⇒ Object
Expert: Returns the stored fields of document i.
-
#doc_freq(term) ⇒ Object
Expert: Returns the number of documents containing term.
-
#doc_freqs(terms) ⇒ Object
Expert: For each term in the terms array, calculates the number of documents containing term.
-
#explain(query, doc) ⇒ Object
Returns an Explanation that describes how doc scored against query.
-
#initialize(arg) ⇒ IndexSearcher
constructor
Creates a searcher searching the index in the provided directory.
-
#max_doc ⇒ Object
Expert: Returns one greater than the largest possible document number.
-
#rewrite(original) ⇒ Object
Rewrites the query into a query that can be processed by the search methods.
-
#search(query, options = {}) ⇒ Object
The main search method for the index.
-
#search_each(query, filter = nil) ⇒ Object
Accepts a block and iterates through all of the results, yielding the doc number and the score for each hit.
Constructor Details
#initialize(arg) ⇒ IndexSearcher
Creates a searcher searching the index in the provided directory.
You need to pass one argument which should be one of the following:
* An index reader which the searcher will search
* A directory where the searcher will open an index reader to search
* A string which represents a path to the directory to be searched
# File 'lib/ferret/search/index_searcher.rb', line 21
def initialize(arg)
  if arg.is_a?(IndexReader)
    @reader = arg
  elsif arg.is_a?(Ferret::Store::Directory)
    @reader = IndexReader.open(arg, false)
  elsif arg.is_a?(String)
    @dir = Ferret::Store::FSDirectory.new(arg, false)
    @reader = IndexReader.open(@dir, true)
  else
    raise ArgumentError, "Unknown argument passed to initialize IndexReader"
  end
  @similarity = Similarity.default
end
Instance Attribute Details
#reader ⇒ Object
Returns the value of attribute reader.
# File 'lib/ferret/search/index_searcher.rb', line 11
def reader
  @reader
end
#similarity ⇒ Object
Returns the value of attribute similarity.
# File 'lib/ferret/search/index_searcher.rb', line 11
def similarity
  @similarity
end
Instance Method Details
#close ⇒ Object
Closes the underlying IndexReader. This applies whether the IndexReader was supplied implicitly by specifying a directory or passed to the constructor explicitly, since the method calls @reader.close unconditionally.
# File 'lib/ferret/search/index_searcher.rb', line 39
def close()
  @reader.close()
end
#create_weight(query) ⇒ Object
Creates a weight for query.
- returns: the new weight
# File 'lib/ferret/search/index_searcher.rb', line 75
def create_weight(query)
  return query.weight(self)
end
#doc(i) ⇒ Object
Expert: Returns the stored fields of document i.
See IndexReader#get_document.
# File 'lib/ferret/search/index_searcher.rb', line 62
def doc(i)
  return @reader.get_document(i)
end
#doc_freq(term) ⇒ Object
Expert: Returns the number of documents containing term. Called by search code to compute term weights. See IndexReader#doc_freq.
# File 'lib/ferret/search/index_searcher.rb', line 46
def doc_freq(term)
  return @reader.doc_freq(term)
end
#doc_freqs(terms) ⇒ Object
Expert: For each term in the terms array, calculates the number of documents containing term. Returns an array with these document frequencies. Used to minimize the number of remote calls.
# File 'lib/ferret/search/index_searcher.rb', line 53
def doc_freqs(terms)
  result = Array.new(terms.length)
  terms.each_with_index {|term, i| result[i] = doc_freq(term)}
  return result
end
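A minimal sketch of the same per-term lookup, using a hypothetical StubReader backed by an in-memory hash in place of a real IndexReader. The class, its postings data, and the plain string terms are illustrative assumptions, not part of Ferret.

```ruby
# Hypothetical stand-in for an IndexReader: maps each term to the
# document numbers that contain it.
class StubReader
  def initialize(postings)
    @postings = postings
  end

  # Number of documents containing +term+ (0 for an unknown term).
  def doc_freq(term)
    @postings.fetch(term, []).length
  end
end

# One frequency per term, in the same order as +terms+.
def doc_freqs(reader, terms)
  terms.map { |term| reader.doc_freq(term) }
end

reader = StubReader.new("ruby" => [0, 2, 5], "ferret" => [2])
freqs = doc_freqs(reader, ["ruby", "ferret", "lucene"])
```

The result array lines up index by index with the input terms, which is what lets a caller batch many lookups into one call.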
#explain(query, doc) ⇒ Object
Returns an Explanation that describes how doc scored against query.
This is intended to be used in developing Similarity implementations, and, for good performance, should not be displayed with every hit. Computing an explanation is as expensive as executing the query over the entire index.
# File 'lib/ferret/search/index_searcher.rb', line 176
def explain(query, doc)
  return query.weight(self).explain(@reader, doc)
end
#max_doc ⇒ Object
Expert: Returns one greater than the largest possible document number. Called by search code to compute term weights. See IndexReader#max_doc
# File 'lib/ferret/search/index_searcher.rb', line 69
def max_doc()
  return @reader.max_doc()
end
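To illustrate how search code might combine doc_freq and max_doc into a term weight, here is a sketch of the classic Lucene-style inverse document frequency, log(num_docs / (doc_freq + 1)) + 1. Whether Ferret's default Similarity uses exactly this formula is an assumption, and the idf helper below is hypothetical.

```ruby
# Lucene-style inverse document frequency (assumed formula, not taken
# from this page): rarer terms get a larger weight.
def idf(doc_freq, num_docs)
  return 0.0 if num_docs == 0
  Math.log(num_docs.to_f / (doc_freq + 1)) + 1.0
end

max_doc = 1000                # one greater than the largest doc number
rare    = idf(9, max_doc)     # term found in 9 documents
common  = idf(499, max_doc)   # term found in 499 documents
```

Because max_doc counts every document number ever assigned, it serves as an upper bound on the collection size in a calculation like this.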
#rewrite(original) ⇒ Object
Rewrites the query into a query that can be processed by the search methods. For example, a Fuzzy query is turned into a massive boolean query.
- original: The original query to be rewritten.
# File 'lib/ferret/search/index_searcher.rb', line 159
def rewrite(original)
  query = original
  rewritten_query = query.rewrite(@reader)
  while query != rewritten_query
    query = rewritten_query
    rewritten_query = query.rewrite(@reader)
  end
  return query
end
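The method repeats rewrite until a query rewrites to itself. A self-contained sketch of that fixed-point iteration follows, using hypothetical Struct-based queries (TermStub, OrStub, PrefixStub) that are not Ferret classes; the "reader" is just an array of terms.

```ruby
# Hypothetical query types; Struct gives value-based == so the loop can
# detect when a rewrite no longer changes anything.
TermStub = Struct.new(:text) do
  def rewrite(_reader)
    self   # primitive query: already searchable
  end
end

OrStub = Struct.new(:clauses) do
  def rewrite(reader)
    OrStub.new(clauses.map { |c| c.rewrite(reader) })
  end
end

PrefixStub = Struct.new(:prefix) do
  # Expands into an OR over every known term starting with the prefix.
  def rewrite(reader)
    matches = reader.select { |t| t.start_with?(prefix) }
    OrStub.new(matches.map { |t| TermStub.new(t) })
  end
end

# Same shape as IndexSearcher#rewrite: iterate to a fixed point.
def rewrite_fully(query, reader)
  rewritten = query.rewrite(reader)
  while query != rewritten
    query = rewritten
    rewritten = query.rewrite(reader)
  end
  query
end

terms  = ["ferret", "fern", "lucene"]
result = rewrite_fully(PrefixStub.new("fe"), terms)
```

The loop terminates only if the query types define equality by value, which is why the sketch leans on Struct; a rewrite that always returned a fresh, unequal object would never stop.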
#search(query, options = {}) ⇒ Object
The main search method for the index. You need to create a query to pass to this method. You can also pass an options hash with one or more of the following keys: filter, first_doc, num_docs, sort.
- query: The query to run on the index.
- filter: Filters docs from the search result.
- first_doc: The index in the results of the first doc retrieved. Default is 0.
- num_docs: The number of results returned. Default is 10. Must be greater than 0.
- sort: An array of SortFields, or an object whose fields method returns one, describing how to sort the results.
# File 'lib/ferret/search/index_searcher.rb', line 89
def search(query, options = {})
  filter = options[:filter]
  first_doc = options[:first_doc]||0
  num_docs = options[:num_docs]||10
  sort = options[:sort]

  if (num_docs <= 0)  # nil might be returned from hq.top() below.
    raise ArgumentError, "num_docs must be > 0 to run a search"
  end

  scorer = query.weight(self).scorer(@reader)
  if (scorer == nil)
    return TopDocs.new(0, [])
  end

  bits = (filter.nil? ? nil : filter.bits(@reader))
  if (sort)
    fields = sort.is_a?(Array) ? sort : sort.fields
    hq = FieldSortedHitQueue.new(@reader, fields, num_docs + first_doc)
  else
    hq = HitQueue.new(num_docs + first_doc)
  end
  total_hits = 0
  min_score = 0.0
  scorer.each_hit() do |doc, score|
    if score > 0.0 and (bits.nil? or bits.get(doc)) # skip docs not in bits
      total_hits += 1
      if hq.size < num_docs or score >= min_score
        hq.insert(ScoreDoc.new(doc, score))
        min_score = hq.top.score # maintain min_score
      end
    end
  end

  score_docs = Array.new(hq.size)
  if (hq.size > first_doc)
    score_docs = Array.new(hq.size - first_doc)
    first_doc.times { hq.pop }
    (hq.size - 1).downto(0) do |i|
      score_docs[i] = hq.pop
    end
  else
    score_docs = []
    hq.clear
  end

  return TopDocs.new(total_hits, score_docs)
end
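How first_doc and num_docs are intended to interact can be sketched without an index: rank the matching hits, skip the first first_doc of them, and return the next num_docs. The sketch below uses a plain sort instead of Ferret's HitQueue, and the Hit struct, the page_hits helper, and the tie-break on doc number are all hypothetical.

```ruby
Hit = Struct.new(:doc, :score)

# Returns [total_hits, page], mirroring the pair of values held by TopDocs.
def page_hits(hits, first_doc: 0, num_docs: 10)
  raise ArgumentError, "num_docs must be > 0 to run a search" if num_docs <= 0
  matching = hits.select { |h| h.score > 0.0 }          # zero scores are skipped
  ranked = matching.sort_by { |h| [-h.score, h.doc] }   # best score first
  [matching.length, ranked.drop(first_doc).first(num_docs)]
end

hits = [Hit.new(0, 0.2), Hit.new(1, 0.9), Hit.new(2, 0.0),
        Hit.new(3, 0.5), Hit.new(4, 0.7)]
total, page = page_hits(hits, first_doc: 1, num_docs: 2)
```

Note that the total counts every matching document, not just those on the returned page, so a caller can compute how many pages exist.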
#search_each(query, filter = nil) ⇒ Object
Accepts a block and iterates through all of the results, yielding the doc number and the score for each hit. The hits are unsorted. This is the fastest way to get all of the hits from a search. However, you will usually want your hits sorted at least by score, so you should use the #search method.
# File 'lib/ferret/search/index_searcher.rb', line 143
def search_each(query, filter = nil)
  scorer = query.weight(self).scorer(@reader)
  return if scorer == nil
  bits = (filter.nil? ? nil : filter.bits(@reader))
  scorer.each_hit() do |doc, score|
    if score > 0.0 and (bits.nil? or bits.get(doc)) # skip docs not in bits
      yield(doc, score)
    end
  end
end
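The yielding and filtering behaviour can be sketched with plain arrays. In this illustration, each_hit is a hypothetical stand-in for search_each: the scored array plays the role of the scorer and the allowed array plays the role of the filter's bits (nil means no filter).

```ruby
# Hypothetical stand-in for search_each: yields doc number and score for
# every hit with a positive score that passes the optional filter.
def each_hit(scored, allowed = nil)
  scored.each do |doc, score|
    next unless score > 0.0 && (allowed.nil? || allowed.include?(doc))
    yield doc, score
  end
end

scored = [[0, 0.4], [1, 0.0], [2, 1.3], [3, 0.8]]

seen = []
each_hit(scored, [0, 2]) { |doc, score| seen << [doc, score] }

# Hits arrive in scorer order, so sort them yourself if order matters.
by_score = seen.sort_by { |_doc, score| -score }
```

Collecting into an array and sorting afterwards is what #search does for you, which is why it is the usual choice when ranked output is needed.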