Class: Ferret::Search::IndexSearcher

Inherits:
Object
Includes:
Index
Defined in:
lib/ferret/search/index_searcher.rb

Overview

Implements search over a single IndexReader.

Applications usually need only call the #search(query) method, optionally passing a filter through the :filter option. For performance reasons it is recommended to open only one IndexSearcher and use it for all of your searches.

Constructor Details

#initialize(arg) ⇒ IndexSearcher

Creates a searcher searching the index in the provided directory.

You need to pass one argument which should be one of the following:

* An index reader which the searcher will search
* A directory where the searcher will open an index reader to search
* A string which represents a path to the directory to be searched


# File 'lib/ferret/search/index_searcher.rb', line 21

def initialize(arg)
  if arg.is_a?(IndexReader)
    @reader = arg
  elsif arg.is_a?(Ferret::Store::Directory)
    @reader = IndexReader.open(arg, false)
  elsif arg.is_a?(String)
    @dir = Ferret::Store::FSDirectory.new(arg, false)
    @reader = IndexReader.open(@dir, true)
  else
    raise ArgumentError, "Unknown argument passed to initialize IndexReader"
  end

  @similarity = Similarity.default
end
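
The three accepted argument kinds can be illustrated without an index on disk. The following is a minimal sketch, not Ferret code: StubReader, StubDirectory, searcher_source, and the path "/tmp/index" are hypothetical stand-ins that only mirror the branch order of the constructor above.

```ruby
# Hypothetical stand-ins, not Ferret classes: they only mimic the three
# argument kinds that #initialize distinguishes.
class StubReader; end
class StubDirectory; end

# Returns a symbol naming the branch #initialize would take for +arg+.
def searcher_source(arg)
  case arg
  when StubReader    then :existing_reader       # reader is used as-is
  when StubDirectory then :reader_from_directory # reader opened on the directory
  when String        then :reader_from_path      # directory opened at the path first
  else
    raise ArgumentError, "expected a reader, a directory, or a path string"
  end
end

source_for_reader = searcher_source(StubReader.new)
source_for_path   = searcher_source("/tmp/index")
```

Any other argument type raises ArgumentError, as in the constructor itself.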

Instance Attribute Details

#reader ⇒ Object

Returns the value of attribute reader.



# File 'lib/ferret/search/index_searcher.rb', line 11

def reader
  @reader
end

#similarity ⇒ Object

Returns the value of attribute similarity.



# File 'lib/ferret/search/index_searcher.rb', line 11

def similarity
  @similarity
end

Instance Method Details

#close ⇒ Object

Closes the searcher's underlying IndexReader. In the implementation shown below the reader is closed unconditionally, whether it was passed to the constructor (IndexSearcher.new(reader)) or opened implicitly from a directory or path.



# File 'lib/ferret/search/index_searcher.rb', line 39

def close()
  @reader.close()
end

#create_weight(query) ⇒ Object

Creates a weight for query

returns

new weight



# File 'lib/ferret/search/index_searcher.rb', line 75

def create_weight(query)
  return query.weight(self)
end

#doc(i) ⇒ Object

Expert: Returns the stored fields of document i.

See IndexReader#get_document



# File 'lib/ferret/search/index_searcher.rb', line 62

def doc(i)
  return @reader.get_document(i)
end

#doc_freq(term) ⇒ Object

Expert: Returns the number of documents containing term. Called by search code to compute term weights. See IndexReader#doc_freq



# File 'lib/ferret/search/index_searcher.rb', line 46

def doc_freq(term)
  return @reader.doc_freq(term)
end

#doc_freqs(terms) ⇒ Object

Expert: For each term in the terms array, calculates the number of documents containing term. Returns an array with these document frequencies. Used to minimize number of remote calls.



# File 'lib/ferret/search/index_searcher.rb', line 53

def doc_freqs(terms)
  result = Array.new(terms.length)
  terms.each_with_index {|term, i| result[i] = doc_freq(term)}
  return result
end
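
The contract of doc_freqs (one frequency per term, in the same order as the input) can be sketched with a stand-in searcher. StubSearcher and its Hash-backed doc_freq are assumptions made for illustration; a real searcher asks its IndexReader instead.

```ruby
# Hypothetical stand-in for a searcher: doc_freq looks a term up in a
# plain Hash instead of asking an IndexReader.
class StubSearcher
  def initialize(freqs)
    @freqs = freqs
  end

  # Number of documents containing +term+ (0 when the term is unknown).
  def doc_freq(term)
    @freqs.fetch(term, 0)
  end

  # Same contract as the method above: one frequency per term, in order.
  def doc_freqs(terms)
    terms.map { |term| doc_freq(term) }
  end
end

searcher = StubSearcher.new("ruby" => 3, "ferret" => 1)
freqs = searcher.doc_freqs(["ferret", "python", "ruby"])
# freqs is [1, 0, 3]
```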

#explain(query, doc) ⇒ Object

Returns an Explanation that describes how doc scored against query.

This is intended to be used in developing Similarity implementations, and, for good performance, should not be displayed with every hit. Computing an explanation is as expensive as executing the query over the entire index.



# File 'lib/ferret/search/index_searcher.rb', line 176

def explain(query, doc)
  return query.weight(self).explain(@reader, doc)
end

#max_doc ⇒ Object

Expert: Returns one greater than the largest possible document number. Called by search code to compute term weights. See IndexReader#max_doc



# File 'lib/ferret/search/index_searcher.rb', line 69

def max_doc()
  return @reader.max_doc()
end

#rewrite(original) ⇒ Object

Rewrites the query into a form that the search methods can process, calling the query's own rewrite repeatedly until the result stops changing. For example, a fuzzy query is turned into a large boolean query.

original

The original query to be rewritten.



# File 'lib/ferret/search/index_searcher.rb', line 159

def rewrite(original)
  query = original
  rewritten_query = query.rewrite(@reader)
  while query != rewritten_query
    query = rewritten_query
    rewritten_query = query.rewrite(@reader)
  end
  return query
end
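
The loop above is a fixed-point iteration: it stops once rewriting a query returns an equal query. A minimal sketch of that behaviour follows, using a hypothetical ToyQuery whose rewrite peels off one level per call; neither ToyQuery nor rewrite_fully is part of Ferret.

```ruby
# Hypothetical toy query: each rewrite removes one level of nesting
# until nothing is left to expand. Not a Ferret class.
ToyQuery = Struct.new(:depth) do
  # Returns a simpler query, or self once fully rewritten.
  def rewrite(_reader)
    depth > 0 ? ToyQuery.new(depth - 1) : self
  end
end

# Same loop shape as #rewrite above: repeat until rewriting is a no-op.
def rewrite_fully(original, reader = nil)
  query = original
  rewritten = query.rewrite(reader)
  while query != rewritten
    query = rewritten
    rewritten = query.rewrite(reader)
  end
  query
end

final = rewrite_fully(ToyQuery.new(3))
# final.depth is 0
```

The loop relies on the query class defining equality sensibly; Struct compares members, which is why the toy version terminates.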

#search(query, options = {}) ⇒ Object

The main search method for the index. You need to create a query to pass to this method. You can also pass an options hash with one or more of the following keys: filter, first_doc, num_docs, sort.

query

The query to run on the index

filter

filters docs from the search result

first_doc

The index in the results of the first doc retrieved. Default is 0

num_docs

The number of results returned. Default is 10

sort

An array of SortFields describing how to sort the results.



# File 'lib/ferret/search/index_searcher.rb', line 89

def search(query, options = {})
  filter = options[:filter]
  first_doc = options[:first_doc]||0
  num_docs = options[:num_docs]||10
  sort = options[:sort]

  if (num_docs <= 0)  # nil might be returned from hq.top() below.
    raise ArgumentError, "num_docs must be > 0 to run a search"
  end

  scorer = query.weight(self).scorer(@reader)
  if (scorer == nil)
    return TopDocs.new(0, [])
  end

  bits = (filter.nil? ? nil : filter.bits(@reader))
  if (sort)
    fields = sort.is_a?(Array) ? sort : sort.fields
    hq = FieldSortedHitQueue.new(@reader, fields, num_docs + first_doc)
  else
    hq = HitQueue.new(num_docs + first_doc)
  end
  total_hits = 0
  min_score = 0.0
  scorer.each_hit() do |doc, score|
    if score > 0.0 and (bits.nil? or bits.get(doc)) # skip docs not in bits
      total_hits += 1
      if hq.size < num_docs or score >= min_score 
        hq.insert(ScoreDoc.new(doc, score))
        min_score = hq.top.score # maintain min_score
      end
    end
  end

  score_docs = Array.new(hq.size)
  if (hq.size > first_doc)
    score_docs = Array.new(hq.size - first_doc)
    first_doc.times { hq.pop }
    (hq.size - 1).downto(0) do |i|
      score_docs[i] = hq.pop
    end
  else
    score_docs = []
    hq.clear
  end

  return TopDocs.new(total_hits, score_docs)
end
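
The effect of the first_doc and num_docs options can be sketched on an in-memory list of hits. This is an illustration only: Hit and page_of_hits are hypothetical names, and the sketch ranks with a full sort for brevity, whereas the method above keeps a bounded priority queue.

```ruby
# Hypothetical helper, not part of Ferret: pages through scored hits the
# way the first_doc and num_docs options are described above.
Hit = Struct.new(:doc, :score)

def page_of_hits(hits, options = {})
  first_doc = options[:first_doc] || 0
  num_docs  = options[:num_docs] || 10
  raise ArgumentError, "num_docs must be > 0" if num_docs <= 0

  # Drop non-positive scores, rank best-first, then slice out the page.
  ranked = hits.select { |h| h.score > 0.0 }.sort_by { |h| -h.score }
  ranked[first_doc, num_docs] || []
end

hits = [Hit.new(0, 0.2), Hit.new(1, 0.9), Hit.new(2, 0.0), Hit.new(3, 0.5)]
page = page_of_hits(hits, :first_doc => 1, :num_docs => 2)
# page.map(&:doc) is [3, 0]
```

Skipping the best hit (doc 1) and taking the next two leaves docs 3 and 0; doc 2 never appears because its score is zero.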

#search_each(query, filter = nil) ⇒ Object

Accepts a block and iterates through all of the results, yielding the doc number and the score for each hit. The hits are unsorted. This is the fastest way to get all of the hits from a search. However, you will usually want your hits sorted at least by score, in which case you should use the #search method.



# File 'lib/ferret/search/index_searcher.rb', line 143

def search_each(query, filter = nil)
  scorer = query.weight(self).scorer(@reader)
  return if scorer == nil
  bits = (filter.nil? ? nil : filter.bits(@reader))
  scorer.each_hit() do |doc, score|
    if score > 0.0 and (bits.nil? or bits.get(doc)) # skip docs not in bits
      yield(doc, score)
    end
  end
end
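
The block-yielding pattern and the filter check can be sketched without an index. In the following, each_matching_hit is a hypothetical helper, the hits are plain [doc, score] pairs, and a Set of allowed doc numbers stands in for the bits returned by a filter.

```ruby
# Hypothetical stand-in for the scorer/filter pair: hits are plain
# [doc, score] pairs and the filter is a Set of allowed doc numbers.
require 'set'

def each_matching_hit(hits, allowed_docs = nil)
  hits.each do |doc, score|
    # Skip zero scores and docs the filter excludes, as search_each does.
    next unless score > 0.0
    next unless allowed_docs.nil? || allowed_docs.include?(doc)
    yield doc, score
  end
end

seen = []
each_matching_hit([[0, 0.4], [1, 0.0], [2, 0.7]], Set[0, 1]) do |doc, score|
  seen << [doc, score]
end
# seen is [[0, 0.4]]
```

Hits arrive in the order the scorer produces them, so a caller that needs ranking has to collect and sort them or use #search instead.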