Class: Ferret::Index::Index

Inherits:
Object
Includes:
Search, Store, MonitorMixin
Defined in:
lib/ferret/index.rb

Overview

This is a simplified interface to the index. See the TUTORIAL for more information on how to use this class.


Constructor Details

#initialize(options = {}, &block) ⇒ Index

If you create an Index without any options, it will simply create an index in memory. The class is highly configurable, however: every option you can supply to IndexWriter and QueryParser can also be set here. See the options for the constructors of those classes.

Options

See:

  • QueryParser

  • IndexWriter

default_input_field

Default: “id”. This specifies the default field that will be used when you add a simple string to the index using #add_document or <<.

id_field

Default: “id”. This is the field searched when you look a document up by term. For example, if you do a lookup by the term “cat”, i.e. index["cat"], this is the field that will be searched.

key

Default: nil. Expert: This should only be used if you really know what you are doing. You can set a field or an array of fields to be the key for the index. If you add a document with the same key as an existing document, the existing document will be replaced by the new one. Using a multiple-field key will slow down indexing, so avoid it if performance is a concern; a single-field key (or id) should be fine. Also, you must make sure that your key fields are either untokenized or not broken up by the analyzer.

auto_flush

Default: false. Set this option to true if you want the index automatically flushed every time you do a write (includes delete) to the index. This is useful if you have multiple processes accessing the index and you don’t want lock errors. Setting :auto_flush to true has a huge performance impact so don’t use it if you are concerned about performance. In that case you should think about setting up a DRb indexing service.

lock_retry_time

Default: 2 seconds. This parameter specifies how long to wait before retrying to obtain the commit lock when detecting if the IndexReader is at the latest version.

close_dir

Default: false. If you explicitly pass a Directory object to this class and you want Index to close it when it is closed itself then set this to true.

use_typed_range_query

Default: true. Use TypedRangeQuery instead of the standard RangeQuery when parsing range queries. This is useful if you have number fields which you want to perform range queries on: you won’t need to pad or normalize the data in the field in any way to get correct results. However, performance will be a lot slower for large indexes, so consider setting this to false if you have a large index and don’t need typed range queries.
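Why typed range queries matter can be seen without an index. The plain-Ruby sketch below makes no Ferret calls: it compares price strings the way a standard RangeQuery compares terms (lexicographically) and then shows the two fixes, zero-padding the stored values or parsing them as numbers. The numeric comparison is only roughly what a typed range query does, assuming the field values parse as numbers.

```ruby
prices = %w[5 20 100 9]

# Lexicographic comparison, as a plain string range "10".."50" would see it:
# "5" and "100" fall inside the range because strings compare character by
# character ("5" < "50" and "100" < "50").
string_hits = prices.select { |p| p >= "10" && p <= "50" }

# Fix 1: zero-pad stored values so string order matches numeric order.
padded_hits = prices.select { |p| p.rjust(5, "0").between?("00010", "00050") }

# Fix 2: compare numerically instead of as strings.
typed_hits = prices.select { |p| Float(p).between?(10, 50) }

puts string_hits.inspect # ["5", "20", "100"]
puts typed_hits.inspect  # ["20"]
```

Padding works with any range query but forces you to choose a fixed width up front; the numeric comparison avoids that at the cost of parsing each term.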

Examples

index = Index::Index.new(:analyzer => WhiteSpaceAnalyzer.new())

index = Index::Index.new(:path => '/path/to/index',
                         :create_if_missing => false,
                         :auto_flush => true)

index = Index::Index.new(:dir => directory,
                         :default_slop => 2,
                         :handle_parse_errors => false)

You can also pass a block if you like. The index will be yielded and closed at the end of the block. For example:

Ferret::I.new() do |index|
  # do stuff with index. Most of your actions will be cached.
end
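The block form follows Ruby’s usual yield-then-close idiom. Below is a minimal sketch using a hypothetical FakeIndex class (not part of Ferret). Unlike the constructor source shown below, it closes the index in an ensure clause, so the index is also closed when the block raises; that is a design choice of this sketch, not documented Ferret behaviour.

```ruby
# Hypothetical stand-in (not Ferret) showing the yield-then-close idiom.
class FakeIndex
  attr_reader :open

  def initialize(options = {})
    @options = options
    @open = true
    return unless block_given?

    begin
      yield self
    ensure
      close # runs even if the block raises
    end
  end

  def close
    raise StandardError, "tried to close an already closed index" unless @open
    @open = false
  end
end

FakeIndex.new do |index|
  # do stuff with index; it is closed when the block returns
end
```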


# File 'lib/ferret/index.rb', line 91

def initialize(options = {}, &block)
  super()

  if options[:key]
    @key = options[:key]
    if @key.is_a?(Array)
      # assign the result: map returns a new array and does not change @key
      @key = @key.flatten.map {|k| k.to_s.intern}
    end
  else
    @key = nil
  end

  if (fi = options[:field_infos]).is_a?(String)
    options[:field_infos] = FieldInfos.load(fi)
  end

  @close_dir = options[:close_dir]
  if options[:dir].is_a?(String)
    options[:path] = options[:dir]
  end
  if options[:path]
    @close_dir = true
    begin
      @dir = FSDirectory.new(options[:path], options[:create])
    rescue IOError => io
      @dir = FSDirectory.new(options[:path],
                             options[:create_if_missing] != false)
    end
  elsif options[:dir]
    @dir = options[:dir]
  else
    options[:create] = true # this should always be true for a new RAMDir
    @close_dir = true
    @dir = RAMDirectory.new
  end

  @dir.extend(MonitorMixin) unless @dir.kind_of? MonitorMixin
  options[:dir] = @dir
  options[:lock_retry_time]||= 2
  @options = options
  if (!@dir.exists?("segments")) || options[:create]
    IndexWriter.new(options).close
  end
  options[:analyzer]||= Ferret::Analysis::StandardAnalyzer.new
  if options[:use_typed_range_query].nil?
    options[:use_typed_range_query] = true
  end

  @searcher = nil
  @writer = nil
  @reader = nil

  @options.delete(:create) # only create the first time if at all
  @auto_flush = @options[:auto_flush] || false
  if (@options[:id_field].nil? and @key.is_a?(Symbol))
    @id_field = @key
  else
    @id_field = @options[:id_field] || :id
  end
  @default_field = (@options[:default_field]||= :*)
  @default_input_field = options[:default_input_field] || @id_field

  if @default_input_field.respond_to?(:intern)
    @default_input_field = @default_input_field.intern
  end
  @open = true
  @qp = nil
  if block
    yield self
    self.close
  end
end

Instance Attribute Details

#options ⇒ Object (readonly)

Returns the value of attribute options.



# File 'lib/ferret/index.rb', line 12

def options
  @options
end

Instance Method Details

#add_document(doc, analyzer = nil) ⇒ Object Also known as: <<

Adds a document to this index, using the provided analyzer instead of the local analyzer if provided. If the document contains more than IndexWriter::MAX_FIELD_LENGTH terms for a given field, the remainder are discarded.

There are three ways to add a document to the index. The simplest is to add a string or an array of strings. This stores all the strings in the default input field, which is the id field (:id) unless you set :default_input_field when you create the index.

index << "This is a new document to be indexed"
index << ["And here", "is another", "new document", "to be indexed"]

But these are pretty simple documents. If this is all you want to index, you could probably just use SimpleSearch. So let’s give our documents some fields:

index << {:title => "Programming Ruby", :content => "blah blah blah"}
index << {:title => "Programming Ruby", :content => "yada yada yada"}

Or, if you are indexing data stored in a database, you’ll probably want to store the id:

index << {:id => row.id, :title => row.title, :date => row.date}

See FieldInfos for more information on how to set field properties.
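The first step #add_document takes, wrapping a bare String or Array of strings in a Hash under the default input field, can be sketched on its own. normalize_doc is a hypothetical helper, not part of Ferret, that mirrors the String/Array branch in the source below; :id stands in for the default input field.

```ruby
# Hypothetical helper mirroring the String/Array branch of add_document.
def normalize_doc(doc, default_input_field = :id)
  if doc.is_a?(String) || doc.is_a?(Array)
    { default_input_field => doc } # store the text under one field
  else
    doc # Hashes (and Documents) are passed through unchanged
  end
end

# A bare string ends up under :id; an array can be sent to any field.
normalize_doc("This is a new document to be indexed")
normalize_doc(["And here", "is another"], :content)
```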



# File 'lib/ferret/index.rb', line 263

def add_document(doc, analyzer = nil)
  @dir.synchronize do
    ensure_writer_open()
    if doc.is_a?(String) or doc.is_a?(Array)
      doc = {@default_input_field => doc}
    end

    # delete existing documents with the same key
    if @key
      if @key.is_a?(Array)
        query = @key.inject(BooleanQuery.new()) do |bq, field|
          bq.add_query(TermQuery.new(field, doc[field].to_s), :must)
          bq
        end
        query_delete(query)
      else
        id = doc[@key]
        # test the raw value: doc[@key].to_s would never be nil
        if id
          @writer.delete(@key, id.to_s)
        end
      end
    end
    ensure_writer_open()

    if analyzer
      old_analyzer = @writer.analyzer
      @writer.analyzer = analyzer
      @writer.add_document(doc)
      @writer.analyzer = old_analyzer
    else
      @writer.add_document(doc)
    end

    flush() if @auto_flush
  end
end
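The :key replacement that #add_document performs before writing can be modelled in plain Ruby. KeyedStore below is a hypothetical in-memory stand-in, not a Ferret class: a new document replaces every stored document whose key fields all match, which corresponds to the TermQuery delete for a single-field key and the BooleanQuery of :must clauses for a multi-field key.

```ruby
# Hypothetical in-memory model of the :key option; not a Ferret class.
class KeyedStore
  attr_reader :docs

  def initialize(key)
    # A multi-field key becomes an Array of Symbols, a single key a Symbol.
    @key = key.is_a?(Array) ? key.flatten.map { |k| k.to_s.intern } : key.to_s.intern
    @docs = []
  end

  def <<(doc)
    fields = Array(@key)
    # Drop every stored document whose key fields all equal the new
    # document's (compared as strings, as add_document does).
    @docs.reject! { |old| fields.all? { |f| old[f].to_s == doc[f].to_s } }
    @docs << doc
    self
  end
end

store = KeyedStore.new(:id)
store << {:id => "1", :title => "first"}
store << {:id => "1", :title => "replacement"}
puts store.docs.size # prints 1
```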

#add_indexes(indexes) ⇒ Object

Merges all segments from an index or an array of indexes into this index. You can pass a single Index::Index, Index::IndexReader or Store::Directory, or an array of any one of these types.

This may be used to parallelize batch indexing. A large document collection can be broken into sub-collections. Each sub-collection can be indexed in parallel, on a different thread, process or machine and perhaps all in memory. The complete index can then be created by merging sub-collection indexes with this method.

After this completes, the index is optimized.



# File 'lib/ferret/index.rb', line 757

def add_indexes(indexes)
  @dir.synchronize do
    ensure_writer_open()
    indexes = [indexes].flatten   # make sure we have an array
    return if indexes.size == 0 # nothing to do
    if indexes[0].is_a?(Index)
      indexes.delete(self) # don't merge with self
      indexes = indexes.map {|index| index.reader }
    elsif indexes[0].is_a?(Ferret::Store::Directory)
      indexes.delete(@dir) # don't merge with self
      indexes = indexes.map {|dir| IndexReader.new(dir) }
    elsif indexes[0].is_a?(IndexReader)
      indexes.delete(@reader) # don't merge with self
    else
      raise ArgumentError, "Unknown index type when trying to merge indexes"
    end
    ensure_writer_open
    @writer.add_readers(indexes)
  end
end

#batch_update(docs) ⇒ Object

Batch updates the documents in an index. You can pass either a Hash or an Array.

Array (recommended)

If you pass an Array then each value needs to be a Document or a Hash and each of those documents must have an :id_field which will be used to delete the old document that this document is replacing.

Hash

If you pass a Hash then the keys of the Hash will be considered the ids and the values will be the new documents to replace the old ones with. If the id is an Integer then it is considered a Ferret document number and the corresponding document will be deleted. If the id is a String or a Symbol then the id will be considered a term and the documents that contain that term in the :id_field will be deleted.

Note: No error will be raised if the document does not currently exist. A new document will simply be created.

Examples

# will replace the documents with the +id+'s id:133 and id:253
@index.batch_update({
    '133' => {:id => '133', :content => 'yada yada yada'},
    '253' => {:id => '253', :content => 'bla bla bal'}
  })

# will replace the documents with the Ferret Document numbers 2 and 92
@index.batch_update({
    2  => {:id => '133', :content => 'yada yada yada'},
    92 => {:id => '253', :content => 'bla bla bal'}
  })

# will replace the documents with the +id+'s id:133 and id:253
# this is recommended as it guarantees no duplicate keys
@index.batch_update([
    {:id => '133', :content => 'yada yada yada'},
    {:id => '253', :content => 'bla bla bal'}
  ])
docs

An Array of documents or a Hash of id/document pairs: the set of documents to be updated.
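How #batch_update turns its argument into a list of ids to delete and a list of documents to add can be sketched without Ferret. split_batch is a hypothetical helper; note that it checks for missing ids before converting them to strings, since a value that has already been through to_s can never be nil.

```ruby
# Hypothetical helper mirroring the Array/Hash handling in batch_update.
def split_batch(docs, id_field = :id)
  case docs
  when Array
    raw_ids = docs.map { |doc| doc[id_field] }
    if raw_ids.include?(nil)
      raise ArgumentError, "all documents must have an #{id_field} " \
                           "field when doing a batch update"
    end
    [raw_ids.map(&:to_s), docs] # Array ids are always treated as terms
  when Hash
    # Integer keys stay Integers (Ferret document numbers);
    # String/Symbol keys are treated as id terms.
    [docs.keys, docs.values]
  else
    raise ArgumentError, "must pass Hash or Array, not #{docs.class}"
  end
end

ids, new_docs = split_batch([
  {:id => '133', :content => 'yada yada yada'},
  {:id => '253', :content => 'bla bla bal'}
])
```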



# File 'lib/ferret/index.rb', line 626

def batch_update(docs)
  @dir.synchronize do
    ids = values = nil
    case docs
    when Array
      ids = docs.collect{|doc| doc[@id_field]}
      if ids.include?(nil)
        raise ArgumentError, "all documents must have an #{@id_field} " \
                             "field when doing a batch update"
      end
      # convert to strings only after the nil check so it can fire
      ids = ids.collect {|id| id.to_s}
    when Hash
      ids = docs.keys
      docs = docs.values
    else
      raise ArgumentError, "must pass Hash or Array, not #{docs.class}"
    end
    batch_delete(ids)
    ensure_writer_open()
    docs.each {|new_doc| @writer << new_doc }
    flush()
  end
end

#close ⇒ Object

Closes this index by closing its associated reader and writer objects.



# File 'lib/ferret/index.rb', line 202

def close
  @dir.synchronize do
    if not @open
      raise(StandardError, "tried to close an already closed directory")
    end
    @searcher.close() if @searcher
    @reader.close() if @reader
    @writer.close() if @writer
    @dir.close() if @close_dir

    @open = false
  end
end

#delete(arg) ⇒ Object

Deletes a document/documents from the index. The method for determining the document to delete depends on the type of the argument passed.

If arg is an Integer then delete the document based on the internal document number. Will raise an error if the document does not exist.

If arg is a String or a Symbol then search for the documents with arg in the id field. The id field is either :id or whatever you set the :id_field parameter to when you created the Index object. Will fail quietly if no document exists.

If arg is a Hash or an Array then a batch delete will be performed. If arg is an Array then it will be considered an array of ids. If it is a Hash, then its keys will be used instead as the Array of document ids. If the id is an Integer then it is considered a Ferret document number and the corresponding document will be deleted. If the id is a String or a Symbol then the id will be considered a term and the documents that contain that term in the :id_field will be deleted.
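The type dispatch described above can be summarised as a plain case expression. delete_action is hypothetical and only reports which kind of delete would happen; it does not touch an index. For a Hash it returns the keys, which are the ids the batch delete uses.

```ruby
# Hypothetical classifier mirroring the type dispatch in Index#delete.
def delete_action(arg)
  case arg
  when String, Symbol then [:delete_term, arg.to_s]   # term in the id field
  when Integer        then [:delete_doc_num, arg]     # internal doc number
  when Hash           then [:batch_delete, arg.keys]  # keys are the ids
  when Array          then [:batch_delete, arg]
  else raise ArgumentError, "Cannot delete for arg of type #{arg.class}"
  end
end

delete_action("26") # a term delete on the id field
delete_action(7)    # a delete by Ferret document number
```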



# File 'lib/ferret/index.rb', line 517

def delete(arg)
  @dir.synchronize do
    if arg.is_a?(String) or arg.is_a?(Symbol)
      ensure_writer_open()
      @writer.delete(@id_field, arg.to_s)
    elsif arg.is_a?(Integer)
      ensure_reader_open()
      cnt = @reader.delete(arg)
    elsif arg.is_a?(Hash) or arg.is_a?(Array)
      batch_delete(arg)
    else
      raise ArgumentError, "Cannot delete for arg of type #{arg.class}"
    end
    flush() if @auto_flush
  end
  return self
end

#deleted?(n) ⇒ Boolean

Returns true if document n has been deleted.

Returns:

  • (Boolean)


# File 'lib/ferret/index.rb', line 553

def deleted?(n)
  @dir.synchronize do 
    ensure_reader_open()
    return @reader.deleted?(n) 
  end
end

#doc(*arg) ⇒ Object Also known as: []

Retrieves a document/documents from the index. The method for retrieval depends on the type of the argument passed.

If arg is an Integer then return the document based on the internal document number.

If arg is a Range, then return the documents within the range based on internal document number.

If arg is a String then search for the first document with arg in the id field. The id field is either :id or whatever you set :id_field parameter to when you create the Index object.



# File 'lib/ferret/index.rb', line 451

def doc(*arg)
  @dir.synchronize do
    id = arg[0]
    if id.kind_of?(String) or id.kind_of?(Symbol)
      ensure_reader_open()
      term_doc_enum = @reader.term_docs_for(@id_field, id.to_s)
      return term_doc_enum.next? ? @reader[term_doc_enum.doc] : nil
    else
      ensure_reader_open(false)
      return @reader[*arg]
    end
  end
end

#each ⇒ Object

Iterates through all documents in the index. This method preloads the documents so you don’t need to call #load on the document to load all the fields.



# File 'lib/ferret/index.rb', line 489

def each
  @dir.synchronize do
    ensure_reader_open
    (0...@reader.max_doc).each do |i|
      yield @reader[i].load unless @reader.deleted?(i)
    end
  end
end

#explain(query, doc) ⇒ Object

Returns an Explanation that describes how doc scored against query.

This is intended to be used in developing Similarity implementations, and, for good performance, should not be displayed with every hit. Computing an explanation is as expensive as executing the query over the entire index.



# File 'lib/ferret/index.rb', line 823

def explain(query, doc)
  @dir.synchronize do
    ensure_searcher_open()
    query = do_process_query(query)

    return @searcher.explain(query, doc)
  end
end

#field_infos ⇒ Object

Returns the field_infos object so that you can add new fields to the index.



# File 'lib/ferret/index.rb', line 842

def field_infos
  @dir.synchronize do
    ensure_writer_open()
    return @writer.field_infos
  end
end

#flush ⇒ Object Also known as: commit

Flushes all writes to the index. This will not optimize the index but it will make sure that all writes are written to it.

NOTE: this is not necessary if you are only using this class. All writes will automatically flush when you perform an operation that reads the index.



# File 'lib/ferret/index.rb', line 711

def flush()
  @dir.synchronize do
    if @reader
      if @searcher
        @searcher.close
        @searcher = nil
      end
      @reader.commit
    elsif @writer
      @writer.close
      @writer = nil
    end
  end
end

#has_deletions? ⇒ Boolean

Returns true if any documents have been deleted since the index was last flushed.

Returns:

  • (Boolean)


# File 'lib/ferret/index.rb', line 698

def has_deletions?()
  @dir.synchronize do
    ensure_reader_open()
    return @reader.has_deletions?
  end
end

#highlight(query, doc_id, options = {}) ⇒ Object

Returns an array of strings with the matches highlighted. The query can be either a query String or a Ferret::Search::Query object. The doc_id is the id of the document you want to highlight (usually returned by the search methods). There are also a number of options you can pass:

Options

field

Default: @options[:default_field]. The default field is the field that is usually highlighted, but you can specify which field you want to highlight here. If you want to highlight multiple fields then you will need to call this method multiple times.

excerpt_length

Default: 150. Length of excerpt to show. Highlighted terms will be in the centre of the excerpt. Set to :all to highlight the entire field.

num_excerpts

Default: 2. Number of excerpts to return.

pre_tag

Default: “<b>”. Tag to place to the left of the match. You’ll probably want to change this to a “<span>” tag with a class. Try “\033[36m” for use in a terminal.

post_tag

Default: “</b>”. This tag should close the :pre_tag. Try “\033[m” in the terminal.

ellipsis

Default: “...”. This is the string that is appended at the beginning and end of excerpts (unless the excerpt hits the start or end of the field). Alternatively you may want to use the HTML entity &#8230; or the UTF-8 string “\342\200\246”.
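To make :excerpt_length, :pre_tag, :post_tag and :ellipsis concrete, here is a deliberately simplified highlighter in plain Ruby. simple_highlight is hypothetical: it matches a single literal term and returns one excerpt, and it does not support :excerpt_length => :all or :num_excerpts. Ferret’s real highlighter works from the parsed query, and its excerpt selection differs.

```ruby
# Simplified, hypothetical highlighter: one literal term, one excerpt.
# Not Ferret's algorithm; it only shows what the options control.
def simple_highlight(text, term, excerpt_length: 150,
                     pre_tag: "<b>", post_tag: "</b>", ellipsis: "...")
  pos = text.index(term)
  return nil unless pos

  # Centre the excerpt on the first match, then clamp it to the field.
  start = [pos - (excerpt_length - term.length) / 2, 0].max
  finish = [start + excerpt_length, text.length].min
  start = [finish - excerpt_length, 0].max
  excerpt = text[start...finish].gsub(term) { pre_tag + term + post_tag }

  # Add the ellipsis only where the excerpt stops short of the field edge.
  (start > 0 ? ellipsis : "") + excerpt + (finish < text.length ? ellipsis : "")
end

puts simple_highlight("aaaa cat bbbb", "cat", excerpt_length: 7)
# prints ...a <b>cat</b> b...
```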



# File 'lib/ferret/index.rb', line 191

def highlight(query, doc_id, options = {})
  @dir.synchronize do
    ensure_searcher_open()
    @searcher.highlight(do_process_query(query),
                        doc_id,
                        options[:field]||@options[:default_field],
                        options)
  end
end

#optimize ⇒ Object

Optimizes the index. This should only be called when the index will no longer be updated very often, but will be read a lot.



# File 'lib/ferret/index.rb', line 729

def optimize()
  @dir.synchronize do
    ensure_writer_open()
    @writer.optimize()
    @writer.close()
    @writer = nil
  end
end

#persist(directory, create = true) ⇒ Object

This is a simple utility method for saving an in memory or RAM index to the file system. The same thing can be achieved by using the Index::Index#add_indexes method and you will have more options when creating the new index, however this is a simple way to turn a RAM index into a file system index.

directory

This can either be a Store::Directory object or a String representing the path to the directory where you would like to store the index.

create

True if you’d like to create the directory if it doesn’t exist, or to overwrite an existing index in it. False if you’d like to merge with the existing index. This defaults to true.



# File 'lib/ferret/index.rb', line 792

def persist(directory, create = true)
  synchronize do
    close_all()
    old_dir = @dir
    if directory.is_a?(String)
      @dir = FSDirectory.new(directory, create)
    elsif directory.is_a?(Ferret::Store::Directory)
      @dir = directory
    end
    @dir.extend(MonitorMixin) unless @dir.kind_of? MonitorMixin
    @options[:dir] = @dir
    @options[:create_if_missing] = true
    add_indexes([old_dir])
  end
end

#process_query(query) ⇒ Object

Turns a query string into a Query object using the Index’s QueryParser.



# File 'lib/ferret/index.rb', line 833

def process_query(query)
  @dir.synchronize do
    ensure_searcher_open()
    return do_process_query(query)
  end
end

#query_delete(query) ⇒ Object

Delete all documents returned by the query.

query

The query to find documents you wish to delete. Can either be a string (in which case it is parsed by the standard query parser) or an actual query object.



# File 'lib/ferret/index.rb', line 540

def query_delete(query)
  @dir.synchronize do
    ensure_writer_open()
    ensure_searcher_open()
    query = do_process_query(query)
    @searcher.search_each(query, :limit => :all) do |doc, score|
      @reader.delete(doc)
    end
    flush() if @auto_flush
  end
end

#query_update(query, new_val) ⇒ Object

Update all the documents returned by the query.

query

The query to find documents you wish to update. Can either be a string (in which case it is parsed by the standard query parser) or an actual query object.

new_val

The values we are updating. This can be a string in which case the default field is updated, or it can be a hash, in which case, all fields in the hash are merged into the old hash. That is, the old fields are replaced by values in the new hash if they exist.

Example

index << {:id => "26", :title => "Babylon", :artist => "David Grey"}
index << {:id => "29", :title => "My Oh My", :artist => "David Grey"}

# correct the misspelled artist name
index.query_update('artist:"David Grey"', {:artist => "David Gray"})

index["26"]
  #=> {:id => "26", :title => "Babylon", :artist => "David Gray"}
index["29"]
  #=> {:id => "29", :title => "My Oh My", :artist => "David Gray"}
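The way #query_update builds each replacement document can be sketched on a plain Hash. apply_update is a hypothetical helper: it works on a copy, whereas the real method loads the stored document, deletes the original and adds the modified copy back. :id stands in for the default input field.

```ruby
# Hypothetical helper mirroring how query_update builds the new document.
def apply_update(document, new_val, default_input_field = :id)
  updated = document.dup
  if new_val.is_a?(Hash)
    updated.merge!(new_val) # new values win on conflicting keys
  elsif new_val.is_a?(String) || new_val.is_a?(Symbol)
    updated[default_input_field] = new_val.to_s
  end
  updated
end

old = {:id => "26", :title => "Babylon", :artist => "David Grey"}
fixed = apply_update(old, {:artist => "David Gray"})
# fixed keeps :id and :title and carries the corrected :artist
```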


# File 'lib/ferret/index.rb', line 674

def query_update(query, new_val)
  @dir.synchronize do
    ensure_writer_open()
    ensure_searcher_open()
    docs_to_add = []
    query = do_process_query(query)
    @searcher.search_each(query, :limit => :all) do |id, score|
      document = @searcher[id].load
      if new_val.is_a?(Hash)
        document.merge!(new_val)
      elsif new_val.is_a?(String) or new_val.is_a?(Symbol)
        document[@default_input_field] = new_val.to_s
      end
      docs_to_add << document
      @reader.delete(id)
    end
    ensure_writer_open()
    docs_to_add.each {|doc| @writer << doc }
    flush() if @auto_flush
  end
end

#reader ⇒ Object

Get the reader for this index.

NOTE

This will close the writer from this index.



# File 'lib/ferret/index.rb', line 218

def reader
  ensure_reader_open()
  return @reader
end

#scan(query, options = {}) ⇒ Object

Run a query through the Searcher on the index, ignoring scoring and starting at :start_doc and stopping when :limit matches have been found. It returns an array of the matching document numbers.

There is a big performance advantage when using this search method on a very large index when there are potentially thousands of matching documents and you only want, say, 50 of them. The other search methods need to look at every single match to decide which one has the highest score. This search method just needs to find :limit number of matches before it returns.

Options

start_doc

Default: 0. The document to start the search from. NOTE very carefully that this is not the same as the :offset parameter used in the other search methods, which refers to the offset in the result-set. This is the document to start the scan from. So if you are scanning through the index in increments of 50 documents at a time, you need to use the last matched doc in the previous search to start your next search. See the example below.

limit

Default: 50. This is the number of results you want returned, also called the page size. Set :limit to :all to return all results.

TODO: add option to return loaded documents instead

Example

start_doc = 0
begin
  results = index.scan(query, :start_doc => start_doc)
  yield results # or do something with them
  # :start_doc is treated as inclusive, so resume just after the last match;
  # start_doc becomes nil when results is empty, ie no more matches
  start_doc = results.empty? ? nil : results.last + 1
end while start_doc
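The paging loop can be exercised without an index. fake_scan below is a hypothetical stand-in for scan over a sorted list of matching document numbers. It assumes :start_doc is inclusive (the scan begins at the first match numbered at or above it), so the loop resumes from results.last + 1; if :start_doc were exclusive, resuming from results.last would be enough.

```ruby
# Hypothetical stand-in for scan over a sorted list of matching doc numbers.
# Assumes :start_doc is inclusive: the scan starts at the first match whose
# document number is >= start_doc.
def fake_scan(matches, start_doc: 0, limit: 50)
  matches.select { |d| d >= start_doc }.first(limit)
end

matches = [3, 8, 15, 16, 23, 42, 57]
seen = []
start_doc = 0
loop do
  results = fake_scan(matches, start_doc: start_doc, limit: 3)
  break if results.empty? # no more matches
  seen.concat(results)
  start_doc = results.last + 1 # resume just after the last match
end

puts seen == matches # prints true: every match seen exactly once
```

Under the inclusive assumption, resuming from results.last without the + 1 would return the last match again on every page and never reach an empty page.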


# File 'lib/ferret/index.rb', line 430

def scan(query, options = {})
  @dir.synchronize do
    ensure_searcher_open()
    query = do_process_query(query)

    @searcher.scan(query, options)
  end
end

#search(query, options = {}) ⇒ Object

Run a query through the Searcher on the index. A TopDocs object is returned with the relevant results. The query is a built-in Query object or a query string that can be parsed by the Ferret::QueryParser. Here are the options:

Options

offset

Default: 0. The offset of the start of the section of the result-set to return. This is used for paging through results. Let’s say you have a page size of 10. If you don’t find the result you want among the first 10 results then set :offset to 10 and look at the next 10 results, then 20 and so on.

limit

Default: 10. This is the number of results you want returned, also called the page size. Set :limit to :all to return all results.

sort

A Sort object or sort string describing how the results should be sorted. A sort string is made up of field names, which cannot contain spaces, each optionally followed by the word “DESC” if you want that field reversed, all separated by commas. For example: “rating DESC, author, title”. Note that Ferret will try to determine a field’s type by looking at the first term in the index and seeing if it can be parsed as an integer or a float. Keep this in mind, as you may need to specify a field’s type to sort it correctly. For more on this, see the documentation for SortField.

filter

A Filter object to filter the search results with.

filter_proc

A Proc which takes the doc_id, the score and the Searcher object as its parameters and returns a Boolean value specifying whether the result should be included in the result set.
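The sort-string format (comma-separated field names, each optionally followed by DESC) can be illustrated with a small parser. parse_sort_string is hypothetical: Ferret parses these strings itself into a Sort of SortFields and also infers field types, neither of which this sketch reproduces.

```ruby
# Hypothetical parser for sort strings such as "rating DESC, author, title".
# Returns [field, reverse?] pairs; type detection is not modelled.
def parse_sort_string(sort)
  sort.split(",").map do |clause|
    field, direction, *extra = clause.split
    if field.nil? || !extra.empty? || (direction && direction != "DESC")
      raise ArgumentError, "bad sort clause: #{clause.inspect}"
    end
    [field.to_sym, !direction.nil?] # reversed only when DESC is given
  end
end

parse_sort_string("rating DESC, author, title")
# gives [[:rating, true], [:author, false], [:title, false]]
```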



# File 'lib/ferret/index.rb', line 332

def search(query, options = {})
  @dir.synchronize do
    return do_search(query, options)
  end
end

#search_each(query, options = {}) ⇒ Object

Run a query through the Searcher on the index. The query is a Query object or a query string that can be validly parsed by the Ferret::QueryParser. The Searcher#search_each method yields the internal document id (used to reference documents in the Searcher object like this: searcher[doc_id]) and the search score for that document. It is possible for the score to be greater than 1.0 for some queries when boosts are taken into account; this method will normalize scores to the range 0.0..1.0 when the max-score is greater than 1.0. Here are the options:
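The score normalization mentioned above can be illustrated in a few lines of Ruby. normalize_scores is a hypothetical helper and assumes normalization means dividing every score by the maximum score, the usual approach; it does not reproduce Ferret’s scoring itself.

```ruby
# Hypothetical illustration: when the best raw score exceeds 1.0,
# every score is divided by it so the top hit scores exactly 1.0.
def normalize_scores(scores)
  max = scores.max
  return scores if max.nil? || max <= 1.0
  scores.map { |s| s / max }
end

normalize_scores([2.0, 1.0, 0.5]) # scaled so the top score is 1.0
normalize_scores([0.8, 0.4])      # already within 0.0..1.0, unchanged
```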

Options

offset

Default: 0. The offset of the start of the section of the result-set to return. This is used for paging through results. Let’s say you have a page size of 10. If you don’t find the result you want among the first 10 results then set :offset to 10 and look at the next 10 results, then 20 and so on.

limit

Default: 10. This is the number of results you want returned, also called the page size. Set :limit to :all to return all results.

sort

A Sort object or sort string describing how the results should be sorted. A sort string is made up of field names, which cannot contain spaces, each optionally followed by the word “DESC” if you want that field reversed, all separated by commas. For example: “rating DESC, author, title”. Note that Ferret will try to determine a field’s type by looking at the first term in the index and seeing if it can be parsed as an integer or a float. Keep this in mind, as you may need to specify a field’s type to sort it correctly. For more on this, see the documentation for SortField.

filter

A Filter object to filter the search results with.

filter_proc

A Proc which takes the doc_id, the score and the Searcher object as its parameters and returns a Boolean value specifying whether the result should be included in the result set.

returns

The total number of hits.

Example

index.search_each(query) do |doc, score|
  puts "hit document number #{doc} with a score of #{score}"
end


# File 'lib/ferret/index.rb', line 384

def search_each(query, options = {}) # :yield: doc, score
  @dir.synchronize do
    ensure_searcher_open()
    query = do_process_query(query)

    @searcher.search_each(query, options) do |doc, score|
      yield doc, score
    end
  end
end

#searcher ⇒ Object

Get the searcher for this index.

NOTE

This will close the writer from this index.



# File 'lib/ferret/index.rb', line 225

def searcher
  ensure_searcher_open()
  return @searcher
end

#size ⇒ Object

Returns the number of documents in the index, not counting deleted documents.



# File 'lib/ferret/index.rb', line 739

def size()
  @dir.synchronize do
    ensure_reader_open()
    return @reader.num_docs()
  end
end

#term_vector(id, field) ⇒ Object

Retrieves the term_vector for a document. The document can be referenced by either a string id to match the id field or an integer corresponding to Ferret’s document number.

See Ferret::Index::IndexReader#term_vector



# File 'lib/ferret/index.rb', line 471

def term_vector(id, field)
  @dir.synchronize do
    ensure_reader_open()
    if id.kind_of?(String) or id.kind_of?(Symbol)
      term_doc_enum = @reader.term_docs_for(@id_field, id.to_s)
      if term_doc_enum.next?
        id = term_doc_enum.doc
      else
        return nil
      end
    end
    return @reader.term_vector(id, field)
  end
end

#to_s ⇒ Object

Returns a String containing each undeleted document’s to_s, one per line.



# File 'lib/ferret/index.rb', line 808

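def to_s
  buf = ""
  # iterate up to max_doc rather than size (num_docs) so that documents
  # stored after a deleted slot are not skipped
  (0...reader.max_doc).each do |i|
    buf << self[i].to_s + "\n" if not deleted?(i)
  end
  buf
end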

#update(id, new_doc) ⇒ Object

Update the document referenced by the document number id if id is an Integer, or all of the documents which contain the term id if id is a term. For batch updates of a set of documents, use batch_update, which performs better.

id

The number of the document to update. Can also be a string representing the value in the id field. Also consider using the :key attribute.

new_doc

The document to replace the old document with.



# File 'lib/ferret/index.rb', line 569

def update(id, new_doc)
  @dir.synchronize do
    ensure_writer_open()
    delete(id)
    if id.is_a?(String) or id.is_a?(Symbol)
      @writer.commit
    else
      ensure_writer_open()
    end
    @writer << new_doc
    flush() if @auto_flush
  end
end

#writer ⇒ Object

Get the writer for this index.

NOTE

This will close the reader from this index.



# File 'lib/ferret/index.rb', line 232

def writer
  ensure_writer_open()
  return @writer
end