Class: Picky::Indexers::Parallel
- Defined in:
- lib/picky/indexers/parallel.rb
Overview
Uses a number of categories, a source, and a tokenizer to index data.
The tokenizer is taken from each category if specified, or from the index if not.
Instance Attribute Summary
Attributes inherited from Base
Instance Method Summary
- #flush(file, cache) ⇒ Object
- #index_flush(objects, file, category, cache, tokenizer) ⇒ Object
- #process(source_for_prepare, categories, scheduler = Scheduler.new) ⇒ Object
Process does the actual indexing.
Methods inherited from Base
#check, #initialize, #notify_finished, #prepare, #reset
Constructor Details
This class inherits a constructor from Picky::Indexers::Base
Instance Method Details
#flush(file, cache) ⇒ Object
```ruby
# File 'lib/picky/indexers/parallel.rb', line 78

def flush file, cache
  file.write(cache.join) && cache.clear
end
```
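The pattern is compact: joining the cached lines produces one string, `write` returns a truthy byte count, so `&&` chains straight into clearing the cache for reuse. A minimal sketch of the same pattern against a `StringIO` (the `Doc`-style data here is illustrative, not Picky's):

```ruby
require 'stringio'

# Join the accumulated cache lines into one string, write them out,
# then clear the cache so it can be refilled for the next batch.
def flush file, cache
  file.write(cache.join) && cache.clear
end

file  = StringIO.new
cache = ["1,hello\n", "2,world\n"]

flush file, cache

puts file.string.inspect # all cached lines written as one string
puts cache.inspect       # cache emptied, ready for the next batch
```

Because `IO#write` returns the number of bytes written (truthy for non-empty input), `cache.clear` always runs after a successful write.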
#index_flush(objects, file, category, cache, tokenizer) ⇒ Object
```ruby
# File 'lib/picky/indexers/parallel.rb', line 58

def index_flush objects, file, category, cache, tokenizer
  comma   = ?,
  newline = ?\n

  # Optimized, therefore duplicate code.
  #
  id   = category.id
  from = category.from

  objects.each do |object|
    tokens = object.send from
    tokens, _ = tokenizer.tokenize tokens if tokenizer # Note: Originals not needed. TODO Optimize?
    tokens.each do |token_text|
      next unless token_text
      cache << object.send(id) << comma << token_text << newline
    end
  end

  flush file, cache
end
```
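For each object, the category's `from` attribute is read, tokenized, and written to the cache as one `id,token` line per token. A self-contained sketch of that accumulation step, using hypothetical `Doc`/`category` stand-ins and a trivial split-on-whitespace tokenizer (not Picky's real tokenizer):

```ruby
# Hypothetical stand-ins illustrating the "id,token\n" lines that
# index_flush appends to the cache before flushing them to file.
Doc      = Struct.new(:id, :title)
category = Struct.new(:id, :from).new(:id, :title)

objects = [Doc.new(1, "red apple"), Doc.new(2, "green pear")]
cache   = []

# Simplified tokenizer: downcase and split on whitespace.
tokenize = ->(text) { text.downcase.split }

comma   = ","
newline = "\n"
id      = category.id
from    = category.from

objects.each do |object|
  tokens = tokenize.call(object.send(from))
  tokens.each do |token_text|
    next unless token_text
    # One "id,token" line per token; ids are stringified by join later.
    cache << object.send(id) << comma << token_text << newline
  end
end

puts cache.join
```

Appending four small pieces per token instead of building each line with string interpolation avoids intermediate string allocations, which matters at indexing volume.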
#process(source_for_prepare, categories, scheduler = Scheduler.new) ⇒ Object
Process does the actual indexing.
Parameters:
* categories: An Enumerable of Category objects.
```ruby
# File 'lib/picky/indexers/parallel.rb', line 18

def process source_for_prepare, categories, scheduler = Scheduler.new
  # Prepare a combined object - array.
  #
  combined = categories.map do |category|
    [category, category.prepared_index_file, [], category.tokenizer]
  end

  # Go through each object in the source.
  #
  objects = []
  reset source_for_prepare
  source_for_prepare.each do |object|
    # Accumulate objects.
    #
    objects << object
    next if objects.size < 10_000

    # THINK Is it a good idea that not the tokenizer has
    # control over when he gets the next text?
    #
    combined.each do |category, file, cache, tokenizer|
      index_flush objects, file, category, cache, tokenizer
    end
    objects.clear
  end

  # Close all files.
  #
  combined.each do |category, file, cache, tokenizer|
    index_flush objects, file, category, cache, tokenizer
    yield file
    file.close
  end
end
```
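The core of `process` is a batching loop: objects accumulate until the batch reaches 10,000, then every category's cache is flushed at once, and a final flush after the loop handles the remainder. A reduced sketch of that control flow, with an illustrative batch size of 3 and a plain array standing in for the source:

```ruby
# Sketch of the batching pattern used by process: accumulate source
# objects and run the (expensive) tokenize-and-write step once per
# batch. BATCH_SIZE of 3 is illustrative; Picky uses 10_000.
BATCH_SIZE = 3

flushes = []            # records each batch, standing in for index_flush
objects = []
source  = (1..7).to_a   # stands in for source_for_prepare

source.each do |object|
  objects << object
  next if objects.size < BATCH_SIZE
  flushes << objects.dup
  objects.clear
end

# Final flush for the remainder, mirroring the "Close all files" step,
# which also flushes whatever is left in objects.
flushes << objects.dup

p flushes
```

Batching keeps memory bounded regardless of source size while amortizing the per-flush cost; note the final flush is unconditional, so a partially filled last batch is never lost.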