Class: Ferret::Index::TermInfosWriter
- Inherits:
- Object
- Object
- Ferret::Index::TermInfosWriter
- Defined in:
- lib/ferret/index/term_infos_io.rb
Overview
Stores a monotonically increasing set of <Term, TermInfo> pairs in a Directory. A TermInfos set can be written only once, and its pairs must be added in order.
Constant Summary
- FORMAT = -2
  The file format version, a negative number.
Instance Attribute Summary
- #index_interval ⇒ Object (readonly)
  Returns the value of attribute index_interval.
- #other ⇒ Object (writeonly)
  Sets the attribute other.
- #out ⇒ Object (readonly)
  Returns the value of attribute out.
- #skip_interval ⇒ Object (readonly)
  Returns the value of attribute skip_interval.
Instance Method Summary
- #add(term, term_info) ⇒ Object
  Adds a new <Term, TermInfo> pair to the set.
- #close ⇒ Object
  Called to complete TermInfos creation.
- #initialize(dir, segment, fis, interval, is_index = false) ⇒ TermInfosWriter (constructor)
  Expert: The fraction of terms in the “dictionary” which should be stored in RAM.
Constructor Details
#initialize(dir, segment, fis, interval, is_index = false) ⇒ TermInfosWriter
Expert: index_interval (set from the interval argument) determines the fraction of terms in the “dictionary” which should be stored in RAM: one term in every index_interval is also written to the index. Smaller values use more memory, but make searching slightly faster, while larger values use less memory and make searching slightly slower. Searching is typically not dominated by dictionary lookup, so tweaking this is rarely useful.
Expert: skip_interval (fixed at 16 by this constructor) determines the fraction of TermDocEnum entries stored in skip tables, used to accelerate TermDocEnum#skipTo(int). Larger values result in smaller indexes and greater acceleration but fewer accelerable cases, while smaller values result in bigger indexes, less acceleration, and more accelerable cases. More detailed experiments would be useful here.
    # File 'lib/ferret/index/term_infos_io.rb', line 33
    def initialize(dir, segment, fis, interval, is_index = false)
      @index_interval = interval
      @skip_interval = 16
      @last_index_pointer = 0
      @last_term = Term.new("", "")
      @last_term_info = TermInfo.new()
      @size = 0
      @is_index = is_index
      @field_infos = fis
      @out = dir.create_output(segment + (@is_index ? ".tii" : ".tis"))
      @out.write_int(FORMAT)           # write format
      @out.write_long(0)               # leave space for size
      @out.write_int(@index_interval)  # write @index_interval
      @out.write_int(@skip_interval)   # write @skip_interval
      unless is_index
        @other = TermInfosWriter.new(dir, segment, fis, interval, true)
        @other.other = self
      end
    end
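The constructor writes four header fields in a fixed order: the format, a placeholder for the size, the index interval, and the skip interval. The sketch below lays out such a header with plain Ruby. It assumes write_int produces a 4-byte and write_long an 8-byte big-endian integer, which the listing does not confirm; build_header is a hypothetical helper, not part of Ferret.

```ruby
FORMAT = -2 # file format version, as given in the constant summary

# Hypothetical helper: packs the header fields in the order the constructor
# writes them. The 4/8/4/4-byte big-endian widths are an assumption.
def build_header(index_interval, skip_interval, size = 0)
  [FORMAT].pack("l>") +           # write_int(FORMAT)
    [size].pack("q>") +           # write_long(0), space reserved for the size
    [index_interval].pack("l>") + # write_int(@index_interval)
    [skip_interval].pack("l>")    # write_int(@skip_interval)
end

header = build_header(128, 16)
format, size, index_interval, skip_interval = header.unpack("l>q>l>l>")
puts header.bytesize # 20 under the assumed field widths
```

Under these assumptions the size field occupies bytes 4 to 11, directly after the format.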
Instance Attribute Details
#index_interval ⇒ Object (readonly)
Returns the value of attribute index_interval.
    # File 'lib/ferret/index/term_infos_io.rb', line 7
    def index_interval
      @index_interval
    end
#other=(value) ⇒ Object (writeonly)
Sets the attribute other.
    # File 'lib/ferret/index/term_infos_io.rb', line 8
    def other=(value)
      @other = value
    end
#out ⇒ Object (readonly)
Returns the value of attribute out.
    # File 'lib/ferret/index/term_infos_io.rb', line 7
    def out
      @out
    end
#skip_interval ⇒ Object (readonly)
Returns the value of attribute skip_interval.
    # File 'lib/ferret/index/term_infos_io.rb', line 7
    def skip_interval
      @skip_interval
    end
Instance Method Details
#add(term, term_info) ⇒ Object
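Adds a new <Term, TermInfo> pair to the set. Term must be lexicographically greater than all previous Terms added. TermInfo pointers must be positive and greater than all previous. As written, the checks raise an IOError only when the term (for the main, non-index writer) or either pointer is strictly smaller than the previous one.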
    # File 'lib/ferret/index/term_infos_io.rb', line 56
    def add(term, term_info)
      if (not @is_index and @last_term > term)
        raise IOError, "term out of order #{term.text} < #{@last_term.text}"
      end
      if (term_info.freq_pointer < @last_term_info.freq_pointer)
        raise IOError, "freq pointer out of order"
      end
      if (term_info.prox_pointer < @last_term_info.prox_pointer)
        raise IOError, "prox pointer out of order"
      end

      if (not @is_index and @size % @index_interval == 0)
        @other.add(@last_term, @last_term_info) # add an index term
      end

      write_term(term)                     # write term
      @out.write_vint(term_info.doc_freq)  # write doc freq
      @out.write_vlong(term_info.freq_pointer - @last_term_info.freq_pointer)
      @out.write_vlong(term_info.prox_pointer - @last_term_info.prox_pointer)
      @out.write_vint(term_info.skip_offset) if (term_info.doc_freq >= @skip_interval)

      if (@is_index)
        @out.write_vlong(@other.out.pos() - @last_index_pointer)
        @last_index_pointer = @other.out.pos() # write pointer
      end

      @last_term_info.set!(term_info)
      @size += 1
    end
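To make the ordering checks, the index sampling, and the pointer deltas in add easier to follow, here is a simplified simulation. It is a sketch, not the Ferret class: simulate_add is a hypothetical helper, terms are plain strings, only the freq pointer is tracked, and it assumes the previous term is updated after each write (write_term is not shown in the listing).

```ruby
# Hypothetical stand-in for the writer's bookkeeping in #add.
# entries is a list of [term, freq_pointer] pairs.
def simulate_add(entries, index_interval)
  last_term = ""          # mirrors the empty initial @last_term
  last_freq_pointer = 0   # assumed initial pointer value
  index_terms = []        # terms that would be passed to the index writer
  deltas = []             # pointer deltas that would be written

  entries.each_with_index do |(term, freq_pointer), size|
    raise IOError, "term out of order #{term} < #{last_term}" if last_term > term
    raise IOError, "freq pointer out of order" if freq_pointer < last_freq_pointer

    # Every index_interval-th call records the *previous* term in the index.
    index_terms << last_term if size % index_interval == 0

    deltas << freq_pointer - last_freq_pointer
    last_term = term
    last_freq_pointer = freq_pointer
  end
  [index_terms, deltas]
end

entries = [["apple", 10], ["banana", 25], ["cherry", 25], ["date", 40], ["elder", 70]]
index_terms, deltas = simulate_add(entries, 2)
p index_terms # ["", "banana", "date"]
p deltas      # [10, 15, 0, 15, 30]
```

Note that the first index entry is the empty initial term, because the sampling check runs before the new term is written.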
#close ⇒ Object
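Called to complete TermInfos creation. Seeks back to offset 4 to write the final size after the format, closes the output, and, for the main (non-index) writer, closes the paired index writer.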
    # File 'lib/ferret/index/term_infos_io.rb', line 87
    def close()
      @out.seek(4)          # write @size after format
      @out.write_long(@size)
      @out.close()

      @other.close() unless @is_index
    end
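The seek-and-overwrite step in close can be illustrated with an in-memory stream. This sketch uses StringIO in place of the Directory output and assumes, as an illustration only, that the format is a 4-byte and the size an 8-byte big-endian integer; the payload and the count of 3 are made-up values.

```ruby
require "stringio"

out = StringIO.new(String.new)  # in-memory stand-in for the output stream

out.write([-2].pack("l>"))      # format
out.write([0].pack("q>"))       # placeholder for the size
out.write("payload")            # stand-in for the written term entries

size = 3                        # hypothetical number of entries added

out.seek(4)                     # jump back to just after the format
out.write([size].pack("q>"))    # overwrite the placeholder in place

format, stored_size, payload = out.string.unpack("l>q>a*")
p [format, stored_size, payload] # [-2, 3, "payload"]
```

Writing the placeholder first lets the writer stream entries without knowing the final count in advance.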