Class: Ferret::Index::TermInfosWriter

Inherits:
Object
Defined in:
lib/ferret/index/term_infos_io.rb

Overview

This stores a monotonically increasing set of <Term, TermInfo> pairs in a Directory. A TermInfos can be written once, in order.

Constant Summary

FORMAT = -2

The file format version, a negative number.

Constructor Details

#initialize(dir, segment, fis, interval, is_index = false) ⇒ TermInfosWriter

The interval argument sets the index interval (exposed as index_interval). Expert: this determines the fraction of terms in the “dictionary” that is stored in RAM. Smaller values use more memory but make searching slightly faster; larger values use less memory and make searching slightly slower. Searching is typically not dominated by dictionary lookup, so tweaking this is rarely useful.

The skip interval (exposed as skip_interval) is fixed at 16 by this constructor. Expert: this determines the fraction of TermDocEnum entries stored in skip tables, which are used to accelerate TermDocEnum#skipTo(int). Larger values result in smaller indexes and greater acceleration but fewer accelerable cases; smaller values result in bigger indexes, less acceleration and more accelerable cases. More detailed experiments would be useful here.
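The index-interval trade-off can be illustrated with a small self-contained sketch. It does not use Ferret: a sorted array of strings stands in for the term dictionary, and build_index and lookup are hypothetical helpers written only for this illustration. Every interval-th term is kept in an in-memory index; a lookup binary-searches that index and then scans at most interval terms of the full list.

```ruby
# Hypothetical stand-in for a term dictionary: a sorted, non-empty array of strings.
TERMS = ("aa".."cz").to_a.freeze

# Keep every interval-th term (positions 0, interval, 2 * interval, ...) in memory.
def build_index(terms, interval)
  (0...terms.size).step(interval).map { |pos| [terms[pos], pos] }
end

# Returns [position_or_nil, number_of_terms_scanned].
def lookup(terms, index, interval, target)
  # Binary search for the last index entry whose term is <= target.
  lo = 0
  hi = index.size - 1
  while lo < hi
    mid = (lo + hi + 1) / 2
    if (index[mid][0] <=> target) <= 0
      lo = mid
    else
      hi = mid - 1
    end
  end

  # Linear scan of at most `interval` terms, starting at the indexed position.
  start = index[lo][1]
  scanned = 0
  terms[start, interval].each_with_index do |term, offset|
    scanned += 1
    return [start + offset, scanned] if term == target
    break if term > target
  end
  [nil, scanned]
end

[4, 16].each do |interval|
  index = build_index(TERMS, interval)
  pos, scanned = lookup(TERMS, index, interval, "bq")
  puts "interval=#{interval}: #{index.size} index entries, " \
       "position #{pos.inspect}, #{scanned} terms scanned"
end
```

A smaller interval keeps more entries in memory and shortens the scan; a larger interval does the opposite, which matches the description above.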



# File 'lib/ferret/index/term_infos_io.rb', line 33

def initialize(dir, segment, fis, interval, is_index = false)
  @index_interval = interval
  @skip_interval = 16
  @last_index_pointer = 0
  @last_term = Term.new("", "")
  @last_term_info = TermInfo.new()
  @size = 0
  @is_index = is_index
  @field_infos = fis
  @out = dir.create_output(segment + (@is_index ? ".tii" : ".tis"))
  @out.write_int(FORMAT)                      # write format
  @out.write_long(0)                          # leave space for size
  @out.write_int(@index_interval)             # write @index_interval
  @out.write_int(@skip_interval)              # write @skip_interval

  unless is_index
    @other = TermInfosWriter.new(dir, segment, fis, interval, true)
    @other.other = self
  end
end

Instance Attribute Details

#index_interval ⇒ Object (readonly)

Returns the value of attribute index_interval.



# File 'lib/ferret/index/term_infos_io.rb', line 7

def index_interval
  @index_interval
end

#other=(value) ⇒ Object (writeonly)

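Sets the attribute other, the paired writer. The constructor links the main (.tis) writer and its index (.tii) writer to each other through this attribute.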



# File 'lib/ferret/index/term_infos_io.rb', line 8

def other=(value)
  @other = value
end

#out ⇒ Object (readonly)

Returns the value of attribute out.



# File 'lib/ferret/index/term_infos_io.rb', line 7

def out
  @out
end

#skip_interval ⇒ Object (readonly)

Returns the value of attribute skip_interval.



# File 'lib/ferret/index/term_infos_io.rb', line 7

def skip_interval
  @skip_interval
end

Instance Method Details

#add(term, term_info) ⇒ Object

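Adds a new <Term, TermInfo> pair to the set. Terms must be added in lexicographic order, and TermInfo pointers must be positive and must not decrease. An IOError is raised if the term sorts before the last term (checked only when this is not the index writer) or if the freq or prox pointer is smaller than that of the previous entry.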



# File 'lib/ferret/index/term_infos_io.rb', line 56

def add(term, term_info)
  if (not @is_index and @last_term > term)
    raise IOError, "term out of order #{term.text} < #{@last_term.text}"
  end
  if (term_info.freq_pointer < @last_term_info.freq_pointer)
    raise IOError, "freq pointer out of order"
  end
  if (term_info.prox_pointer < @last_term_info.prox_pointer)
    raise IOError, "prox pointer out of order"
  end

  if (not @is_index and @size % @index_interval == 0)
    @other.add(@last_term, @last_term_info) # add an index term
  end

  write_term(term)                                 # write term
  @out.write_vint(term_info.doc_freq)              # write doc freq
  # pointers are written as deltas from the previous entry
  @out.write_vlong(term_info.freq_pointer - @last_term_info.freq_pointer)
  @out.write_vlong(term_info.prox_pointer - @last_term_info.prox_pointer)
  @out.write_vint(term_info.skip_offset) if (term_info.doc_freq >= @skip_interval)

  if (@is_index)
    # write pointer into the main file, as a delta from the previous one
    @out.write_vlong(@other.out.pos() - @last_index_pointer)
    @last_index_pointer = @other.out.pos()
  end

  @last_term_info.set!(term_info)
  @size += 1
end
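The pointer writes in add store the difference from the previous entry rather than the absolute position. The sketch below shows why that helps, under one loud assumption: that write_vlong uses a Lucene-style variable-length encoding (seven payload bits per byte, low-order group first, high bit set when more bytes follow). That encoding is not shown on this page, so encode_vint is a hypothetical helper rather than Ferret's implementation, and the pointer values are made up. The starting pointer of 0 is also an assumption about a freshly created TermInfo.

```ruby
# Hypothetical helper (assumed Lucene-style VInt, not taken from Ferret):
# 7 payload bits per byte, low-order group first, high bit = "more bytes follow".
def encode_vint(value)
  raise ArgumentError, "value must be non-negative" if value < 0
  bytes = []
  while value > 0x7F
    bytes << ((value & 0x7F) | 0x80)
    value >>= 7
  end
  bytes << value
  bytes.pack("C*")
end

# Made-up, non-decreasing freq pointers, as successive add calls might see them.
freq_pointers = [1_000_000, 1_000_040, 1_000_300, 1_002_000]

# Bytes needed if each pointer were written in full.
absolute_size = freq_pointers.sum { |ptr| encode_vint(ptr).bytesize }

# Bytes needed if each pointer is written as a delta from the previous one.
last = 0 # assumption: the initial pointer is zero
delta_size = 0
freq_pointers.each do |ptr|
  raise IOError, "freq pointer out of order" if ptr < last # same check as add
  delta_size += encode_vint(ptr - last).bytesize
  last = ptr
end

puts "absolute: #{absolute_size} bytes, delta: #{delta_size} bytes"
```

Because the deltas are small, most of them fit in one or two bytes, which is also why add insists that pointers never decrease: a negative delta could not be written this way.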

#close ⇒ Object

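Called to complete TermInfos creation. Writes the final term count into the space reserved after the format marker, closes the output and, for the main (non-index) writer, closes the paired index writer as well.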



# File 'lib/ferret/index/term_infos_io.rb', line 87

def close()
  @out.seek(4)          # write @size after format
  @out.write_long(@size)
  @out.close()

  @other.close() unless @is_index
end
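The constructor and close together implement a reserve-then-backpatch header: eight bytes are left for the term count right after the four-byte format marker, and close seeks back to offset 4 to fill them in. The following sketch reproduces that pattern with StringIO and Array#pack instead of a Ferret output stream. The big-endian byte order, the index interval of 128 and the entry payloads are assumptions made for illustration; the byte layout of write_int and write_long is not shown on this page.

```ruby
require "stringio"

format_version = -2 # the FORMAT constant documented above

out = StringIO.new("".b)
out.write([format_version].pack("l>")) # 4 bytes: format
out.write([0].pack("q>"))              # 8 bytes: placeholder for size
out.write([128].pack("l>"))            # 4 bytes: index interval (made-up value)
out.write([16].pack("l>"))             # 4 bytes: skip interval

# Stand-in for add: append some payload per entry and count the entries.
size = 0
%w[apple banana cherry].each do |text|
  out.write(text)
  size += 1
end

# Stand-in for close: seek past the format marker and overwrite the placeholder.
out.seek(4)
out.write([size].pack("q>"))

stored_format, stored_size, index_interval, skip_interval =
  out.string.unpack("l>q>l>l>")
puts "format=#{stored_format} size=#{stored_size} " \
     "index_interval=#{index_interval} skip_interval=#{skip_interval}"
```

Reserving the slot up front lets the writer stream entries without knowing the count in advance, at the cost of one seek when the file is closed.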