Class: Ferret::Store::IndexInput

Inherits:
Object
Defined in:
lib/ferret/store/index_io.rb,
ext/index_io.c

Overview

Ferret’s IO input methods are defined here. Subclasses must define read_byte and read_bytes before this class is of any use, because the other read methods are built on top of them.
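As a rough illustration of that requirement, the sketch below defines the two primitive methods on top of an in-memory array of byte values. SketchIndexInput and ArrayIndexInput are hypothetical names used only so the example runs without the ferret gem; read_uint is copied from the listing on this page, and returning buf from read_bytes is an assumption rather than documented behaviour.

```ruby
# Hypothetical stand-in for Ferret::Store::IndexInput so the sketch runs
# without the ferret gem. read_uint is copied from the listing on this page.
class SketchIndexInput
  def read_byte
    raise NotImplementedError
  end

  def read_bytes(buf, offset, len)
    raise NotImplementedError
  end

  def read_uint
    (read_byte << 24) | (read_byte << 16) | (read_byte << 8) | read_byte
  end
end

# Hypothetical subclass backed by an array of Integers in 0..255.
class ArrayIndexInput < SketchIndexInput
  def initialize(bytes)
    @bytes = bytes
    @pos = 0
  end

  # Returns the next byte and advances the read position.
  def read_byte
    raise EOFError, "read past end of input" if @pos >= @bytes.length
    b = @bytes[@pos]
    @pos += 1
    b
  end

  # Copies len bytes into buf, starting at index offset.
  # Returning buf is an assumption made for this sketch.
  def read_bytes(buf, offset, len)
    len.times { |k| buf[offset + k] = read_byte }
    buf
  end
end

input = ArrayIndexInput.new([0x00, 0x00, 0x01, 0x00, 0x61, 0x62, 0x63])
value = input.read_uint        # consumes the first four bytes
buf = []
input.read_bytes(buf, 0, 3)    # copies the remaining three bytes
```

Once read_byte exists, the inherited read_uint works unchanged, which is the point of keeping the primitives abstract.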

Direct Known Subclasses

BufferedIndexInput

Instance Method Details

#close ⇒ Object

Closes the stream to further operations.

Raises:

  • (NotImplementedError)


# File 'lib/ferret/store/index_io.rb', line 123

def close
  raise NotImplementedError
end

#length ⇒ Object

The number of bytes in the file.

Raises:

  • (NotImplementedError)


# File 'lib/ferret/store/index_io.rb', line 139

def length
  raise NotImplementedError
end

#pos ⇒ Object

Returns the current position in this file, where the next read will occur.

Raises:

  • (NotImplementedError)


# File 'lib/ferret/store/index_io.rb', line 129

def pos
  raise NotImplementedError
end

#read_byte ⇒ Object

Reads and returns a single byte.

Raises:

  • (NotImplementedError)


# File 'lib/ferret/store/index_io.rb', line 7

def read_byte()
  raise NotImplementedError
end

#read_bytes(buf, offset, len) ⇒ Object

Reads a specified number of bytes into an array at the specified offset.

buf

the array to read bytes into

offset

the offset in the array to start storing bytes

len

the number of bytes to read

Raises:

  • (NotImplementedError)


# File 'lib/ferret/store/index_io.rb', line 15

def read_bytes(buf, offset, len)
  raise NotImplementedError
end

#read_chars(buf, start, length) ⇒ Object

Reads UTF-8 encoded characters into an array.

buf

the array to read characters into

start

the offset in the array to start storing characters

length

the number of characters to read

TODO: Test on some actual UTF-8 documents.



# File 'lib/ferret/store/index_io.rb', line 92

def read_chars(buf, start, length)
  if buf.length < (start + length)
    # make room for the characters to read
    buf << " " * (start + length - buf.length)
  end
  last = start + length
  (start...last).each do |i|
    buf[i] = read_byte.chr
  end
#        last = start + length
#        
#        (start...last).each do |i|
#          b = read_byte
#          if (b & 0x80) == 0
#            buf[i] = (b & 0x7F).chr # don't need to worry about UTF-8 here
#          else
#            if (b & 0xE0) != 0xE0
#              tmp_int = (((b & 0x1F) << 6) | (read_byte & 0x3F))
#              buf[i] = [tmp_int].pack("C") # pack into a UTF-8 string
#            else
#              buf[i] = [
#                         ((b & 0x0F) << 12) |
#                         ((read_byte & 0x3F) << 6) |
#                         (read_byte & 0x3F)
#                       ].pack("U") # pack into a UTF-8 string
#            end
#          end
#        end
end

#read_int ⇒ Object

Reads four bytes and returns a signed int. For unsigned integers, use read_uint instead, which is faster.



# File 'lib/ferret/store/index_io.rb', line 22

def read_int
  # This may be slow. I'm not sure if this is the best way to get
  # integers from files but this is the only way I could find to get
  # signed integers.
  #i = read_byte
  #return (((i&0x80)==0 ? 0 : -1) << 32) |
         #(i << 24) |
         #((read_byte) << 16) |
         #((read_byte) << 8) |
         #(read_byte)
  i1 = read_byte
  i2 = read_byte
  i3 = read_byte
  i4 = read_byte
  res =  (((i1&0x80) == 0 ? 0 : -0x100000000)) +
         ((i1 << 24) + (i2 << 16) + (i3 << 8) + (i4))
  return res
end
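The sign handling above can be checked in isolation. The sketch below repeats the same arithmetic in a hypothetical helper named decode_int, which takes the four byte values as arguments instead of reading them from a stream, and compares the result with Ruby's own big-endian signed unpack directive.

```ruby
# Hypothetical helper mirroring the arithmetic in read_int: four big-endian
# byte values are combined, and 2**32 is subtracted when the top bit is set.
def decode_int(i1, i2, i3, i4)
  sign_fix = (i1 & 0x80) == 0 ? 0 : -0x100000000
  sign_fix + ((i1 << 24) + (i2 << 16) + (i3 << 8) + i4)
end

positive = decode_int(0x00, 0x00, 0x01, 0x00)   # top bit clear, no correction
negative = decode_int(0xFF, 0xFF, 0xFF, 0xFE)   # 4294967294 - 4294967296

# Cross-check against String#unpack1 with a big-endian signed 32-bit directive.
reference = [0xFF, 0xFF, 0xFF, 0xFE].pack("C4").unpack1("l>")
```

Here positive is 256, and negative and reference are both -2.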

#read_long ⇒ Object

Reads eight bytes and returns a long.



# File 'lib/ferret/store/index_io.rb', line 42

def read_long
  return (read_int << 32) + (read_int & 0xFFFFFFFF)
end
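To see why the low word is masked with 0xFFFFFFFF, the following sketch applies the same expression to two already-decoded signed 32-bit values. combine_long is a hypothetical helper, not part of Ferret.

```ruby
# Hypothetical helper mirroring read_long: the high word keeps its sign,
# while the low word is masked so it contributes only its 32 raw bits.
def combine_long(high, low)
  (high << 32) + (low & 0xFFFFFFFF)
end

minus_two = combine_long(-1, -2)   # bytes FF FF FF FF FF FF FF FE
two_to_32 = combine_long(1, 0)     # bytes 00 00 00 01 00 00 00 00

# Cross-check: split -2 into two signed 32-bit words and recombine them.
high, low = [-2].pack("q>").unpack("l>l>")
roundtrip = combine_long(high, low)
```

Without the mask, a low word with its top bit set would be added as a negative number and the result would come out 2**32 too small.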

#read_string ⇒ Object

Reads a string. A string is stored as a single vint giving the length of the string, followed by the string itself.



# File 'lib/ferret/store/index_io.rb', line 77

def read_string
  length = read_vint
  
  chars = Array.new(length, ' ')
  read_chars(chars, 0, length)
  
  chars.join # Array#to_s only concatenated the elements on Ruby 1.8; join does so on every version
end
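The stored layout described above, a length prefix followed by the string data, can be sketched on a plain array of byte values. decode_string is a hypothetical helper; it assumes the length fits in a single vint byte (below 128) and that the text is ASCII, so one byte corresponds to one character, as in the active body of read_chars.

```ruby
# Hypothetical helper: decodes one stored string from an array of byte values.
# Assumes a one-byte vint length prefix and ASCII-only content.
def decode_string(bytes)
  length = bytes[0]
  bytes[1, length].pack("C*").force_encoding("UTF-8")
end

stored = [0x03, 0x63, 0x61, 0x74, 0x99]   # length 3, "cat", then unrelated data
text = decode_string(stored)
```

Only the three bytes named by the prefix are consumed, so text is "cat" and the trailing byte is left for the next read.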

#read_uint ⇒ Object

Reads four bytes and returns a non-negative integer.



# File 'lib/ferret/store/index_io.rb', line 47

def read_uint
  return ((read_byte) << 24) | ((read_byte) << 16) |
         ((read_byte) <<  8) |  (read_byte)
end

#read_ulong ⇒ Object

Reads eight bytes and returns a positive integer.



# File 'lib/ferret/store/index_io.rb', line 53

def read_ulong
  return (read_uint << 32) | (read_uint & 0xFFFFFFFF)
end

#read_vint ⇒ Object Also known as: read_vlong

Reads an int stored in variable-length format. Reads between one and five bytes. Smaller values take fewer bytes. Negative numbers are not supported.



# File 'lib/ferret/store/index_io.rb', line 60

def read_vint
  b = read_byte
  i = b & 0x7F # 0x7F = 0b01111111
  shift = 7
  
  while b & 0x80 != 0 # 0x80 = 0b10000000
    b = read_byte
    i |= (b & 0x7F) << shift
    shift += 7
  end
  
  return i
end
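A short worked example may make the variable-length format easier to follow. The sketch below runs the same loop over an array of byte values; decode_vint is a hypothetical helper that indexes into the array instead of calling read_byte.

```ruby
# Hypothetical helper mirroring read_vint: each byte carries seven payload
# bits, lowest group first, and a set high bit means another byte follows.
def decode_vint(bytes)
  idx = 0
  b = bytes[idx]
  value = b & 0x7F
  shift = 7
  while (b & 0x80) != 0
    idx += 1
    b = bytes[idx]
    value |= (b & 0x7F) << shift
    shift += 7
  end
  value
end

one_byte  = decode_vint([0x7F])         # largest value that fits in one byte
two_bytes = decode_vint([0x80, 0x01])   # 0 + (1 << 7)
larger    = decode_vint([0xAC, 0x02])   # 44 + (2 << 7)
```

These decode to 127, 128 and 300 respectively, which shows how values below 128 need only a single byte.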

#seek(pos) ⇒ Object

Sets current position in this file, where the next read will occur.

Raises:

  • (NotImplementedError)


# File 'lib/ferret/store/index_io.rb', line 134

def seek(pos)
  raise NotImplementedError
end