Class: Ferret::Store::IndexInput
- Inherits: Object (Object > Ferret::Store::IndexInput)
- Defined in: lib/ferret/store/index_io.rb, ext/index_io.c
Overview
Ferret's IO input methods are defined here. The methods read_byte and read_bytes need to be defined (typically in a subclass) before this class is of any use; close, pos, seek and length likewise raise NotImplementedError here.
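As a minimal sketch of what the overview asks for, the hypothetical class below supplies the primitive operations (read_byte, read_bytes, pos, seek, length, close) over an in-memory byte string. It is written as a standalone class so it runs without Ferret installed; in real use it would presumably subclass Ferret::Store::IndexInput. The name StringInput, raising EOFError/IOError, and returning buf from read_bytes are assumptions made for this example, not documented behaviour.

```ruby
# Hypothetical in-memory input. In real use this might be declared as
#   class StringInput < Ferret::Store::IndexInput
# but it is standalone here so the sketch runs without Ferret.
class StringInput
  def initialize(data)
    @data = data.dup.force_encoding(Encoding::BINARY) # treat data as raw bytes
    @pos = 0
  end

  # Returns the next byte as an Integer (0..255) and advances the position.
  def read_byte
    raise IOError, "stream closed" if @data.nil?
    raise EOFError, "read past end of input" if @pos >= @data.bytesize
    b = @data.getbyte(@pos)
    @pos += 1
    b
  end

  # Stores len bytes into buf starting at index offset; returns buf.
  def read_bytes(buf, offset, len)
    len.times { |i| buf[offset + i] = read_byte }
    buf
  end

  def pos
    @pos
  end

  def seek(pos)
    @pos = pos
  end

  def length
    @data.bytesize
  end

  def close
    @data = nil
  end
end

input = StringInput.new([0, 1, 2, 3].pack("C*"))
input.read_byte              # => 0
buf = [9, 9, 9, 9]
input.read_bytes(buf, 1, 3)  # buf is now [9, 1, 2, 3]
input.seek(0)
input.pos                    # => 0
```

Once these primitives exist, the higher-level readers in this class (read_int, read_vint, read_string and so on) are built purely on read_byte, so they need no further overriding.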
Instance Method Summary
- #close ⇒ Object
  Closes the stream to further operations.
- #length ⇒ Object
  The number of bytes in the file.
- #pos ⇒ Object
  Returns the current position in this file, where the next read will occur.
- #read_byte ⇒ Object
  Reads and returns a single byte.
- #read_bytes(buf, offset, len) ⇒ Object
  Reads a specified number of bytes into an array at the specified offset.
- #read_chars(buf, start, length) ⇒ Object
  Reads UTF-8 encoded characters into an array.
- #read_int ⇒ Object
  Reads four bytes and returns an int.
- #read_long ⇒ Object
  Reads eight bytes and returns a long.
- #read_string ⇒ Object
  Reads a string.
- #read_uint ⇒ Object
  Reads four bytes and returns a non-negative integer.
- #read_ulong ⇒ Object
  Reads eight bytes and returns a non-negative integer.
- #read_vint ⇒ Object (also: #read_vlong)
  Reads an int stored in variable-length format.
- #seek(pos) ⇒ Object
  Sets the current position in this file, where the next read will occur.
Instance Method Details
#close ⇒ Object
Closes the stream to further operations.
```ruby
# File 'lib/ferret/store/index_io.rb', line 123
def close
  raise NotImplementedError
end
```
#length ⇒ Object
The number of bytes in the file.
```ruby
# File 'lib/ferret/store/index_io.rb', line 139
def length
  raise NotImplementedError
end
```
#pos ⇒ Object
Returns the current position in this file, where the next read will occur.
```ruby
# File 'lib/ferret/store/index_io.rb', line 129
def pos
  raise NotImplementedError
end
```
#read_byte ⇒ Object
Reads and returns a single byte.
```ruby
# File 'lib/ferret/store/index_io.rb', line 7
def read_byte
  raise NotImplementedError
end
```
#read_bytes(buf, offset, len) ⇒ Object
Reads a specified number of bytes into an array at the specified offset.
- buf: the array to read bytes into
- offset: the offset in the array to start storing bytes
- len: the number of bytes to read
```ruby
# File 'lib/ferret/store/index_io.rb', line 15
def read_bytes(buf, offset, len)
  raise NotImplementedError
end
```
#read_chars(buf, start, length) ⇒ Object
Reads UTF-8 encoded characters into an array.
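- buf: the array to read characters into
- start: the offset in the array to start storing characters
- length: the number of characters to read

TODO: Test on some actual UTF-8 documents. Note that the active code reads one byte per character (read_byte.chr); the UTF-8 decoding branch in the source is commented out.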
```ruby
# File 'lib/ferret/store/index_io.rb', line 92
def read_chars(buf, start, length)
  if buf.length < (start + length)
    # make room for the characters to read
    buf << " " * (start + length - buf.length)
  end
  last = start + length
  (start...last).each do |i|
    buf[i] = read_byte.chr
  end

  # last = start + length
  #
  # (start...last).each do |i|
  #   b = read_byte
  #   if (b & 0x80) == 0
  #     buf[i] = (b & 0x7F).chr # don't need to worry about UTF-8 here
  #   else
  #     if (b & 0xE0) != 0xE0
  #       tmp_int = (((b & 0x1F) << 6) | (read_byte & 0x3F))
  #       buf[i] = [tmp_int].pack("C") # pack into a UTF-8 string
  #     else
  #       buf[i] = [
  #         ((b & 0x0F) << 12) |
  #         ((read_byte & 0x3F) << 6) |
  #         (read_byte & 0x3F)
  #       ].pack("U") # pack into a UTF-8 string
  #     end
  #   end
  # end
end
```
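The commented-out branch of read_chars decodes one-, two- and three-byte UTF-8 sequences. Below is a standalone sketch of that logic over an array of byte values; the function name decode_chars is hypothetical, and four-byte sequences are not handled (matching the commented-out code). Unlike that code, the sketch uses pack("U") for the two-byte case as well, because pack("C") would emit a single raw byte rather than a UTF-8 encoded character.

```ruby
# Hypothetical standalone version of the commented-out UTF-8 branch in
# read_chars: decodes `length` characters (1- to 3-byte sequences only)
# from an array of byte values, returning one String per character.
def decode_chars(bytes, length)
  pos = 0
  chars = []
  length.times do
    b = bytes[pos]
    pos += 1
    code =
      if (b & 0x80) == 0
        b & 0x7F                                      # 1 byte (ASCII)
      elsif (b & 0xE0) != 0xE0
        b2 = bytes[pos]
        pos += 1
        ((b & 0x1F) << 6) | (b2 & 0x3F)               # 2-byte sequence
      else
        b2 = bytes[pos]
        b3 = bytes[pos + 1]
        pos += 2
        ((b & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F) # 3-byte sequence
      end
    chars << [code].pack("U") # encode the code point as a UTF-8 string
  end
  chars
end

decode_chars("héllo€".bytes, 6).join  # => "héllo€"
```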
#read_int ⇒ Object
Reads four bytes (most significant byte first) and returns a signed int. For unsigned integers, read_uint should be used instead for performance reasons.
```ruby
# File 'lib/ferret/store/index_io.rb', line 22
def read_int
  # This may be slow. I'm not sure if this is the best way to get
  # integers from files but this is the only way I could find to get
  # signed integers.
  #i = read_byte
  #return (((i&0x80)==0 ? 0 : -1) << 32) |
  #(i << 24) |
  #((read_byte) << 16) |
  #((read_byte) << 8) |
  #(read_byte)
  i1 = read_byte
  i2 = read_byte
  i3 = read_byte
  i4 = read_byte
  res = (((i1&0x80) == 0 ? 0 : -0x100000000)) + ((i1 << 24) + (i2 << 16) + (i3 << 8) + (i4))
  return res
end
```
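The sign handling above works by assembling the four bytes as an unsigned value and then adding -2**32 when the top bit of the first byte is set, which is the two's-complement interpretation. A small sketch applying the same arithmetic to an array of bytes (decode_int is a hypothetical helper, not part of Ferret) and cross-checking it against Ruby's built-in big-endian signed 32-bit unpacking:

```ruby
# Same arithmetic as read_int, applied to four byte values (big-endian)
def decode_int(bytes)
  i1, i2, i3, i4 = bytes
  sign = (i1 & 0x80).zero? ? 0 : -0x100000000 # subtract 2**32 if the top bit is set
  sign + ((i1 << 24) + (i2 << 16) + (i3 << 8) + i4)
end

decode_int([0xFF, 0xFF, 0xFF, 0xFE])               # => -2
decode_int([0x7F, 0xFF, 0xFF, 0xFF])               # => 2147483647
[0xFF, 0xFF, 0xFF, 0xFE].pack("C*").unpack("l>")  # => [-2]
```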
#read_long ⇒ Object
Reads eight bytes and returns a long.
```ruby
# File 'lib/ferret/store/index_io.rb', line 42
def read_long
  return (read_int << 32) + (read_int & 0xFFFFFFFF)
end
```
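read_long treats the first four bytes as the signed high word and masks the second read_int back to an unsigned value; without the mask, a low word with its top bit set would be subtracted instead of added. A sketch with hypothetical helpers that reproduce both methods on byte arrays, checked against Ruby's "q>" (big-endian signed 64-bit) unpacking:

```ruby
# Hypothetical helpers reproducing read_int and read_long on byte arrays
def decode_int(bytes)
  i1, i2, i3, i4 = bytes
  ((i1 & 0x80).zero? ? 0 : -0x100000000) + ((i1 << 24) + (i2 << 16) + (i3 << 8) + i4)
end

def decode_long(bytes)
  high = decode_int(bytes[0, 4])     # signed upper word carries the sign
  low = decode_int(bytes[4, 4])      # may come back negative...
  (high << 32) + (low & 0xFFFFFFFF)  # ...so mask it back to 0..2**32-1
end

bytes = [0x00, 0x00, 0x00, 0x01, 0x80, 0x00, 0x00, 0x00]
decode_long(bytes)                     # => 6442450944
bytes.pack("C*").unpack("q>").first    # => 6442450944
```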
#read_string ⇒ Object
Reads a string. A string is stored as a single vint which describes the length of the string, followed by the actual string itself.
```ruby
# File 'lib/ferret/store/index_io.rb', line 77
def read_string
  length = read_vint
  chars = Array.new(length, ' ')
  read_chars(chars, 0, length)
  # Array#to_s only joins the elements on Ruby 1.8; join works on every version
  chars.join
end
```
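The on-disk layout (a vint length followed by the characters) can be sketched with a hypothetical array-backed reader that mirrors read_vint and read_string on the one-byte-per-character path used by the active read_chars code. It joins the characters with Array#join, since Array#to_s returns an inspect-style string on Ruby 1.9 and later. ArrayInput is an illustrative name, not a Ferret class.

```ruby
# Hypothetical array-backed reader mirroring IndexInput#read_vint and
# #read_string (one byte per character, as in the active read_chars code).
class ArrayInput
  def initialize(bytes)
    @bytes = bytes
    @pos = 0
  end

  def read_byte
    b = @bytes.fetch(@pos) # raises IndexError when reading past the end
    @pos += 1
    b
  end

  def read_vint
    b = read_byte
    i = b & 0x7F
    shift = 7
    while b & 0x80 != 0
      b = read_byte
      i |= (b & 0x7F) << shift
      shift += 7
    end
    i
  end

  # Length prefix (vint) followed by that many single-byte characters.
  def read_string
    length = read_vint
    Array.new(length) { read_byte.chr }.join
  end
end

ArrayInput.new([5, 104, 101, 108, 108, 111]).read_string  # => "hello"
```

A 200-character string would need a two-byte length prefix (0xC8, 0x01), since lengths above 127 no longer fit in one vint byte.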
#read_uint ⇒ Object
Reads four bytes and returns a non-negative integer.
```ruby
# File 'lib/ferret/store/index_io.rb', line 47
def read_uint
  return ((read_byte) << 24) | ((read_byte) << 16) |
         ((read_byte) << 8) | (read_byte)
end
```
#read_ulong ⇒ Object
Reads eight bytes and returns a non-negative integer.
```ruby
# File 'lib/ferret/store/index_io.rb', line 53
def read_ulong
  return (read_uint << 32) | (read_uint & 0xFFFFFFFF)
end
```
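read_uint and read_ulong need no sign handling: they only shift and OR bytes together, so the same four 0xFF/0xFE bytes that read_int turns into -2 come back as 4294967294. The sketch below uses hypothetical helpers over byte arrays and compares them with Ruby's "N" and "Q>" unpack directives. Because read_uint never returns a negative value, the `& 0xFFFFFFFF` mask in read_ulong has no effect, and the sketch omits it.

```ruby
# Same shifts and ORs as read_uint and read_ulong, applied to byte arrays
def decode_uint(bytes)
  (bytes[0] << 24) | (bytes[1] << 16) | (bytes[2] << 8) | bytes[3]
end

def decode_ulong(bytes)
  (decode_uint(bytes[0, 4]) << 32) | decode_uint(bytes[4, 4])
end

decode_uint([0xFF, 0xFF, 0xFF, 0xFE])  # => 4294967294
decode_ulong([0xFF] * 8)               # => 18446744073709551615
```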
#read_vint ⇒ Object (also known as: #read_vlong)
Reads an int stored in variable-length format. Reads between one and five bytes. Smaller values take fewer bytes. Negative numbers are not supported.
```ruby
# File 'lib/ferret/store/index_io.rb', line 60
def read_vint
  b = read_byte
  i = b & 0x7F # 0x7F = 0b01111111
  shift = 7

  while b & 0x80 != 0 # 0x80 = 0b10000000
    b = read_byte
    i |= (b & 0x7F) << shift
    shift += 7
  end

  return i
end
```
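In this format each byte carries seven bits of the value, lowest-order group first, and a set high bit (0x80) means another byte follows. The sketch below pairs a hypothetical encoder (the inverse of read_vint, written here for illustration; it is not claimed to be Ferret's writer) with the read_vint loop applied to an array of bytes.

```ruby
# Hypothetical encoder, the inverse of read_vint: low-order 7-bit groups
# come first, and every byte except the last has its high bit (0x80) set.
def encode_vint(n)
  raise ArgumentError, "negative numbers are not supported" if n < 0
  bytes = []
  while n > 0x7F
    bytes << ((n & 0x7F) | 0x80)
    n >>= 7
  end
  bytes << n
end

# The read_vint loop, reading from an array of byte values instead of a file.
def decode_vint(bytes)
  iter = bytes.each
  b = iter.next
  i = b & 0x7F
  shift = 7
  while b & 0x80 != 0
    b = iter.next
    i |= (b & 0x7F) << shift
    shift += 7
  end
  i
end

encode_vint(300)           # => [172, 2]  (0xAC, 0x02)
decode_vint([0xAC, 0x02])  # => 300
```

Values below 128 take one byte, and any 32-bit value fits in at most five bytes, which matches the range stated above.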
#seek(pos) ⇒ Object
Sets the current position in this file, where the next read will occur.
```ruby
# File 'lib/ferret/store/index_io.rb', line 134
def seek(pos)
  raise NotImplementedError
end
```