Class: Bzip2::FFI::Reader

Inherits:
IO
  • Object
Defined in:
lib/bzip2/ffi/reader.rb

Overview

Reader reads and decompresses a bzip2 compressed stream or file. The public instance methods of Reader are intended to be equivalent to those of a standard IO object.

Data can be read as a stream using Reader.open and #read, for example:

Bzip2::FFI::Reader.open(io_or_path) do |reader|
  while buffer = reader.read(1024) do
    # process uncompressed bytes in buffer
  end
end

Alternatively, without passing a block to Reader.open:

reader = Bzip2::FFI::Reader.open(io_or_path)
begin
  while buffer = reader.read(1024) do
    # process uncompressed bytes in buffer
  end
ensure
  reader.close
end

All the available bzipped data can be read in a single step using Reader.read:

uncompressed = Bzip2::FFI::Reader.read(io_or_path)

The Reader.open and Reader.read methods accept either an IO-like object or a file path. IO-like objects must have a #read method. Paths can be given as either a String or Pathname.
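Because only a #read method is required, a small wrapper class can stand in for a real IO. The sketch below is illustrative: ByteCountingSource is a hypothetical name, not part of bzip2-ffi. It forwards #read to another object and records how many bytes were handed out, on the assumption that Reader calls #read the way a standard IO would be called (an optional length, with nil returned at end of input).

```ruby
require 'stringio'

# Hypothetical IO-like wrapper: exposes only #read, the one method the
# documentation above requires of an IO-like object.
class ByteCountingSource
  attr_reader :bytes_read

  def initialize(io)
    @io = io
    @bytes_read = 0
  end

  # Same contract as IO#read: returns a String, or nil at end of input when
  # a positive length is given.
  def read(length = nil)
    data = @io.read(length)
    @bytes_read += data.bytesize if data
    data
  end
end

source = ByteCountingSource.new(StringIO.new('abcdef'.b))
source.read(4)    # => "abcd"
source.read(4)    # => "ef"
source.read(4)    # => nil
source.bytes_read # => 6
```

An instance wrapping an open compressed file could then be passed wherever io_or_path is accepted.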

No character conversion is performed on decompressed bytes. The Reader.read and #read methods return instances of String that represent the raw decompressed bytes, with #encoding set to Encoding::ASCII_8BIT (also known as Encoding::BINARY).
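If the decompressed bytes are known to be text, the caller has to attach the right encoding. A minimal sketch, assuming the data is UTF-8 (decode_utf8 is a hypothetical helper, and the binary literal stands in for a String returned by Reader):

```ruby
# Hypothetical helper: reinterpret binary bytes as UTF-8 text.
# force_encoding only changes the encoding tag; dup leaves the input untouched.
def decode_utf8(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, 'bytes are not valid UTF-8' unless text.valid_encoding?
  text
end

# Stand-in for decompressed output: raw bytes tagged as binary.
raw = "caf\xC3\xA9".b
raw.encoding == Encoding::BINARY # => true
raw.bytesize                     # => 5

text = decode_utf8(raw)
text.encoding == Encoding::UTF_8 # => true
text.length                      # => 4
```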

Reader will normally read all consecutive bzip2 compressed structures from the given stream or file (unless the :first_only parameter is specified - see Reader.open). If the stream or file contains additional data beyond the end of the compressed bzip2 data, it may be read during decompression. If such an overread has occurred and the IO-like object being read from has a #seek method, Reader will use it to reposition the stream to the byte immediately following the end of the compressed bzip2 data. If #seek raises an IOError, it will be caught and the stream position will be left unchanged.

Reader does not support seeking (it's not supported by the underlying libbz2 library). There are no #seek or #pos= methods. The only way to advance the position is to call #read. Discard the result if it's not needed.
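Skipping ahead can be wrapped in a small helper that reads in bounded chunks and throws the data away. This is a sketch only: skip_forward is a hypothetical name, and StringIO stands in for a Reader because both return nil from read(n) once the data is exhausted.

```ruby
require 'stringio'

# Hypothetical helper: advance a forward-only reader by `count` decompressed
# bytes by reading in bounded chunks and discarding the result.
# Returns the number of bytes actually skipped (less than count at end of data).
def skip_forward(reader, count, chunk_size = 64 * 1024)
  skipped = 0
  while skipped < count
    chunk = reader.read([chunk_size, count - skipped].min)
    break unless chunk # nil: end of the decompressed data
    skipped += chunk.bytesize
  end
  skipped
end

io = StringIO.new('0123456789'.b)
skip_forward(io, 4)   # => 4
io.read(3)            # => "456"
skip_forward(io, 100) # => 3
```

Reading in chunks keeps memory use bounded even when a large range is skipped.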

Methods inherited from IO

#autoclose=, #autoclose?, #binmode, #binmode?, #closed?, #external_encoding, #internal_encoding

Constructor Details

#initialize(io, options = {}) ⇒ Reader

Initializes a Bzip2::FFI::Reader to read compressed data from an IO-like object (io). io must have a #read method.

The following options can be specified using the options Hash:

  • :autoclose - Set to true to close io when the Bzip2::FFI::Reader instance is closed.
  • :first_only - Bzip2 files can contain multiple consecutive compressed structures. Normally all the structures will be decompressed with the decompressed bytes concatenated. Set to true to only read the first structure.
  • :small - Set to true to use an alternative decompression algorithm that uses less memory, but at the cost of decompressing more slowly (roughly 2,300 kB less memory at about half the speed).

#binmode is called on io if io responds to #binmode.

After use, the Bzip2::FFI::Reader instance should be closed using the #close method.

Parameters:

  • io (Object)

    An IO-like object with a #read method.

  • options (Hash) (defaults to: {})

    Optional parameters (:autoclose, :first_only and :small).

Raises:

  • (ArgumentError)

    If io is nil or does not respond to #read.

  • (Error::Bzip2Error)

    If an error occurs when initializing libbz2.



# File 'lib/bzip2/ffi/reader.rb', line 221

def initialize(io, options = {})
  super
  raise ArgumentError, 'io must respond to read' unless io.respond_to?(:read)

  @first_only = options[:first_only]
  @small = options[:small] ? 1 : 0

  @in_eof = false
  @out_eof = false
  @in_buffer = nil
  @structure_number = 1
  @structure_start_pos = 0
  @in_pos = 0
  @out_pos = 0

  decompress_init(stream)
end

Class Method Details

.open(io_or_path, options = {}) {|reader| ... } ⇒ Object

Opens a Bzip2::FFI::Reader to read and decompress data from either an IO-like object or a file. IO-like objects must have a #read method. Files can be specified using either a String containing the file path or a Pathname.

If no block is given, the opened Bzip2::FFI::Reader instance is returned. After use, the instance should be closed using the #close method.

If a block is given, it will be passed the opened Bzip2::FFI::Reader instance as an argument. After the block terminates, the Bzip2::FFI::Reader instance will automatically be closed. open will then return the result of the block.
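The two calling styles follow the usual Ruby open-with-block pattern. The sketch below illustrates the described behaviour only; it is not the library's code. with_auto_close is a hypothetical helper, and StringIO stands in for the opened instance.

```ruby
require 'stringio'

# Hypothetical helper mirroring the behaviour described above:
# - without a block, hand back the resource and leave closing to the caller;
# - with a block, yield the resource, always close it, and return the
#   block's result.
def with_auto_close(resource)
  return resource unless block_given?

  begin
    yield resource
  ensure
    resource.close
  end
end

io = StringIO.new('data')
result = with_auto_close(io) { |r| r.read.upcase }
result     # => "DATA"
io.closed? # => true
```

The ensure clause means the resource is closed even if the block raises.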

The following options can be specified using the options Hash:

  • :autoclose - When passing an IO-like object, set to true to close it when the Bzip2::FFI::Reader instance is closed.
  • :first_only - Bzip2 files can contain multiple consecutive compressed structures. Normally all the structures will be decompressed with the decompressed bytes concatenated. Set to true to only read the first structure.
  • :small - Set to true to use an alternative decompression algorithm that uses less memory, but at the cost of decompressing more slowly (roughly 2,300 kB less memory at about half the speed).

If an IO-like object that has a #binmode method is passed to open, #binmode will be called on io_or_path before yielding to the block or returning.

Parameters:

  • io_or_path (Object)

    Either an IO-like object with a #read method or a file path as a String or Pathname.

  • options (Hash) (defaults to: {})

    Optional parameters (:autoclose, :first_only and :small).

Yields:

  • (reader)

    If a block is given, it is yielded to.

Yield Parameters:

  • reader (Reader)

    The opened Bzip2::FFI::Reader instance.

Returns:

  • (Object)

    The opened Bzip2::FFI::Reader instance if no block is given, or the result of the block if a block is given.

Raises:

  • (ArgumentError)

    If io_or_path is not a String, Pathname or an IO-like object with a #read method.

  • (Errno::ENOENT)

    If the specified file does not exist.

  • (Error::Bzip2Error)

    If an error occurs when initializing libbz2.



# File 'lib/bzip2/ffi/reader.rb', line 122

def open(io_or_path, options = {})
  if io_or_path.kind_of?(String) || io_or_path.kind_of?(Pathname)
    options = options.merge(autoclose: true)
    proc = -> { open_bzip_file(io_or_path.to_s, 'rb') }
    super(proc, options)
  elsif !io_or_path.kind_of?(Proc)
    super
  else
    raise ArgumentError, 'io_or_path must be an IO-like object or a path'
  end
end

.read(io_or_path, options = {}) ⇒ String

Reads and decompresses an entire bzip2 compressed structure from either an IO-like object or a file and returns the decompressed bytes as a String. IO-like objects must have a #read method. Files can be specified using either a String containing the file path or a Pathname.

The following options can be specified using the options Hash:

  • :autoclose - When passing an IO-like object, set to true to close it when the compressed data has been read.
  • :first_only - Bzip2 files can contain multiple consecutive compressed structures. Normally all the structures will be decompressed with the decompressed bytes concatenated. Set to true to only read the first structure.
  • :small - Set to true to use an alternative decompression algorithm that uses less memory, but at the cost of decompressing more slowly (roughly 2,300 kB less memory at about half the speed).

No character conversion is performed on decompressed bytes. read returns a String that represents the raw decompressed bytes, with encoding set to Encoding::ASCII_8BIT (also known as Encoding::BINARY).

If an IO-like object that has a #binmode method is passed to read, #binmode will be called on io_or_path before any compressed data is read.

Parameters:

  • io_or_path (Object)

    Either an IO-like object with a #read method or a file path as a String or Pathname.

  • options (Hash) (defaults to: {})

    Optional parameters (:autoclose, :first_only and :small).

Returns:

  • (String)

    The decompressed data.

Raises:

  • (ArgumentError)

    If io_or_path is not a String, Pathname or an IO-like object with a #read method.

  • (Errno::ENOENT)

    If the specified file does not exist.

  • (Error::Bzip2Error)

    If an error occurs when initializing libbz2 or decompressing data.



# File 'lib/bzip2/ffi/reader.rb', line 174

def read(io_or_path, options = {})
  open(io_or_path, options) do |reader|
    reader.read
  end
end

Instance Method Details

#close ⇒ NilType

Ends decompression and closes the Bzip2::FFI::Reader.

If the open method is used with a block, it is not necessary to call #close. Otherwise, #close should be called once the Bzip2::FFI::Reader is no longer needed.

Returns:

  • (NilType)

    nil.

Raises:



# File 'lib/bzip2/ffi/reader.rb', line 247

def close
  s = stream

  unless @out_eof
    decompress_end(s)
  end

  s[:next_in] = nil
  s[:next_out] = nil

  if @in_buffer
    @in_buffer.free
    @in_buffer = nil
  end

  super
end

#eof? ⇒ Boolean Also known as: eof

Returns true if decompression has completed, otherwise false.

Note that it is possible for false to be returned after all the decompressed data has been read. In such cases, the next call to #read will detect the end of the bzip2 structure and set #eof? to true.
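In practice this means a read loop should test the result of #read rather than #eof?. The sketch below uses LazyEofSource, a hypothetical stand-in written for this example (it is not the Reader's implementation), to mimic an #eof? that only becomes true once a read has run into the end of the data.

```ruby
# Hypothetical stand-in: eof? stays false until a read finds no more data.
class LazyEofSource
  def initialize(data)
    @data = data.b
    @pos = 0
    @eof = false
  end

  def eof?
    @eof
  end

  def read(length)
    chunk = @data.byteslice(@pos, length)
    if chunk.nil? || chunk.empty?
      @eof = true
      return nil
    end
    @pos += chunk.bytesize
    chunk
  end
end

# Loop on the result of read; nil marks the end of the data.
def read_all_chunks(reader, chunk_size)
  chunks = []
  while (chunk = reader.read(chunk_size))
    chunks << chunk
  end
  chunks
end

source = LazyEofSource.new('abcdef')
source.read(6) # => "abcdef" (all data consumed)
source.eof?    # => false, the end has not been detected yet
source.read(6) # => nil
source.eof?    # => true
```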

Returns:

  • (Boolean)

    true if decompression has completed, otherwise false.

Raises:



# File 'lib/bzip2/ffi/reader.rb', line 350

def eof?
  check_closed
  @out_eof
end

#read(length = nil, buffer = nil) ⇒ String

Reads and decompresses data from the bzip2 compressed stream or file, returning the uncompressed bytes.

length must be a non-negative integer or nil.

If length is a positive integer, it specifies the maximum number of uncompressed bytes to return. read will return nil or a String with a length of 1 to length bytes containing the decompressed data. A result of nil or a String with a length less than length bytes indicates that the end of the decompressed data has been reached.

If length is nil, #read reads until the end of the decompressed data, returning the uncompressed bytes as a String.

If length is 0, #read returns an empty String.

If the optional buffer argument is present, it must reference a String that will receive the decompressed data. buffer will contain only the decompressed data after the call to #read, even if it is not empty beforehand.

No character conversion is performed on decompressed bytes. #read returns a String that represents the raw decompressed bytes, with encoding set to Encoding::ASCII_8BIT (also known as Encoding::BINARY).
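Passing the same buffer on every call avoids allocating a new String per chunk. A minimal sketch, assuming a reader whose read(length, buffer) behaves as described above (each_chunk is a hypothetical helper, and StringIO stands in for a Reader because its #read accepts the same two arguments):

```ruby
require 'stringio'

# Hypothetical helper: yield successive chunks while reusing one String buffer.
# read(length, buffer) replaces the buffer's contents on every call and
# returns nil once the end of the data has been reached.
def each_chunk(reader, chunk_size)
  buffer = String.new
  while reader.read(chunk_size, buffer)
    yield buffer
  end
end

sizes = []
each_chunk(StringIO.new('x' * 2500), 1024) { |chunk| sizes << chunk.bytesize }
sizes # => [1024, 1024, 452]
```

The block receives the same String object each time, so it should copy the chunk (for example with dup) if the bytes need to outlive the iteration.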

Parameters:

  • length (Integer) (defaults to: nil)

    Must be a non-negative integer or nil. Set to a positive integer to specify the maximum number of uncompressed bytes to return. Set to nil to return the remaining decompressed data. Set to 0 to return an empty String.

  • buffer (String) (defaults to: nil)

    An optional buffer to receive the decompressed data.

Returns:

  • (String)

    The decompressed data as a String with ASCII-8BIT encoding, or nil if length was a positive integer and the end of the decompressed data has been reached.

Raises:



# File 'lib/bzip2/ffi/reader.rb', line 304

def read(length = nil, buffer = nil)
  if buffer
    buffer.clear
    buffer.force_encoding(Encoding::ASCII_8BIT)
  end

  if length
    raise ArgumentError, 'length must be a non-negative integer or nil' if length < 0

    if length == 0
      check_closed
      return buffer || String.new
    end

    decompressed = decompress(length)

    return nil unless decompressed
    buffer ? buffer << decompressed : decompressed
  else
    result = buffer ? StringIO.new(buffer) : StringIO.new

    # StringIO#binmode is a no-op, but call in case it is implemented in
    # future versions.
    result.binmode

    result.set_encoding(Encoding::ASCII_8BIT)

    loop do
      decompressed = decompress(DEFAULT_DECOMPRESS_COUNT)
      break unless decompressed
      result.write(decompressed)
      break if decompressed.bytesize < DEFAULT_DECOMPRESS_COUNT
    end

    result.string
  end
end

#tell ⇒ Integer Also known as: pos

Returns the number of decompressed bytes that have been read.
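One use for this value is progress reporting during a streaming read. The sketch below is illustrative: read_with_progress is a hypothetical helper, and StringIO stands in for a Reader because its #tell also reports the number of bytes read so far.

```ruby
require 'stringio'

# Hypothetical helper: read to the end in chunks, yielding the running total
# reported by tell after each chunk. Returns the final total.
def read_with_progress(reader, chunk_size)
  while reader.read(chunk_size)
    yield reader.tell
  end
  reader.tell
end

totals = []
read_with_progress(StringIO.new('a' * 10), 4) { |n| totals << n }
totals # => [4, 8, 10]
```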

Returns:

  • (Integer)

    The number of decompressed bytes that have been read.

Raises:



# File 'lib/bzip2/ffi/reader.rb', line 360

def tell
  check_closed
  @out_pos
end