Class: IOStreams::Line::Reader

Inherits:
Object
Defined in:
lib/io_streams/line/reader.rb

Constant Summary

MAX_BLOCKS_MULTIPLIER = 100

Prevents denial of service when a delimiter is not found before this number * `buffer_size` characters are read.

LINEFEED_REGEXP = Regexp.compile(/\r\n|\n|\r/).freeze
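As a rough illustration of these two constants, the sketch below works out the read limit for the default buffer_size and shows which line ending LINEFEED_REGEXP matches first. The sample string and the local variable names are assumptions made for the example.

```ruby
# Constants copied from the summary above.
MAX_BLOCKS_MULTIPLIER = 100
LINEFEED_REGEXP       = Regexp.compile(/\r\n|\n|\r/).freeze

# With the default buffer_size, this many characters may be read
# without finding a delimiter before the limit is reached.
buffer_size = 65_536
max_chars   = MAX_BLOCKS_MULTIPLIER * buffer_size

# The regexp returns the first line ending that occurs in the text.
# "\r\n" is listed first so a Windows ending is not split into "\r" and "\n".
sample            = "id,name\r\n1,Jack\n"
first_line_ending = sample[LINEFEED_REGEXP]
```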

Constructor Details

#initialize(input_stream, delimiter: nil, buffer_size: 65_536) ⇒ Reader

Create a delimited stream reader from the supplied input stream.

Lines returned will be in the encoding of the input stream. To change the encoding of returned lines, use IOStreams::Encode::Reader.

Parameters

input_stream
  The input stream that implements #read

delimiter: [String]
  Line / Record delimiter to use to break the stream up into records
    Any string to break the stream up by.
    This delimiter is removed from each line when `#each` or `#readline` is called.
  Default: nil
    Automatically detect line endings and break up by line
    Searches for the first "\r\n" or "\n" and then uses that as the
    delimiter for all subsequent records.

buffer_size: [Integer]
  Size of blocks to read from the input stream at a time.
  Default: 65536 ( 64K )
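The delimiter auto-detection described for `delimiter: nil` can be pictured with the minimal sketch below. `detect_delimiter` and its `default:` fallback are hypothetical names used only for illustration; this is not the library's own `auto_detect_line_endings` method.

```ruby
# Hypothetical helper: returns the first "\r\n" or "\n" found in the buffer.
# The fallback used when the buffer has no line ending is an assumption.
def detect_delimiter(buffer, default: "\n")
  buffer[/\r\n|\n/] || default
end

windows_delimiter = detect_delimiter("a,b\r\nc,d\r\n")
unix_delimiter    = detect_delimiter("a,b\nc,d\n")
fallback          = detect_delimiter("a single line without an ending")
```

Whichever ending is found first is then used for every subsequent record, so a file that mixes line endings is split only on the first one detected.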

TODO:

  • Handle embedded line feeds when reading csv files.

  • Skip Comment lines. RegExp?

  • Skip “empty” / “blank” lines. RegExp?

  • Extract header line(s) / first non-comment, non-blank line

  • Embedded newline support, RegExp? or Proc?



# File 'lib/io_streams/line/reader.rb', line 48

def initialize(input_stream, delimiter: nil, buffer_size: 65_536)
  @input_stream = input_stream
  @buffer_size  = buffer_size

  # More efficient read buffering only supported when the input stream `#read` method supports it.
  @use_read_cache_buffer = !@input_stream.method(:read).arity.between?(0, 1)

  @line_count        = 0
  @eof               = false
  @read_cache_buffer = nil
  @buffer            = nil

  read_block
  # Auto-detect windows/linux line endings if not supplied. \n or \r\n
  @delimiter = delimiter || auto_detect_line_endings

  if @buffer
    # Change the delimiters encoding to match that of the input stream
    @delimiter      = @delimiter.encode(@buffer.encoding)
    @delimiter_size = @delimiter.size
  end
end
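The arity check in the constructor decides whether the more efficient read buffering can be used. The sketch below shows how that check behaves for two kinds of stream. `SizeOnlyStream` and `read_cache_buffer_supported?` are hypothetical names introduced for this example.

```ruby
require "stringio"

# Hypothetical stream whose #read accepts only a size argument.
class SizeOnlyStream
  def initialize(data)
    @io = StringIO.new(data)
  end

  def read(size)
    @io.read(size)
  end
end

# Mirrors the constructor's test: an arity of 0 or 1 means #read
# cannot also accept an output buffer argument.
def read_cache_buffer_supported?(stream)
  !stream.method(:read).arity.between?(0, 1)
end

# StringIO#read takes optional arguments, so its arity is negative.
string_io_supported = read_cache_buffer_supported?(StringIO.new("abc"))

# A single required argument gives an arity of 1.
size_only_supported = read_cache_buffer_supported?(SizeOnlyStream.new("abc"))
```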

Instance Attribute Details

#buffer_size ⇒ Object (readonly)

Returns the value of attribute buffer_size.



# File 'lib/io_streams/line/reader.rb', line 4

def buffer_size
  @buffer_size
end

#delimiter ⇒ Object (readonly)

Returns the value of attribute delimiter.



# File 'lib/io_streams/line/reader.rb', line 4

def delimiter
  @delimiter
end

#line_count ⇒ Object (readonly)

Returns the value of attribute line_count.



# File 'lib/io_streams/line/reader.rb', line 4

def line_count
  @line_count
end

Class Method Details

.open(file_name_or_io, **args) ⇒ Object

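Reads a line at a time from a file or stream. A String argument is treated as a file name and opened with IOStreams::File::Reader; any other object is used directly as the input stream. The new reader is yielded to the supplied block.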



# File 'lib/io_streams/line/reader.rb', line 12

def self.open(file_name_or_io, **args)
  if file_name_or_io.is_a?(String)
    IOStreams::File::Reader.open(file_name_or_io) { |io| yield new(io, **args) }
  else
    yield new(file_name_or_io, **args)
  end
end
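The file-name-or-stream dispatch used by `.open` can be tried in isolation with the sketch below. `with_input` is a hypothetical stand-in that uses Ruby's `File.open` in place of `IOStreams::File::Reader.open`, so it only illustrates the pattern and not the library's file handling.

```ruby
require "stringio"
require "tempfile"

# Hypothetical stand-in for the dispatch in `.open`:
# a String is opened as a file, anything else is assumed to respond to #read.
def with_input(file_name_or_io)
  if file_name_or_io.is_a?(String)
    File.open(file_name_or_io, "rb") { |io| yield io }
  else
    yield file_name_or_io
  end
end

# Passing an IO-like object: it is yielded as-is.
from_io = with_input(StringIO.new("abc")) { |io| io.read }

# Passing a file name: the file is opened, yielded, then closed.
from_file = Tempfile.create("line_reader_sketch") do |file|
  file.write("abc")
  file.flush
  with_input(file.path) { |io| io.read }
end
```

One consequence of this design is that a stream passed in by the caller is not closed by `.open`, whereas a file opened by name is closed when the block returns.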

Instance Method Details

#each ⇒ Object

Iterates over every line in the file/stream, passing each line to the supplied block in turn. Returns [Integer] the number of lines read from the file/stream.

Note:

  • The line delimiter is not returned.



# File 'lib/io_streams/line/reader.rb', line 75

def each
  until eof?
    line = readline
    yield(line) unless line.nil?
  end
  line_count
end

#eof? ⇒ Boolean

Returns whether the end of file has been reached for this stream.

Returns:

  • (Boolean)


# File 'lib/io_streams/line/reader.rb', line 109

def eof?
  @eof && (@buffer.nil? || @buffer.empty?)
end

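#readline ⇒ Object

Reads the next line from the stream, with its delimiter removed. Returns nil once the end of the stream has been reached.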



# File 'lib/io_streams/line/reader.rb', line 83

def readline
  return if eof?

  # Keep reading until it finds the delimiter
  while (index = @buffer.index(@delimiter)).nil? && read_block
  end

  # Delimiter found?
  if index
    data    = @buffer.slice(0, index)
    @buffer = @buffer.slice(index + @delimiter_size, @buffer.size)
    @line_count += 1
  elsif @eof && @buffer.empty?
    data    = nil
    @buffer = nil
  else
    # Last line without delimiter
    data    = @buffer
    @buffer = nil
    @line_count += 1
  end

  data
end
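The buffering approach in `#readline` and `#each` can be explored without the gem using the simplified sketch below. `MiniLineReader` is a hypothetical class written for this example: it assumes a fixed delimiter, uses a deliberately tiny buffer_size so that lines span several blocks, and omits the MAX_BLOCKS_MULTIPLIER guard, delimiter auto-detection, and the read cache buffer.

```ruby
require "stringio"

# Hypothetical, simplified stand-in for the reader; not the IOStreams class.
class MiniLineReader
  attr_reader :line_count

  def initialize(input_stream, delimiter: "\n", buffer_size: 4)
    @input_stream = input_stream
    @delimiter    = delimiter
    @buffer_size  = buffer_size
    @buffer       = +""
    @eof          = false
    @line_count   = 0
  end

  def eof?
    @eof && @buffer.empty?
  end

  def readline
    # Keep appending blocks until the buffer holds a delimiter or input runs out.
    while (index = @buffer.index(@delimiter)).nil? && read_block
    end
    return if eof?

    if index
      # Return the text before the delimiter and keep the rest buffered.
      line    = @buffer.slice(0, index)
      @buffer = @buffer.slice(index + @delimiter.size, @buffer.size)
    else
      # Last line without a trailing delimiter.
      line    = @buffer
      @buffer = +""
    end
    @line_count += 1
    line
  end

  def each
    while (line = readline)
      yield line
    end
    @line_count
  end

  private

  # Appends the next block to the buffer. Returns false at end of input.
  def read_block
    return false if @eof

    block = @input_stream.read(@buffer_size)
    if block.nil?
      @eof = true
      return false
    end
    @buffer << block
    true
  end
end

reader = MiniLineReader.new(StringIO.new("one\ntwo\nthree"), buffer_size: 4)
lines  = []
count  = reader.each { |line| lines << line }
```

Because the whole buffer is searched again after each block is appended, a delimiter that straddles two blocks is still found.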