Class: IOStreams::Line::Reader

Inherits:
Object
Defined in:
lib/io_streams/line/reader.rb

Constant Summary

MAX_BLOCKS_MULTIPLIER =

Guards against denial of service: caps the number of characters read while searching for a delimiter at this number * `buffer_size`.

100
LINEFEED_REGEXP =
Regexp.compile(/\r\n|\n|\r/).freeze
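
As a rough illustration of what these two constants express (a sketch only, not the library's implementation): the regular expression matches the first line ending in a block, and the multiplier bounds how many characters are scanned for a delimiter. `first_line_ending` and `max_scan_size` are hypothetical helper names that do not exist in IOStreams.

```ruby
# Values copied from the constants documented above.
MAX_BLOCKS_MULTIPLIER = 100
LINEFEED_REGEXP       = Regexp.compile(/\r\n|\n|\r/).freeze

# Hypothetical helper: returns the first line ending found in `block`, or nil.
def first_line_ending(block)
  block[LINEFEED_REGEXP]
end

# Hypothetical helper: upper bound on the characters scanned for a delimiter.
def max_scan_size(buffer_size)
  MAX_BLOCKS_MULTIPLIER * buffer_size
end

first_line_ending("a,b\r\nc,d\r\n") # => "\r\n"
first_line_ending("a,b\nc,d\n")     # => "\n"
first_line_ending("no line ending") # => nil
max_scan_size(65_536)               # => 6_553_600
```

Because `\r\n` is listed first in the alternation, a Windows line ending is matched as a whole rather than as a lone `\r`.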

Constructor Details

#initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil) ⇒ Reader

Create a delimited stream reader from the supplied input stream.

Lines returned will be in the encoding of the input stream. To change the encoding of returned lines, use IOStreams::Encode::Reader.

Parameters

input_stream
  The input stream that implements #read

delimiter: [String]
  Line / Record delimiter to use to break the stream up into records
    Any string to break the stream up by.
    This delimiter is removed from each line when `#each` or `#readline` is called.
  Default: nil
    Automatically detect line endings and break up by line
    Searches for the first "\r\n" or "\n" and then uses that as the
    delimiter for all subsequent records.

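buffer_size: [Integer]
  Size of blocks to read from the input stream at a time.
  Default: 65536 ( 64K )

embedded_within: [String]
  Character that wraps fields which may contain embedded delimiters, for example '"'.
  While a line contains an odd number of these characters, `#readline` appends the following line to it.
  Default: nil
    Embedded delimiters are not handled.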
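
The `delimiter:` behavior described above can be sketched as follows. This is a minimal illustration of the documented rules (auto-detect the first "\r\n" or "\n" when no delimiter is given, and drop the delimiter from each record), not the reader's actual code. `detect_delimiter` and `split_records` are hypothetical names, and the sketch works on a whole string rather than on buffered blocks.

```ruby
# Hypothetical helper: choose the delimiter the way the `delimiter: nil`
# default is described, by finding the first "\r\n" or "\n".
def detect_delimiter(block)
  block[/\r\n|\n/]
end

# Hypothetical helper: split a block into records, removing the delimiter.
def split_records(block, delimiter: nil)
  delimiter ||= detect_delimiter(block)
  return [block] if delimiter.nil?

  # A limit of -1 keeps empty records; the empty string that follows a
  # trailing delimiter is then dropped.
  block.split(delimiter, -1).tap { |records| records.pop if records.last&.empty? }
end

split_records("one\r\ntwo\r\n")        # => ["one", "two"]
split_records("a|b|c", delimiter: "|") # => ["a", "b", "c"]
```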

TODO:

  • Handle embedded line feeds when reading csv files.

  • Skip Comment lines. RegExp?

  • Skip “empty” / “blank” lines. RegExp?

  • Extract header line(s) / first non-comment, non-blank line

  • Embedded newline support, RegExp? or Proc?



# File 'lib/io_streams/line/reader.rb', line 48

def initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil)
  @embedded_within = embedded_within
  @input_stream    = input_stream
  @buffer_size     = buffer_size

  # More efficient read buffering only supported when the input stream `#read` method supports it.
  @use_read_cache_buffer = !@input_stream.method(:read).arity.between?(0, 1)

  @line_number       = 0
  @eof               = false
  @read_cache_buffer = nil
  @buffer            = nil
  @delimiter         = delimiter

  read_block
  # Auto-detect windows/linux line endings if not supplied. \n or \r\n
  @delimiter ||= auto_detect_line_endings

  if @buffer
    # Change the delimiters encoding to match that of the input stream
    @delimiter      = @delimiter.encode(@buffer.encoding)
    @delimiter_size = @delimiter.size
  end
end
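
The `@use_read_cache_buffer` check in the constructor depends on the arity of the stream's `#read` method. The sketch below shows how that expression evaluates for two kinds of stream. `SizeOnlyStream` and `supports_read_buffer?` are hypothetical names used only for illustration, and what the reader does with the buffer afterwards is not shown in this section.

```ruby
require "stringio"

# Hypothetical stream whose #read accepts only a size argument.
class SizeOnlyStream
  def initialize(data)
    @io = StringIO.new(data)
  end

  def read(size)
    @io.read(size)
  end
end

# Same expression as in #initialize above, extracted into a helper.
def supports_read_buffer?(stream)
  !stream.method(:read).arity.between?(0, 1)
end

supports_read_buffer?(StringIO.new("abc"))       # => true  (arity is -1)
supports_read_buffer?(SizeOnlyStream.new("abc")) # => false (arity is 1)
```

A negative arity means the method takes optional arguments, which is the case for `IO#read` and `StringIO#read`; both accept an optional output buffer as a second argument.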

Instance Attribute Details

#buffer_size ⇒ Object (readonly)

Returns the value of attribute buffer_size.



# File 'lib/io_streams/line/reader.rb', line 4

def buffer_size
  @buffer_size
end

#delimiter ⇒ Object (readonly)

Returns the value of attribute delimiter.



# File 'lib/io_streams/line/reader.rb', line 4

def delimiter
  @delimiter
end

#line_number ⇒ Object (readonly)

Returns the value of attribute line_number.



# File 'lib/io_streams/line/reader.rb', line 4

def line_number
  @line_number
end

Class Method Details

.open(file_name_or_io, **args) ⇒ Object

Read a line at a time from a file or stream



# File 'lib/io_streams/line/reader.rb', line 12

def self.open(file_name_or_io, **args)
  if file_name_or_io.is_a?(String)
    IOStreams::File::Reader.open(file_name_or_io) { |io| yield new(io, **args) }
  else
    yield new(file_name_or_io, **args)
  end
end
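
The branch in `.open` can be tried without the gem using the stand-in below. It is a sketch under stated assumptions: `SketchReader` is a hypothetical class, and plain `File.open` is swapped in for `IOStreams::File::Reader.open`. It shows that a String is treated as a file name, anything else is used as the stream directly, and a block is required in both cases.

```ruby
require "stringio"
require "tempfile"

# Hypothetical stand-in for the reader; it only exposes the stream it was given.
class SketchReader
  attr_reader :input_stream

  def initialize(input_stream)
    @input_stream = input_stream
  end

  # Mirrors the branch in .open above, with File.open swapped in for
  # IOStreams::File::Reader.open.
  def self.open(file_name_or_io)
    if file_name_or_io.is_a?(String)
      ::File.open(file_name_or_io, "rb") { |io| yield new(io) }
    else
      yield new(file_name_or_io)
    end
  end
end

# An IO-like object is passed straight through to the reader.
from_io = SketchReader.open(StringIO.new("a\nb\n")) { |reader| reader.input_stream.read }

# A String is treated as a file name; the file is closed when the block returns.
from_file = Tempfile.create("sketch") do |file|
  file.write("a\nb\n")
  file.flush
  SketchReader.open(file.path) { |reader| reader.input_stream.read }
end
```

In both calls the value of the block is returned to the caller, so `from_io` and `from_file` hold the same contents.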

Instance Method Details

#each ⇒ Object

Iterates over every line in the file/stream, passing each line to the supplied block in turn. Returns [Integer] the number of lines read from the file/stream.

Note:

  • The line delimiter is not returned.



# File 'lib/io_streams/line/reader.rb', line 77

def each
  line_count = 0
  until eof?
    line = readline
    unless line.nil?
      yield(line)
      line_count += 1
    end
  end
  line_count
end
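
The contract of `#each` (yield each line without its delimiter, then return the line count) can be sketched with Ruby's standard library alone. `each_line_counted` is a hypothetical helper built on `StringIO#each_line`, not the reader itself, and it assumes "\n" line endings.

```ruby
require "stringio"

# Hypothetical stand-in for #each: yields every line without its trailing
# "\n" and returns the number of lines yielded.
def each_line_counted(io)
  line_count = 0
  io.each_line(chomp: true) do |line|
    yield line
    line_count += 1
  end
  line_count
end

lines = []
count = each_line_counted(StringIO.new("first\nsecond\nthird")) { |line| lines << line }
# lines => ["first", "second", "third"]
# count => 3
```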

#eof? ⇒ Boolean

Returns whether the end of file has been reached for this stream

Returns:

  • (Boolean)


# File 'lib/io_streams/line/reader.rb', line 105

def eof?
  @eof && (@buffer.nil? || @buffer.empty?)
end

#readline ⇒ Object

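Reads the next line, as delimited by @delimiter. When embedded_within is set (for example to a double quote), delimiters embedded within quoted fields are accounted for: following lines are appended until the quotes are balanced. The embedded_within argument is set in IOStreams::LineReader.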



# File 'lib/io_streams/line/reader.rb', line 91

def readline
  line = _readline
  if line && @embedded_within
    initial_line_number = @line_number
    while line.count(@embedded_within).odd?
      raise "Unclosed quoted field on line #{initial_line_number}" if eof? || line.length > @buffer_size * 10
      line << @delimiter
      line << _readline
    end
  end
  line
end
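
The odd-count test that `#readline` uses for embedded delimiters can be sketched on an array of already-split lines. `join_embedded` is a hypothetical helper, not part of IOStreams. It keeps appending the next physical line while the record contains an odd number of `embedded_within` characters, which indicates a quoted field that is still open. The length limit applied by `#readline` is left out of this sketch.

```ruby
# Hypothetical helper: joins physical lines into logical records while the
# number of `embedded_within` characters in the record is odd.
def join_embedded(lines, delimiter: "\n", embedded_within: '"')
  records = []
  pending = lines.dup
  until pending.empty?
    record = pending.shift.dup
    while record.count(embedded_within).odd?
      raise "Unclosed quoted field" if pending.empty?

      # Restore the delimiter that was stripped, then append the next line.
      record << delimiter << pending.shift
    end
    records << record
  end
  records
end

join_embedded(['1,"multi', 'line",x', "2,plain,y"])
# => ["1,\"multi\nline\",x", "2,plain,y"]
```

Counting quote characters is a simple heuristic: it balances correctly when quotes inside a field are escaped by doubling, as in CSV, because a doubled quote adds an even number to the count.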