Class: IOStreams::Line::Reader

Inherits:
Reader
  • Object
Defined in:
lib/io_streams/line/reader.rb

Constant Summary

MAX_BLOCKS_MULTIPLIER =

Guards against denial of service when no delimiter is found within this number * `buffer_size` characters.

100
LINEFEED_REGEXP =
Regexp.compile(/\r\n|\n|\r/).freeze
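As a rough illustration of the limit implied by MAX_BLOCKS_MULTIPLIER, the sketch below multiplies it by the default buffer_size. The helper name `max_delimiter_scan` is hypothetical and not part of the library; what the reader does once the limit is reached is not shown on this page and is not modelled here.

```ruby
# Hypothetical helper (not part of IOStreams): the number of characters
# that may be scanned for a delimiter, assuming the limit is simply
# MAX_BLOCKS_MULTIPLIER * buffer_size as the constant's description states.
def max_delimiter_scan(buffer_size, multiplier = 100)
  multiplier * buffer_size
end

# With the documented default buffer_size of 65_536 characters:
limit = max_delimiter_scan(65_536)
puts limit # 6553600
```

With the defaults, a stream containing no delimiter would therefore be scanned for roughly 6.5 million characters before the guard applies.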

Instance Attribute Summary

Attributes inherited from Reader

#input_stream

Class Method Summary

Instance Method Summary

Methods inherited from Reader

file, open

Constructor Details

#initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil, original_file_name: nil) ⇒ Reader

Create a delimited stream reader from the supplied input stream.

Lines returned will be in the encoding of the input stream. To change the encoding of returned lines, use IOStreams::Encode::Reader.

Parameters

input_stream
  The input stream that implements #read

delimiter: [String]
  Line / Record delimiter to use to break the stream up into records
    Any string to break the stream up by.
    This delimiter is removed from each line when `#each` or `#readline` is called.
  Default: nil
    Automatically detect line endings and break up by line
    Searches for the first "\r\n" or "\n" and then uses that as the
    delimiter for all subsequent records.

buffer_size: [Integer]
  Size of blocks to read from the input stream at a time.
  Default: 65536 ( 64K )
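The delimiter behaviour described above can be sketched without the library. Both helpers below, `detect_delimiter` and `split_records`, are hypothetical names, and the fallback to "\n" when no line ending is present is an assumption rather than documented behaviour.

```ruby
# Hypothetical helper: use the first "\r\n" or "\n" found in the first
# block as the delimiter for every subsequent record.
def detect_delimiter(first_block)
  first_block[/\r\n|\n/] || "\n" # the "\n" fallback is an assumption
end

# Hypothetical helper: break text into records, removing the delimiter
# from each record, as #each and #readline are documented to do.
def split_records(text, delimiter: nil)
  delimiter ||= detect_delimiter(text)
  text.split(delimiter)
end

p split_records("a\r\nb\r\nc\r\n")        # auto-detected Windows line endings
p split_records("a|b|c", delimiter: "|")  # explicit record delimiter
```

Note that the real reader works block by block on a stream; this sketch operates on a whole string only to keep the idea visible.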

TODO:

  • Handle embedded line feeds when reading csv files.

  • Skip Comment lines. RegExp?

  • Skip “empty” / “blank” lines. RegExp?

  • Extract header line(s) / first non-comment, non-blank line

  • Embedded newline support, RegExp? or Proc?



# File 'lib/io_streams/line/reader.rb', line 47

def initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil, original_file_name: nil)
  super(input_stream)

  @embedded_within = embedded_within
  @buffer_size     = buffer_size

  # More efficient read buffering only supported when the input stream `#read` method supports it.
  @use_read_cache_buffer = !@input_stream.method(:read).arity.between?(0, 1)

  @line_number       = 0
  @eof               = false
  @read_cache_buffer = nil
  @buffer            = nil
  @delimiter         = delimiter

  read_block
  # Auto-detect windows/linux line endings if not supplied. \n or \r\n
  @delimiter ||= auto_detect_line_endings

  return unless @buffer

  # Change the delimiters encoding to match that of the input stream
  @delimiter      = @delimiter.encode(@buffer.encoding)
  @delimiter_size = @delimiter.size
end
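The `@use_read_cache_buffer` check in the constructor depends on `Method#arity`. The sketch below applies the same expression to three streams; `SizeOnlyStream` and `BufferedStream` are hypothetical classes invented for illustration, and the reading that the second argument is a reusable output buffer (as with `IO#read(length, outbuf)`) is an assumption.

```ruby
require "stringio"

# Same expression as in the constructor: true when the arity of #read
# falls outside 0..1.
def read_cache_buffer_supported?(stream)
  !stream.method(:read).arity.between?(0, 1)
end

# Hypothetical stream whose #read takes exactly one required argument (arity 1).
class SizeOnlyStream
  def read(size)
    nil
  end
end

# Hypothetical stream whose #read also accepts an optional second argument (arity -1).
class BufferedStream
  def read(size = nil, outbuf = nil)
    nil
  end
end

p read_cache_buffer_supported?(SizeOnlyStream.new)
p read_cache_buffer_supported?(BufferedStream.new)
p read_cache_buffer_supported?(StringIO.new("abc"))
```

A method with only optional arguments reports a negative arity, which is why it passes the check while a strict one-argument `#read` does not.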

Instance Attribute Details

#buffer_size ⇒ Object (readonly)

Returns the value of attribute buffer_size.



# File 'lib/io_streams/line/reader.rb', line 4

def buffer_size
  @buffer_size
end

#delimiter ⇒ Object (readonly)

Returns the value of attribute delimiter.



# File 'lib/io_streams/line/reader.rb', line 4

def delimiter
  @delimiter
end

#line_number ⇒ Object (readonly)

Returns the value of attribute line_number.



# File 'lib/io_streams/line/reader.rb', line 4

def line_number
  @line_number
end

Class Method Details

.stream(input_stream, **args) {|new(input_stream, **args)| ... } ⇒ Object

Read a line at a time from a stream

Yields:



# File 'lib/io_streams/line/reader.rb', line 12

def self.stream(input_stream, **args)
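  # Pass-through if already a line reader.
  # Inside a class method `self` is the class itself, so test against `self`;
  # `self.class` would be `Class`, which a reader instance never matches.
  return yield(input_stream) if input_stream.is_a?(self)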

  yield new(input_stream, **args)
end
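The pass-through pattern used by `.stream` can be sketched with a stand-in class. `TinyReader` is hypothetical and not the library's Reader. Inside a class method `self` is the class itself, so the sketch tests `is_a?(self)`; `self.class` in that position evaluates to `Class`.

```ruby
# Hypothetical stand-in class, used only to show the pass-through pattern.
class TinyReader
  attr_reader :source

  def initialize(source)
    @source = source
  end

  def self.stream(source)
    # Already a TinyReader: hand it to the block unchanged.
    return yield(source) if source.is_a?(self)

    # Otherwise wrap the raw source in a new reader first.
    yield new(source)
  end
end

existing = TinyReader.new("raw input")
TinyReader.stream(existing)    { |reader| p reader.equal?(existing) }
TinyReader.stream("raw input") { |reader| p reader.class }
```

The value of the block becomes the return value of `stream`, so callers can return a result computed from the reader.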

Instance Method Details

#each ⇒ Object

Iterate over every line in the file/stream, passing each line to the supplied block in turn.

Returns [Integer] the number of lines read from the file/stream.

Note:

  • The line delimiter is not returned.



# File 'lib/io_streams/line/reader.rb', line 77

def each
  line_count = 0
  until eof?
    line = readline
    unless line.nil?
      yield(line)
      line_count += 1
    end
  end
  line_count
end
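To show how the loop in `#each` interacts with `#readline` and `#eof?`, here is a minimal in-memory sketch. `MiniLineReader` is a hypothetical class that works on a String rather than a stream; it has none of the buffering or delimiter detection of the real reader.

```ruby
# Hypothetical in-memory reader; mirrors only the loop shape of #each.
class MiniLineReader
  def initialize(text, delimiter: "\n")
    @buffer    = text.dup
    @delimiter = delimiter
  end

  def eof?
    @buffer.empty?
  end

  # Returns the next line without its delimiter, or nil once the buffer is empty.
  def readline
    return if eof?

    line, _separator, rest = @buffer.partition(@delimiter)
    @buffer = rest
    line
  end

  # Yields each line and returns the number of lines yielded.
  def each
    line_count = 0
    until eof?
      line = readline
      next if line.nil?

      yield(line)
      line_count += 1
    end
    line_count
  end
end

lines = []
count = MiniLineReader.new("a\nb\nc").each { |line| lines << line }
p lines
p count
```

As in the documented method, the delimiter is stripped from every yielded line and the return value is the line count.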

#eof? ⇒ Boolean

Returns whether the end of file has been reached for this stream

Returns:

  • (Boolean)


# File 'lib/io_streams/line/reader.rb', line 106

def eof?
  @eof && (@buffer.nil? || @buffer.empty?)
end

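#readline ⇒ Object

Reads the next line, as delimited by @delimiter. When embedded_within is set (for example, to a double quote), delimiters that fall inside a quoted field are kept as part of the line: further lines are appended until the line contains an even number of embedded_within characters. Raises an "Unclosed quoted field" error if the end of the stream is reached, or the line grows beyond 10 * buffer_size characters, before the quotes balance. The embedded_within argument is set in IOStreams::LineReader.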



# File 'lib/io_streams/line/reader.rb', line 91

def readline
  line = _readline
  if line && @embedded_within
    initial_line_number = @line_number
    while line.count(@embedded_within).odd?
      raise "Unclosed quoted field on line #{initial_line_number}" if eof? || line.length > @buffer_size * 10

      line << @delimiter
      line << _readline
    end
  end
  line
end
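The odd-quote-count rule in `#readline` can be tried in isolation. The helper `join_quoted_lines` below is hypothetical; it works on an array of already-split lines and omits the `buffer_size * 10` length guard, so it is only a sketch of the joining logic.

```ruby
# Hypothetical helper: rejoin physical lines into logical records until
# each record holds an even number of quote characters.
def join_quoted_lines(physical_lines, delimiter: "\n", embedded_within: '"')
  records = []
  pending = physical_lines.dup
  until pending.empty?
    record = pending.shift.dup
    # An odd count means a quoted field is still open, so the delimiter
    # that ended this physical line belonged to the field's content.
    while record.count(embedded_within).odd?
      raise "Unclosed quoted field" if pending.empty?

      record << delimiter << pending.shift
    end
    records << record
  end
  records
end

p join_quoted_lines(['a,"multi', 'line",b', "c,d"])
```

Counting quote characters is a simple heuristic: a line with balanced quotes is treated as complete, and one with an odd count as continuing on the next line.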