Class: IOStreams::Line::Reader
Defined in: lib/io_streams/line/reader.rb
Constant Summary
- MAX_BLOCKS_MULTIPLIER = 100
  Prevent denial of service when a delimiter is not found before this number * `buffer_size` characters are read.
- LINEFEED_REGEXP = Regexp.compile(/\r\n|\n|\r/).freeze
Instance Attribute Summary
- #buffer_size ⇒ Object (readonly)
  Returns the value of attribute buffer_size.
- #delimiter ⇒ Object (readonly)
  Returns the value of attribute delimiter.
- #line_number ⇒ Object (readonly)
  Returns the value of attribute line_number.
Attributes inherited from Reader
Class Method Summary
- .stream(input_stream, **args) {|new(input_stream, **args)| ... } ⇒ Object
  Read a line at a time from a stream.
Instance Method Summary
- #each ⇒ Object
  Iterate over every line in the file/stream, passing each line to the supplied block in turn.
- #eof? ⇒ Boolean
  Returns whether the end of file has been reached for this stream.
- #initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil, original_file_name: nil) ⇒ Reader (constructor)
  Create a delimited stream reader from the supplied input stream.
- #readline ⇒ Object
  Reads each line per the @delimiter.
Methods inherited from Reader
Constructor Details
#initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil, original_file_name: nil) ⇒ Reader
Create a delimited stream reader from the supplied input stream.
Lines returned will be in the encoding of the input stream. To change the encoding of returned lines, use IOStreams::Encode::Reader.
Parameters
  input_stream
    The input stream that implements #read
  delimiter: [String]
    Line / Record delimiter to use to break the stream up into records
      Any string to break the stream up by.
      This delimiter is removed from each line when `#each` or `#readline` is called.
    Default: nil
      Automatically detect line endings and break up by line
      Searches for the first "\r\n" or "\n" and then uses that as the
      delimiter for all subsequent records.
  buffer_size: [Integer]
    Size of blocks to read from the input stream at a time.
    Default: 65536 ( 64K )
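The auto-detection described for `delimiter: nil` can be illustrated with a minimal, self-contained sketch. The helpers `detect_delimiter` and `split_records` are hypothetical names invented for this example and are not part of IOStreams; the sketch only assumes the documented behavior (use the first "\r\n" or "\n" found, and drop the delimiter from each record).

```ruby
# Hypothetical helper, not part of IOStreams: returns the first line ending
# found in the buffer, or nil when it contains neither "\r\n" nor "\n".
def detect_delimiter(buffer)
  match = buffer.match(/\r\n|\n/)
  match && match[0]
end

# Hypothetical helper: splits the buffer on the detected delimiter.
# The delimiter itself is removed from every record.
def split_records(buffer)
  delimiter = detect_delimiter(buffer)
  return [buffer] if delimiter.nil?

  buffer.split(delimiter)
end

WINDOWS_RECORDS = split_records("a,b\r\nc,d\r\n")
UNIX_RECORDS    = split_records("a,b\nc,d\n")
```

Both calls yield the same two records, because whichever line ending appears first is applied to the whole input.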
TODO:
- Handle embedded line feeds when reading csv files.
- Skip Comment lines. RegExp?
- Skip “empty” / “blank” lines. RegExp?
- Extract header line(s) / first non-comment, non-blank line
- Embedded newline support, RegExp? or Proc?
# File 'lib/io_streams/line/reader.rb', line 47
def initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil, original_file_name: nil)
  super(input_stream)

  @embedded_within = embedded_within
  @buffer_size     = buffer_size

  # More efficient read buffering only supported when the input stream `#read` method supports it.
  @use_read_cache_buffer = !@input_stream.method(:read).arity.between?(0, 1)

  @line_number       = 0
  @eof               = false
  @read_cache_buffer = nil
  @buffer            = nil
  @delimiter         = delimiter

  read_block
  # Auto-detect windows/linux line endings if not supplied. \n or \r\n
  @delimiter ||= auto_detect_line_endings

  return unless @buffer

  # Change the delimiter's encoding to match that of the input stream
  @delimiter      = @delimiter.encode(@buffer.encoding)
  @delimiter_size = @delimiter.size
end
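The constructor decides whether to use the read cache buffer by inspecting the arity of the input stream's `#read` method. The sketch below shows how `Method#arity` behaves for a few stream shapes. `OneArgStream`, `BufferedStream` and `supports_read_buffer?` are hypothetical names made up for this illustration; only the arity expression itself is taken from the source above.

```ruby
require "stringio"

# Hypothetical stream whose #read takes exactly one required argument (arity 1).
class OneArgStream
  def read(size)
    nil
  end
end

# Hypothetical stream whose #read also accepts an optional output buffer (arity -1).
class BufferedStream
  def read(size = nil, outbuf = nil)
    nil
  end
end

# Same expression as in the constructor: an arity of 0 or 1 means #read
# cannot be given a second (buffer) argument.
def supports_read_buffer?(stream)
  !stream.method(:read).arity.between?(0, 1)
end

ONE_ARG  = supports_read_buffer?(OneArgStream.new)
BUFFERED = supports_read_buffer?(BufferedStream.new)
STRINGIO = supports_read_buffer?(StringIO.new("data"))
```

Methods with optional arguments report a negative arity, which is why `BufferedStream` and `StringIO` pass the check while `OneArgStream` does not.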
Instance Attribute Details
#buffer_size ⇒ Object (readonly)
Returns the value of attribute buffer_size.
# File 'lib/io_streams/line/reader.rb', line 4
def buffer_size
  @buffer_size
end
#delimiter ⇒ Object (readonly)
Returns the value of attribute delimiter.
# File 'lib/io_streams/line/reader.rb', line 4
def delimiter
  @delimiter
end
#line_number ⇒ Object (readonly)
Returns the value of attribute line_number.
# File 'lib/io_streams/line/reader.rb', line 4
def line_number
  @line_number
end
Class Method Details
.stream(input_stream, **args) {|new(input_stream, **args)| ... } ⇒ Object
Read a line at a time from a stream.
# File 'lib/io_streams/line/reader.rb', line 12
def self.stream(input_stream, **args)
  # Pass-through if already a line reader
  return yield(input_stream) if input_stream.is_a?(self.class)

  yield new(input_stream, **args)
end
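The pass-through pattern used by `.stream` can be sketched in isolation. `MiniReader` is a hypothetical stand-in class written for this example, not the IOStreams implementation. The sketch compares against `self`, because inside a class method `self` is the class itself, whereas `self.class` evaluates to `Class`.

```ruby
require "stringio"

# Hypothetical stand-in for a reader class; not the IOStreams implementation.
class MiniReader
  attr_reader :input_stream

  def self.stream(input_stream, **args)
    # Inside a class method `self` is MiniReader, so this checks whether
    # the caller already supplied a MiniReader and passes it through as-is.
    return yield(input_stream) if input_stream.is_a?(self)

    yield new(input_stream, **args)
  end

  def initialize(input_stream, **_args)
    @input_stream = input_stream
  end
end

existing = MiniReader.new(StringIO.new("a\nb\n"))

# An existing reader is yielded unchanged; a raw stream is wrapped first.
PASSED_THROUGH = MiniReader.stream(existing) { |reader| reader.equal?(existing) }
WRAPPED        = MiniReader.stream(StringIO.new("a\n")) { |reader| reader.is_a?(MiniReader) }
```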
Instance Method Details
#each ⇒ Object
Iterate over every line in the file/stream, passing each line to the supplied block in turn.
Returns [Integer] the number of lines read from the file/stream.
Note:
- The line delimiter is not returned.
# File 'lib/io_streams/line/reader.rb', line 77
def each
  line_count = 0
  until eof?
    line = readline
    unless line.nil?
      yield(line)
      line_count += 1
    end
  end
  line_count
end
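To show how block-wise reading, `#readline`, `#eof?` and the line count returned by `#each` fit together, here is a minimal sketch. `TinyLineReader` is a hypothetical miniature written for this example; it assumes a fixed delimiter and does not reproduce the buffering, encoding or error handling of the real class.

```ruby
require "stringio"

# Hypothetical miniature reader, not the IOStreams implementation.
class TinyLineReader
  def initialize(input_stream, delimiter: "\n", buffer_size: 4)
    @input_stream = input_stream
    @delimiter    = delimiter
    @buffer_size  = buffer_size
    @buffer       = +""
    @eof          = false
  end

  # End of file only once the stream is exhausted and the buffer is drained.
  def eof?
    @eof && @buffer.empty?
  end

  # Returns the next line without its delimiter, or nil when nothing is left.
  def readline
    # Keep reading blocks until the buffer holds a delimiter or the stream ends.
    until (index = @buffer.index(@delimiter)) || @eof
      block = @input_stream.read(@buffer_size)
      if block
        @buffer << block
      else
        @eof = true
      end
    end
    return nil if @buffer.empty?

    line = index ? @buffer.slice!(0, index + @delimiter.size) : @buffer.slice!(0..-1)
    line.chomp(@delimiter)
  end

  # Yields each line in turn and returns the number of lines read.
  def each
    line_count = 0
    until eof?
      line = readline
      next if line.nil?

      yield(line)
      line_count += 1
    end
    line_count
  end
end

lines = []
COUNT = TinyLineReader.new(StringIO.new("one\ntwo\nthree")).each { |line| lines << line }
LINES = lines
```

The deliberately small `buffer_size` forces the last line to be assembled from several blocks, which is the case the buffer exists to handle.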
#eof? ⇒ Boolean
Returns whether the end of file has been reached for this stream.
# File 'lib/io_streams/line/reader.rb', line 106
def eof?
  @eof && (@buffer.nil? || @buffer.empty?)
end
#readline ⇒ Object
Reads each line per the @delimiter. It will account for embedded lines provided they are within double quotes. The embedded_within argument is set in IOStreams::LineReader.
# File 'lib/io_streams/line/reader.rb', line 91
def readline
  line = _readline
  if line && @embedded_within
    initial_line_number = @line_number
    while line.count(@embedded_within).odd?
      raise "Unclosed quoted field on line #{initial_line_number}" if eof? || line.length > @buffer_size * 10

      line << @delimiter
      line << _readline
    end
  end
  line
end
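The odd-quote-count rule used by `#readline` can be tried out on a plain array of lines. `join_embedded` is a hypothetical helper invented for this sketch. It assumes the lines have already been split on the delimiter, and it omits the length limit that the real method applies.

```ruby
# Hypothetical helper, not part of IOStreams: joins physical lines into logical
# records, treating a line with an odd number of quote characters as unfinished.
def join_embedded(lines, delimiter: "\n", embedded_within: '"')
  records = []
  pending = nil
  lines.each do |line|
    # Re-insert the delimiter that was stripped when the lines were split.
    pending = pending ? "#{pending}#{delimiter}#{line}" : line.dup
    next if pending.count(embedded_within).odd?

    records << pending
    pending = nil
  end
  raise "Unclosed quoted field" if pending

  records
end

RECORDS = join_embedded(['1,"first', 'second",x', "2,plain"])
```

The first two physical lines are merged into one record because the opening quote is only closed on the second line.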