Class: IOStreams::Line::Reader
- Inherits: Object
  - Object
  - IOStreams::Line::Reader
- Defined in: lib/io_streams/line/reader.rb
Constant Summary
- MAX_BLOCKS_MULTIPLIER = 100
  Prevent denial of service when a delimiter is not found before this number * `buffer_size` characters are read.
- LINEFEED_REGEXP = Regexp.compile(/\r\n|\n|\r/).freeze
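With the default buffer_size of 65_536, the MAX_BLOCKS_MULTIPLIER limit works out to 100 * 65_536 = 6_553_600 characters. The following is a minimal sketch of how such a guard can work in a block-based reader. It is not the library's implementation: `each_record` is a hypothetical helper, and the error message is an assumption.

```ruby
require "stringio"

MAX_BLOCKS_MULTIPLIER = 100

# Hypothetical helper, not part of IOStreams: yields each delimited record
# from `io`, reading `buffer_size` characters at a time.
def each_record(io, delimiter:, buffer_size: 65_536)
  limit  = buffer_size * MAX_BLOCKS_MULTIPLIER
  buffer = +""
  while (block = io.read(buffer_size))
    buffer << block
    # Emit every complete record currently held in the buffer.
    while (index = buffer.index(delimiter))
      record = buffer.slice!(0, index + delimiter.size)
      yield record[0, index] # the delimiter itself is not returned
    end
    # Give up rather than buffer an unbounded amount of input.
    raise "Delimiter not found within #{limit} characters" if buffer.size > limit
  end
  # Whatever remains after the last delimiter is the final record.
  yield buffer unless buffer.empty?
end

each_record(StringIO.new("a|b|c"), delimiter: "|", buffer_size: 2) { |r| puts r }
```

A tiny buffer_size is used here only to force several reads; input without any delimiter raises once the buffer exceeds the limit.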
Instance Attribute Summary
- #buffer_size ⇒ Object (readonly)
  Returns the value of attribute buffer_size.
- #delimiter ⇒ Object (readonly)
  Returns the value of attribute delimiter.
- #line_number ⇒ Object (readonly)
  Returns the value of attribute line_number.
Class Method Summary
- .open(file_name_or_io, **args) ⇒ Object
  Read a line at a time from a file or stream.
Instance Method Summary
- #each ⇒ Object
  Iterate over every line in the file/stream, passing each line to the supplied block in turn.
- #eof? ⇒ Boolean
  Returns whether the end of file has been reached for this stream.
- #initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil) ⇒ Reader (constructor)
  Create a delimited stream reader from the supplied input stream.
- #readline ⇒ Object
  Reads each line per the @delimiter.
Constructor Details
#initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil) ⇒ Reader
Create a delimited stream reader from the supplied input stream.
Lines returned will be in the encoding of the input stream. To change the encoding of returned lines, use IOStreams::Encode::Reader.
Parameters
  input_stream
    The input stream that implements #read
  delimiter: [String]
    Line / Record delimiter to use to break the stream up into records
      Any string to break the stream up by.
      This delimiter is removed from each line when `#each` or `#readline` is called.
    Default: nil
      Automatically detect line endings and break up by line
      Searches for the first "\r\n" or "\n" and then uses that as the
      delimiter for all subsequent records.
  buffer_size: [Integer]
    Size of blocks to read from the input stream at a time.
    Default: 65536 ( 64K )
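The automatic detection described for `delimiter: nil` can be sketched roughly as follows. This is an illustration only: `detect_delimiter` is a hypothetical helper, the pattern is copied from the LINEFEED_REGEXP constant, and the library's own `auto_detect_line_endings` may differ in detail.

```ruby
# Same pattern as the LINEFEED_REGEXP constant listed in the Constant Summary.
LINEFEED = /\r\n|\n|\r/.freeze

# Hypothetical helper: returns the first line ending found in `buffer`,
# or nil when the buffer contains none. "\r\n" is listed first in the
# pattern, so a Windows line ending is not mistaken for a bare "\r".
def detect_delimiter(buffer)
  buffer[LINEFEED]
end

delimiter = detect_delimiter("id,name\r\n1,Jack\r\n")
# Every later record is then split on this one delimiter.
puts delimiter.inspect
```

Passing an explicit `delimiter:` skips detection altogether, which suits records separated by something other than a line ending.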
TODO:
- Handle embedded line feeds when reading csv files.
- Skip Comment lines. RegExp?
- Skip “empty” / “blank” lines. RegExp?
- Extract header line(s) / first non-comment, non-blank line
- Embedded newline support, RegExp? or Proc?
# File 'lib/io_streams/line/reader.rb', line 48
def initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil)
  @embedded_within = embedded_within
  @input_stream    = input_stream
  @buffer_size     = buffer_size

  # More efficient read buffering only supported when the input stream `#read` method supports it.
  @use_read_cache_buffer = !@input_stream.method(:read).arity.between?(0, 1)

  @line_number       = 0
  @eof               = false
  @read_cache_buffer = nil
  @buffer            = nil

  read_block
  # Auto-detect windows/linux line endings if not supplied. \n or \r\n
  @delimiter = delimiter || auto_detect_line_endings

  if @buffer
    # Change the delimiters encoding to match that of the input stream
    @delimiter      = @delimiter.encode(@buffer.encoding)
    @delimiter_size = @delimiter.size
  end
end
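The arity check in the constructor decides whether the reusable read buffer is used. A small sketch of what that expression evaluates to for two kinds of stream is shown here. `FixedSizeSource` and `supports_read_buffer?` are hypothetical names introduced for illustration, and the reading that a `#read` with zero or one required argument cannot accept an output buffer is an interpretation of the source comment, not something the documentation states.

```ruby
require "stringio"

# Hypothetical stream whose #read takes exactly one required argument.
class FixedSizeSource
  def read(size)
    nil
  end
end

# Same expression as in the constructor, wrapped in a hypothetical helper.
def supports_read_buffer?(stream)
  !stream.method(:read).arity.between?(0, 1)
end

# StringIO#read accepts optional arguments, so its arity is negative.
puts supports_read_buffer?(StringIO.new("abc"))  # true
# A single required argument gives an arity of 1.
puts supports_read_buffer?(FixedSizeSource.new)  # false
```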
Instance Attribute Details
#buffer_size ⇒ Object (readonly)
Returns the value of attribute buffer_size.
# File 'lib/io_streams/line/reader.rb', line 4
def buffer_size
  @buffer_size
end
#delimiter ⇒ Object (readonly)
Returns the value of attribute delimiter.
# File 'lib/io_streams/line/reader.rb', line 4
def delimiter
  @delimiter
end
#line_number ⇒ Object (readonly)
Returns the value of attribute line_number.
# File 'lib/io_streams/line/reader.rb', line 4
def line_number
  @line_number
end
Class Method Details
.open(file_name_or_io, **args) ⇒ Object
Read a line at a time from a file or stream
# File 'lib/io_streams/line/reader.rb', line 12
def self.open(file_name_or_io, **args)
  if file_name_or_io.is_a?(String)
    IOStreams::File::Reader.open(file_name_or_io) { |io| yield new(io, **args) }
  else
    yield new(file_name_or_io, **args)
  end
end
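The dispatch in `.open` treats a String as a file name and anything else as an already open stream. The sketch below mirrors that pattern using only the standard library. `with_input` is a hypothetical helper, and plain `File.open` stands in for `IOStreams::File::Reader.open`, so it shows the shape of the logic rather than the library's behaviour.

```ruby
require "stringio"

# Hypothetical stand-in for the pattern used by `.open`.
def with_input(file_name_or_io)
  if file_name_or_io.is_a?(String)
    # A String is a file name: open it and close it when the block returns.
    File.open(file_name_or_io) { |io| yield io }
  else
    # Anything else is assumed to be an IO-like object owned by the caller.
    yield file_name_or_io
  end
end

with_input(StringIO.new("from memory")) { |io| puts io.read }
```

One consequence of this design is that a stream passed in by the caller is left open, while a file opened from a name is closed when the block finishes.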
Instance Method Details
#each ⇒ Object
Iterate over every line in the file/stream, passing each line to the supplied block in turn.
Returns [Integer] the number of lines read from the file/stream.
Note:
- The line delimiter is not returned.
# File 'lib/io_streams/line/reader.rb', line 76
def each
  line_count = 0
  until eof?
    line = readline
    unless line.nil?
      yield(line)
      line_count += 1
    end
  end
  line_count
end
#eof? ⇒ Boolean
Returns whether the end of file has been reached for this stream
# File 'lib/io_streams/line/reader.rb', line 104
def eof?
  @eof && (@buffer.nil? || @buffer.empty?)
end
#readline ⇒ Object
Reads each line per the @delimiter. It accounts for embedded lines, provided they are within double quotes. The embedded_within argument is set in IOStreams::LineReader.
# File 'lib/io_streams/line/reader.rb', line 90
def readline
  line = _readline
  if line && @embedded_within
    initial_line_number = @line_number
    while line.count(@embedded_within).odd?
      raise "Unclosed quoted field on line #{initial_line_number}" if eof? || line.length > @buffer_size * 10
      line << @delimiter
      line << _readline
    end
  end
  line
end
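The quote-counting rule used by `#readline` can be tried in isolation. The sketch below applies the same odd/even test to an array of already split lines. `join_embedded` is a hypothetical helper, and it leaves out the `@buffer_size * 10` length limit that the real method enforces.

```ruby
# Hypothetical helper: joins physical lines until the number of `quote`
# characters in the record is even, meaning every quoted field is closed.
def join_embedded(lines, delimiter: "\n", quote: '"')
  pending = lines.dup # leave the caller's array untouched
  records = []
  until pending.empty?
    record = pending.shift.dup
    while record.count(quote).odd?
      raise "Unclosed quoted field" if pending.empty?
      # Restore the delimiter that was stripped when the line was split.
      record << delimiter << pending.shift
    end
    records << record
  end
  records
end

join_embedded(['1,"first', 'second",2', '3,plain,4'])
# => ["1,\"first\nsecond\",2", "3,plain,4"]
```

An odd count means a quoted field is still open, so the next physical line belongs to the same record.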