Class: IOStreams::Line::Reader
- Defined in:
- lib/io_streams/line/reader.rb
Constant Summary
- MAX_BLOCKS_MULTIPLIER =
Prevent denial of service when a delimiter is not found before this number * `buffer_size` characters are read.
100
- LINEFEED_REGEXP =
Regexp.compile(/\r\n|\n|\r/).freeze
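The guard described for MAX_BLOCKS_MULTIPLIER can be pictured with a small sketch. This is not the library's code: `read_until_delimiter`, its `max_blocks` argument, and the use of `ArgumentError` are all hypothetical, chosen only to show how capping the number of blocks read bounds memory use when a delimiter never appears.

```ruby
require "stringio"

# Hypothetical helper, not part of IOStreams: reads blocks of `buffer_size`
# characters from `io` until `delimiter` is seen, giving up after
# `max_blocks` blocks so a stream without delimiters cannot grow the
# buffer without bound.
def read_until_delimiter(io, delimiter, buffer_size: 16, max_blocks: 100)
  buffer = +""
  blocks = 0
  until buffer.include?(delimiter)
    block = io.read(buffer_size)
    return buffer if block.nil? # end of stream: return whatever was read

    buffer << block
    blocks += 1
    if blocks >= max_blocks && !buffer.include?(delimiter)
      raise ArgumentError, "Delimiter not found within #{max_blocks * buffer_size} characters"
    end
  end
  buffer
end

found = read_until_delimiter(StringIO.new("abc\ndef"), "\n", buffer_size: 2)
```

With `buffer_size: 2`, the call above stops after the second block because the buffer then contains the delimiter.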
Instance Attribute Summary
-
#buffer_size ⇒ Object
readonly
Returns the value of attribute buffer_size.
-
#delimiter ⇒ Object
readonly
Returns the value of attribute delimiter.
-
#line_number ⇒ Object
readonly
Returns the value of attribute line_number.
Attributes inherited from Reader
Class Method Summary
-
.stream(input_stream, **args) {|new(input_stream, **args)| ... } ⇒ Object
Read a line at a time from a stream.
Instance Method Summary
-
#each ⇒ Object
Iterate over every line in the file/stream, passing each line to the supplied block in turn.
-
#eof? ⇒ Boolean
Returns whether the end of file has been reached for this stream.
-
#initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil, original_file_name: nil) ⇒ Reader
constructor
Create a delimited stream reader from the supplied input stream.
-
#readline ⇒ Object
Reads each line per the `delimiter`.
Methods inherited from Reader
Constructor Details
#initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil, original_file_name: nil) ⇒ Reader
Create a delimited stream reader from the supplied input stream.
Lines returned will be in the encoding of the input stream. To change the encoding of returned lines, use IOStreams::Encode::Reader.
Parameters
input_stream
The input stream that implements #read
delimiter: [String]
Line / Record delimiter to use to break the stream up into records
Any string to break the stream up by.
This delimiter is removed from each line when `#each` or `#readline` is called.
Default: nil
Automatically detect line endings and break up by line
Searches for the first "\r\n" or "\n" and then uses that as the
delimiter for all subsequent records.
buffer_size: [Integer]
Size of blocks to read from the input stream at a time.
Default: 65536 ( 64K )
embedded_within: [String]
Supports CSV files where a line may contain an embedded newline.
For CSV files set `embedded_within: '"'`
Note:
-
When using a line reader and the file_name ends with ".csv", `embedded_within` is automatically set to `"`.
# File 'lib/io_streams/line/reader.rb', line 47

def initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil, original_file_name: nil)
  super(input_stream)

  @embedded_within = embedded_within
  @buffer_size     = buffer_size

  # More efficient read buffering only supported when the input stream `#read` method supports it.
  @use_read_cache_buffer = !@input_stream.method(:read).arity.between?(0, 1)

  @line_number       = 0
  @eof               = false
  @read_cache_buffer = nil
  @buffer            = nil
  @delimiter         = delimiter

  read_block
  # Auto-detect windows/linux line endings if not supplied. \n or \r\n
  @delimiter ||= auto_detect_line_endings

  return unless @buffer

  # Change the delimiters encoding to match that of the input stream
  @delimiter      = @delimiter.encode(@buffer.encoding)
  @delimiter_size = @delimiter.size
end
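The automatic delimiter detection described for `delimiter: nil` can be sketched as follows. `detect_line_ending` is a hypothetical helper, not the library's `auto_detect_line_endings`, and the `"\n"` fallback used when no line ending is present is an assumption made for this example.

```ruby
# Hypothetical helper, not the library's own method: returns the first
# "\r\n" or "\n" found in `buffer`. The fallback for a buffer without any
# line ending is an assumption of this sketch.
def detect_line_ending(buffer, fallback: "\n")
  buffer[/\r\n|\n/] || fallback
end

data      = "a,b\r\nc,d\r\n"
delimiter = detect_line_ending(data)
# Every record is split on the detected delimiter, which is removed.
lines = data.split(delimiter)
```

Because the first match is then used for all subsequent records, a file that mixes line endings would be split only on the style that appears first.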
Instance Attribute Details
#buffer_size ⇒ Object (readonly)
Returns the value of attribute buffer_size.
# File 'lib/io_streams/line/reader.rb', line 4

def buffer_size
  @buffer_size
end
#delimiter ⇒ Object (readonly)
Returns the value of attribute delimiter.
# File 'lib/io_streams/line/reader.rb', line 4

def delimiter
  @delimiter
end
#line_number ⇒ Object (readonly)
Returns the value of attribute line_number.
# File 'lib/io_streams/line/reader.rb', line 4

def line_number
  @line_number
end
Class Method Details
.stream(input_stream, **args) {|new(input_stream, **args)| ... } ⇒ Object
Read a line at a time from a stream.

# File 'lib/io_streams/line/reader.rb', line 12

def self.stream(input_stream, **args)
  # Pass-through if already a line reader
  return yield(input_stream) if input_stream.is_a?(self.class)

  yield new(input_stream, **args)
end
Instance Method Details
#each ⇒ Object
Iterate over every line in the file/stream, passing each line to the supplied block in turn.
Returns [Integer] the number of lines read from the file/stream.
Note:
-
The line delimiter is not returned.
# File 'lib/io_streams/line/reader.rb', line 77

def each
  line_count = 0
  until eof?
    line = readline
    unless line.nil?
      yield(line)
      line_count += 1
    end
  end
  line_count
end
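The contract of `#each` (yield every line without its delimiter, then return the line count) can be illustrated with a stand-in class. `CountingLines` is hypothetical and relies on Ruby's `IO#each_line` rather than the reader shown above, so it is only a sketch of the calling convention, not of the implementation.

```ruby
require "stringio"

# Hypothetical stand-in for a line reader, built on IO#each_line from
# Ruby core instead of IOStreams.
class CountingLines
  def initialize(io, delimiter: "\n")
    @io        = io
    @delimiter = delimiter
  end

  # Yields each line with the delimiter removed and returns the number
  # of lines that were yielded.
  def each
    count = 0
    @io.each_line(@delimiter) do |line|
      yield line.chomp(@delimiter)
      count += 1
    end
    count
  end
end

seen  = []
count = CountingLines.new(StringIO.new("one\ntwo\nthree")).each { |line| seen << line }
```

Returning the count lets a caller report how many records were processed without keeping a separate counter inside the block.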
#eof? ⇒ Boolean
Returns whether the end of file has been reached for this stream.

# File 'lib/io_streams/line/reader.rb', line 118

def eof?
  @eof && (@buffer.nil? || @buffer.empty?)
end
#readline ⇒ Object
Reads each line per the `delimiter`. Accounts for lines that contain the `delimiter` when it falls within a pair of `embedded_within` characters. For example, CSV files can contain newlines embedded within double quotes.
# File 'lib/io_streams/line/reader.rb', line 92

def readline
  line = _readline
  if line && @embedded_within
    initial_line_number = @line_number
    while line.count(@embedded_within).odd?
      if eof? || line.length > @buffer_size * 10
        raise(Errors::MalformedDataError.new(
                "Unbalanced delimited field, delimiter: #{@embedded_within}",
                initial_line_number
              ))
      end
      line << @delimiter
      next_line = _readline
      if next_line.nil?
        raise(Errors::MalformedDataError.new(
                "Unbalanced delimited field, delimiter: #{@embedded_within}",
                initial_line_number
              ))
      end
      line << next_line
    end
  end
  line
end
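The odd-count check on `embedded_within` can be tried in isolation. The sketch below works on an array of already-split lines instead of a stream; `join_embedded` and its use of `ArgumentError` are hypothetical and stand in for the reader's internals and `Errors::MalformedDataError`.

```ruby
# Hypothetical helper, not the library's implementation: merges physical
# lines into logical records while a quoted field is still open, that is,
# while the record contains an odd number of `embedded_within` characters.
def join_embedded(lines, delimiter: "\n", embedded_within: '"')
  records = []
  pending = nil
  lines.each do |line|
    pending = pending ? pending + delimiter + line : line.dup
    next if pending.count(embedded_within).odd? # quote still open

    records << pending
    pending = nil
  end
  raise ArgumentError, "Unbalanced delimited field, delimiter: #{embedded_within}" if pending

  records
end

records = join_embedded(['1,"first', 'second",x', "2,plain"])
```

Here the first two physical lines are rejoined with the delimiter into one record because the quote opened on the first line only closes on the second.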