Class: IOStreams::Line::Reader

Inherits:
Reader
  • Object
Defined in:
lib/io_streams/line/reader.rb

Constant Summary

MAX_BLOCKS_MULTIPLIER =
  Prevent denial of service when a delimiter is not found before this number * `buffer_size` characters are read.
  100

LINEFEED_REGEXP =
  Regexp.compile(/\r\n|\n|\r/).freeze
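
As a rough illustration of the two constants, the sketch below computes the read ceiling implied by the default `buffer_size` of 65,536 and splits a sample string with the same pattern as `LINEFEED_REGEXP`. The constants are redefined locally so the snippet runs without the gem, and the helper `max_search_chars` is hypothetical.

```ruby
# Local copies of the documented constants, so this runs without the gem.
MAX_BLOCKS_MULTIPLIER = 100
LINEFEED_REGEXP       = Regexp.compile(/\r\n|\n|\r/).freeze

# Hypothetical helper: ceiling on characters read while searching for a delimiter.
def max_search_chars(buffer_size)
  MAX_BLOCKS_MULTIPLIER * buffer_size
end

puts max_search_chars(65_536) # 6553600

# "\r\n" comes first in the alternation, so it is treated as one line ending, not two.
p "a\r\nb\nc\rd".split(LINEFEED_REGEXP) # ["a", "b", "c", "d"]
```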

Instance Attribute Summary

Attributes inherited from Reader

#input_stream

Class Method Summary

Instance Method Summary

Methods inherited from Reader

file, open

Constructor Details

#initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil, original_file_name: nil) ⇒ Reader

Create a delimited stream reader from the supplied input stream.

Lines returned will be in the encoding of the input stream. To change the encoding of returned lines, use IOStreams::Encode::Reader.

Parameters

input_stream
  The input stream that implements #read

delimiter: [String]
  Line / Record delimiter to use to break the stream up into records
    Any string to break the stream up by.
    This delimiter is removed from each line when `#each` or `#readline` is called.
  Default: nil
    Automatically detect line endings and break up by line
    Searches for the first "\r\n" or "\n" and then uses that as the
    delimiter for all subsequent records.

buffer_size: [Integer]
  Size of blocks to read from the input stream at a time.
  Default: 65536 ( 64K )

embedded_within: [String]
  Supports CSV files where a line may contain an embedded newline.
  For CSV files set `embedded_within: '"'`

Note:

  • When using a line reader and the file_name ends with ".csv", `embedded_within` is automatically set to `"`.
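
The auto-detection described under `delimiter:` can be sketched as follows. This is a simplified stand-in, not the library's own detection code: the method name `detect_delimiter` and the fallback to "\n" when no line ending is found are assumptions of this sketch.

```ruby
# Hypothetical helper: return the first "\r\n" or "\n" found in the buffer.
# Falling back to "\n" when neither is present is an assumption of this sketch.
def detect_delimiter(buffer)
  buffer[/\r\n|\n/] || "\n"
end

p detect_delimiter("id,name\r\n1,Jack\r\n") # "\r\n"
p detect_delimiter("id,name\n1,Jack\n")     # "\n"
p detect_delimiter("no line ending here")   # "\n"
```

Whichever delimiter is found first is then used for every subsequent record, so a file that mixes line endings is split only on the first style encountered.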



# File 'lib/io_streams/line/reader.rb', line 47

def initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil, original_file_name: nil)
  super(input_stream)

  @embedded_within = embedded_within
  @buffer_size     = buffer_size

  # More efficient read buffering only supported when the input stream `#read` method supports it.
  @use_read_cache_buffer = !@input_stream.method(:read).arity.between?(0, 1)

  @line_number       = 0
  @eof               = false
  @read_cache_buffer = nil
  @buffer            = nil
  @delimiter         = delimiter

  read_block
  # Auto-detect windows/linux line endings if not supplied. \n or \r\n
  @delimiter ||= auto_detect_line_endings

  return unless @buffer

  # Change the delimiters encoding to match that of the input stream
  @delimiter      = @delimiter.encode(@buffer.encoding)
  @delimiter_size = @delimiter.size
end
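
The `arity` test in the constructor can be illustrated with plain Ruby classes. The three stream classes below are hypothetical and differ only in the signature of `#read`. The predicate repeats the constructor's expression: buffered reads are enabled unless `#read` reports an arity of 0 or 1, in which case it cannot accept a second argument (presumably an output buffer, as with `IO#read`).

```ruby
# Hypothetical streams that differ only in the signature of #read.
class SizeOnlyStream
  def read(size); end                      # arity  1
end

class OutbufStream
  def read(size, outbuf = nil); end        # arity -2
end

class OptionalArgsStream
  def read(size = nil, outbuf = nil); end  # arity -1
end

# Same expression as the constructor, applied to any object that responds to #read.
def use_read_cache_buffer?(stream)
  !stream.method(:read).arity.between?(0, 1)
end

[SizeOnlyStream, OutbufStream, OptionalArgsStream].each do |klass|
  puts "#{klass}: #{use_read_cache_buffer?(klass.new)}"
end
```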

Instance Attribute Details

#buffer_size ⇒ Object (readonly)

Returns the value of attribute buffer_size.



# File 'lib/io_streams/line/reader.rb', line 4

def buffer_size
  @buffer_size
end

#delimiter ⇒ Object (readonly)

Returns the value of attribute delimiter.



# File 'lib/io_streams/line/reader.rb', line 4

def delimiter
  @delimiter
end

#line_number ⇒ Object (readonly)

Returns the value of attribute line_number.



# File 'lib/io_streams/line/reader.rb', line 4

def line_number
  @line_number
end

Class Method Details

.stream(input_stream, **args) {|new(input_stream, **args)| ... } ⇒ Object

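Read a line at a time from a stream.

Yields:

  • (Reader) a line reader over `input_stream`; per the source comment, an existing line reader is meant to be passed through unchanged.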



# File 'lib/io_streams/line/reader.rb', line 12

def self.stream(input_stream, **args)
  # Pass-through if already a line reader
  return yield(input_stream) if input_stream.is_a?(self.class)

  yield new(input_stream, **args)
end
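
A minimal sketch of the pass-through pattern used by `.stream`, with a hypothetical `TinyReader` class standing in for the line reader so that it runs without the gem. One deliberate difference from the listing: the sketch tests `is_a?(self)`, because inside a class method `self.class` evaluates to `Class`, so a check against `self.class` matches class objects rather than reader instances.

```ruby
require "stringio"

# Hypothetical stand-in for a line reader; not part of IOStreams.
class TinyReader
  attr_reader :input_stream

  def initialize(input_stream, **_options)
    @input_stream = input_stream
  end

  def self.stream(input_stream, **args)
    # Pass-through if the caller already holds a reader; `self` is TinyReader here.
    return yield(input_stream) if input_stream.is_a?(self)

    yield new(input_stream, **args)
  end
end

wrapped = TinyReader.stream(StringIO.new("a\nb\n")) { |reader| reader }
reused  = TinyReader.stream(wrapped) { |reader| reader }

puts wrapped.class          # TinyReader
puts reused.equal?(wrapped) # true
```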

Instance Method Details

#each ⇒ Object

Iterate over every line in the file/stream, passing each line to the supplied block in turn.

Returns [Integer] the number of lines read from the file/stream.

Note:

  • The line delimiter is not returned.



# File 'lib/io_streams/line/reader.rb', line 77

def each
  line_count = 0
  until eof?
    line = readline
    unless line.nil?
      yield(line)
      line_count += 1
    end
  end
  line_count
end

#eof? ⇒ Boolean

Returns whether the end of file has been reached for this stream.

Returns:

  • (Boolean)


# File 'lib/io_streams/line/reader.rb', line 118

def eof?
  @eof && (@buffer.nil? || @buffer.empty?)
end
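
The interplay between `#each` and `#eof?` can be seen in a stripped-down stand-in. `MiniLineReader` below is hypothetical and much simpler than the real class (fixed "\n" delimiter, no `embedded_within`, no block limit). It only shows why `#eof?` checks the buffer as well as the end-of-stream flag: the underlying stream can be exhausted while unread lines remain buffered.

```ruby
require "stringio"

# Hypothetical, simplified line reader; not the IOStreams implementation.
class MiniLineReader
  def initialize(input_stream, buffer_size: 8)
    @input_stream = input_stream
    @buffer_size  = buffer_size
    @buffer       = +""
    @eof          = false
  end

  # True only when the stream is exhausted AND nothing is left in the buffer.
  def eof?
    @eof && @buffer.empty?
  end

  # Returns the next line without its delimiter, or nil when nothing remains.
  def readline
    # Keep reading blocks until a delimiter is buffered or the stream ends.
    until (index = @buffer.index("\n")) || @eof
      block = @input_stream.read(@buffer_size)
      if block.nil?
        @eof = true
      else
        @buffer << block
      end
    end
    return nil if @buffer.empty?
    return @buffer.slice!(0..-1) unless index # final line without a trailing delimiter

    @buffer.slice!(0..index).chomp("\n")
  end

  # Yields every line and returns the number of lines read.
  def each
    line_count = 0
    until eof?
      line = readline
      next if line.nil?

      yield(line)
      line_count += 1
    end
    line_count
  end
end

lines = []
count = MiniLineReader.new(StringIO.new("one\ntwo\nthree")).each { |l| lines << l }
p lines # ["one", "two", "three"]
p count # 3
```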

#readline ⇒ Object

Reads each line per the `delimiter`. Accounts for lines that contain the `delimiter` when the `delimiter` appears inside a field enclosed by the `embedded_within` character. For example, CSV files can contain newlines embedded within double quotes.



# File 'lib/io_streams/line/reader.rb', line 92

def readline
  line = _readline
  if line && @embedded_within
    initial_line_number = @line_number
    while line.count(@embedded_within).odd?
      if eof? || line.length > @buffer_size * 10
        raise(Errors::MalformedDataError.new(
                "Unbalanced delimited field, delimiter: #{@embedded_within}",
                initial_line_number
              ))
      end
      line << @delimiter
      next_line = _readline
      if next_line.nil?
        raise(Errors::MalformedDataError.new(
                "Unbalanced delimited field, delimiter: #{@embedded_within}",
                initial_line_number
              ))
      end
      line << next_line
    end
  end
  line
end
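
The quote-balancing loop in `#readline` can be sketched over an array of already-split lines. The helper `join_embedded` is hypothetical: it raises `ArgumentError` where the library raises `Errors::MalformedDataError`, and it omits the `buffer_size * 10` length guard.

```ruby
# Hypothetical helper: re-join physical lines until the quote count is even.
def join_embedded(lines, delimiter: "\n", embedded_within: '"')
  records = []
  pending = lines.dup
  until pending.empty?
    record = pending.shift.dup
    # An odd number of quote characters means a quoted field is still open.
    while record.count(embedded_within).odd?
      raise ArgumentError, "Unbalanced delimited field, delimiter: #{embedded_within}" if pending.empty?

      record << delimiter << pending.shift
    end
    records << record
  end
  records
end

csv = %(id,notes\n1,"first line\nsecond line"\n2,plain)
p join_embedded(csv.split("\n"))
# ["id,notes", "1,\"first line\nsecond line\"", "2,plain"]
```

Counting quote characters is a cheap heuristic: it treats an escaped pair of quotes (`""`) as balanced, which is why an even count is taken to mean the record is complete.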