Class: IOStreams::Tabular::Utility::CSVRow

Inherits:
CSV
  • Object
Defined in:
lib/io_streams/tabular/utility/csv_row.rb

Overview

Parses a single line of CSV at a time, with 2 to 3 times better performance than CSV.parse_line and considerably less garbage collection.

Note:

This parser does not support line feeds embedded in quoted fields, because the
file is split on line feeds during the upload process and each worker then
processes it line by line.
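To illustrate the limitation, the sketch below uses a made-up two-column file in which one quoted field contains a line feed. Splitting the data on line feeds, as the upload step is described as doing, leaves that record's fragments with unbalanced quotes, so neither fragment is a valid CSV line on its own. The data and variable names are hypothetical.

```ruby
# Hypothetical data: record 1 has a quoted comment containing a line feed.
data = %(id,comment\n1,"first line\nsecond line"\n)

# Splitting on line feeds, as the upload step does, breaks that record in two.
fragments = data.split("\n")

fragments.each do |fragment|
  # An odd number of quote characters means an unclosed quoted field.
  balanced = fragment.count('"').even?
  puts "#{fragment.inspect} balanced quotes: #{balanced}"
end
```

Each fragment would then reach #parse separately and be rejected as malformed.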

Constant Summary

UTF8_ENCODING =
Encoding.find("UTF-8").freeze

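Instance Method Summary

  • #parse(line) ⇒ Object
    Parse a single line of CSV data.
  • #render(row) ⇒ Object (also: #to_csv)
    Return the supplied array as a single-line CSV string.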

Constructor Details

#initialize(encoding = UTF8_ENCODING) ⇒ CSVRow

Returns a new instance of CSVRow.



# File 'lib/io_streams/tabular/utility/csv_row.rb', line 16

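def initialize(encoding = UTF8_ENCODING)
  # Empty in-memory IO: #parse receives each line directly, so nothing is read from it.
  @io = StringIO.new("".force_encoding(encoding))
  # An empty row_sep means #render appends no line terminator.
  super(@io, row_sep: "")
end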

Instance Method Details

#parse(line) ⇒ Object

Parse a single line of CSV data.

Parameters:

line [String]
  A single line of CSV data without any line terminators

Raises:

  • (MalformedCSVError)


# File 'lib/io_streams/tabular/utility/csv_row.rb', line 25

def parse(line)
  return if IOStreams::Utils.blank?(line)
  return if @skip_lines&.match(line)

  in_extended_col = false
  csv             = []
  parts           = line.split(@col_sep, -1)
  csv << nil if parts.empty?

  # This loop is the hot path of csv parsing. Some things may be non-dry
  # for a reason. Make sure to benchmark when refactoring.
  parts.each do |part|
    if in_extended_col
      # If we are continuing a previous column
      if part[-1] == @quote_char && part.count(@quote_char).odd?
        # extended column ends
        csv.last << part[0..-2]
        raise MalformedCSVError, "Missing or stray quote in line #{lineno + 1}" if csv.last =~ @parsers[:stray_quote]

        csv.last.gsub!(@quote_char * 2, @quote_char)
        in_extended_col = false
      else
        csv.last << part
        csv.last << @col_sep
      end
    elsif part[0] == @quote_char
      # If we are starting a new quoted column
      if part[-1] != @quote_char || part.count(@quote_char).odd?
        # start an extended column
        csv << part[1..-1]
        csv.last << @col_sep
        in_extended_col = true
      else
        # regular quoted column
        csv << part[1..-2]
        raise MalformedCSVError, "Missing or stray quote in line #{lineno + 1}" if csv.last =~ @parsers[:stray_quote]

        csv.last.gsub!(@quote_char * 2, @quote_char)
      end
    elsif part =~ @parsers[:quote_or_nl]
      # Unquoted field with bad characters.
      if part =~ @parsers[:nl_or_lf]
        raise MalformedCSVError, "Unquoted fields do not allow \\r or \\n (line #{lineno + 1})."
      else
        raise MalformedCSVError, "Illegal quoting in line #{lineno + 1}."
      end
    else
      # Regular ole unquoted field.
      csv << (part.empty? ? nil : part)
    end
  end

  # Replace tacked on @col_sep with @row_sep if we are still in an extended
  # column.
  csv[-1][-1] = @row_sep if in_extended_col

  raise MalformedCSVError, "Unclosed quoted field on line #{lineno + 1}." if in_extended_col

  @lineno += 1

  # save unconverted fields, if needed...
  unconverted = csv.dup if @unconverted_fields

  # convert fields, if needed...
  csv         = convert_fields(csv) unless @use_headers || @converters.empty?
  # parse out header rows and handle CSV::Row conversions...
  csv         = parse_headers(csv) if @use_headers

  # inject unconverted fields and accessor, if requested...
  add_unconverted_fields(csv, unconverted) if @unconverted_fields && (!csv.respond_to? :unconverted_fields)

  csv
end
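The core of #parse is a split-and-rejoin technique: split on the column separator, stitch back together any quoted field that contained the separator, unescape doubled quotes, and reject stray or unclosed quotes. The standalone Ruby sketch below reproduces only that loop, so it can run without io_streams or the CSV internals (@parsers, @col_sep, lineno) that CSVRow relies on. The names parse_csv_line, finish_quoted! and MalformedLineError are hypothetical, the stray-quote check is a simplified stand-in for @parsers[:stray_quote], and converters, headers and skip_lines are omitted.

```ruby
# Hypothetical standalone sketch of the split-and-rejoin loop in CSVRow#parse.
# Not the io_streams code: converters, headers, skip_lines and line numbers
# are left out.
class MalformedLineError < StandardError; end

def parse_csv_line(line, col_sep: ",", quote_char: '"')
  return nil if line.nil? || line.strip.empty?

  fields   = []
  extended = false # true while re-joining a quoted field that contained col_sep

  line.split(col_sep, -1).each do |part|
    if extended
      if part.end_with?(quote_char) && part.count(quote_char).odd?
        # Closing quote found: finish the field without the quote.
        fields.last << part[0..-2]
        finish_quoted!(fields.last, quote_char)
        extended = false
      else
        # Still inside the quotes: the separator was part of the data.
        fields.last << part << col_sep
      end
    elsif part.start_with?(quote_char)
      if !part.end_with?(quote_char) || part.count(quote_char).odd?
        # Opening quote with no closing quote in this part.
        fields << (part[1..] + col_sep)
        extended = true
      else
        # Complete quoted field such as "a""b".
        fields << part[1..-2]
        finish_quoted!(fields.last, quote_char)
      end
    elsif part.include?(quote_char)
      raise MalformedLineError, "Illegal quoting: #{part.inspect}"
    else
      fields << (part.empty? ? nil : part) # empty unquoted field => nil
    end
  end
  raise MalformedLineError, "Unclosed quoted field" if extended

  fields
end

# Rejects any quote that is not part of a doubled pair, then turns "" into ".
def finish_quoted!(field, quote_char)
  pair = quote_char * 2
  raise MalformedLineError, "Missing or stray quote" if field.gsub(pair, "").include?(quote_char)

  field.gsub!(pair, quote_char)
end

p parse_csv_line('a,"b,c",,"d""e"') # => ["a", "b,c", nil, "d\"e"]
```

As in #parse, an empty unquoted field becomes nil while a quoted empty field ("") becomes an empty string.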

#render(row) ⇒ Object

Also known as: to_csv

Return the supplied array as a single-line CSV string.



# File 'lib/io_streams/tabular/utility/csv_row.rb', line 100

def render(row)
  row.map(&@quote).join(@col_sep) + @row_sep # quote and separate
end
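#render delegates per-field quoting to the @quote lambda inherited from CSV. A minimal standalone sketch of the same idea follows. The quoting rule (quote a field when it is empty or contains the separator, the quote character, or a line terminator, and double any embedded quotes) is an assumption modeled on Ruby's CSV defaults rather than a copy of @quote, and render_csv_row is a hypothetical name. As in CSVRow, row_sep defaults to "", so no line terminator is appended.

```ruby
# Hypothetical sketch of CSVRow#render. The quoting rule is an assumption
# modeled on Ruby's CSV defaults, not a copy of CSV's @quote lambda.
def render_csv_row(row, col_sep: ",", quote_char: '"', row_sep: "")
  specials = [col_sep, quote_char, "\r", "\n"]

  quoted = row.map do |value|
    next "" if value.nil? # nil => empty, unquoted field

    field = value.to_s
    if field.empty? || specials.any? { |char| field.include?(char) }
      # Wrap in quotes and double any embedded quote characters.
      quote_char + field.gsub(quote_char, quote_char * 2) + quote_char
    else
      field
    end
  end

  quoted.join(col_sep) + row_sep # row_sep "" mirrors CSVRow's constructor
end

puts render_csv_row(["a", "b,c", nil, 'd"e', 42])
```

Rendering an empty string as a quoted "" keeps it distinguishable from nil, which matches how #parse reads the two back.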