Class: Importu::Sources::CSV

Inherits:
Object
Defined in:
lib/importu/sources/csv.rb

Overview

Parses CSV files as import source data.

Each row becomes a hash with header names as keys. The CSV must have a header row.

Examples:

Basic usage

source = Importu::Sources::CSV.new("data.csv")
source.rows.each { |row| puts row["name"] }

From a string

csv_data = "name,email\nAlice,[email protected]"
source = Importu::Sources::CSV.new(StringIO.new(csv_data))

With semicolon delimiter

source = Importu::Sources::CSV.new("data.csv", csv_options: { col_sep: ";" })

With tab delimiter

source = Importu::Sources::CSV.new("data.tsv", csv_options: { col_sep: "\t" })

Common csv_options

csv_options: {
  col_sep: ";",        # Column separator (default: ",")
  quote_char: "'",     # Quote character (default: '"')
  encoding: "UTF-8",   # File encoding
}
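
Because these options are passed to Ruby's standard CSV library, their effect can be tried without Importu. The following is a minimal sketch using only the stdlib; the helper name `parse_rows` and the sample data are made up for illustration and are not part of Importu.

```ruby
require "csv"
require "stringio"

# Hypothetical helper (not part of Importu): parse CSV text that has a
# header row and return an array of hashes keyed by header name.
def parse_rows(text, **csv_options)
  CSV.new(StringIO.new(text), headers: true, **csv_options).map(&:to_h)
end

# Semicolon-separated data whose fields are wrapped in single quotes.
text = "name;note\n'Alice';'likes; semicolons'\n"
rows = parse_rows(text, col_sep: ";", quote_char: "'")

# The quoted field keeps its embedded semicolon as data.
puts rows.first["note"]
```

Changing `quote_char` matters whenever a field contains the separator: without the matching quote character the embedded `;` would be read as a column boundary.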


Constructor Details

#initialize(infile, csv_options: {}) ⇒ CSV

Creates a new CSV source.

Parameters:

  • infile (String, IO)

    file path or IO object to read from

  • csv_options (Hash) (defaults to: {})

    options passed to Ruby’s CSV parser

Raises:

  • (Importu::InvalidInput)

    if the CSV is malformed or the document is empty

# File 'lib/importu/sources/csv.rb', line 42

def initialize(infile, csv_options: {}, **)
  @owns_handle = !infile.respond_to?(:readline)
  @infile = @owns_handle ? File.open(infile, "rb") : infile

  @csv_options = {
    headers:        true,
    return_headers: true,
    write_headers:  true,
    skip_blanks:    true,
  }.merge(csv_options)

  begin
    @reader = ::CSV.new(@infile, **@csv_options)
    @header = @reader.readline
  rescue ::CSV::MalformedCSVError => e
    raise Importu::InvalidInput, e.message
  end

  if @header.nil?
    raise Importu::InvalidInput, "Empty document"
  end
rescue StandardError
  close
  raise
end
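
The constructor logic above (merge caller options over the defaults, read the header row eagerly, translate parser errors) can be sketched with the stdlib alone. This is an illustration under assumptions, not the library's code: `read_header` and `SketchInvalidInput` are hypothetical names, with the latter standing in for `Importu::InvalidInput`.

```ruby
require "csv"
require "stringio"

# Stand-in for Importu::InvalidInput (hypothetical, for this sketch only).
class SketchInvalidInput < StandardError; end

# Defaults copied from the listing above; caller options win on conflict.
DEFAULT_CSV_OPTIONS = {
  headers:        true,
  return_headers: true,
  write_headers:  true,
  skip_blanks:    true,
}.freeze

# Hypothetical helper: returns the header names of a CSV document.
def read_header(io, csv_options = {})
  reader = CSV.new(io, **DEFAULT_CSV_OPTIONS.merge(csv_options))
  header = reader.readline # with return_headers, the first row is the header
  raise SketchInvalidInput, "Empty document" if header.nil?
  header.fields
rescue CSV::MalformedCSVError => e
  # Surface parser failures as a single, import-specific error type.
  raise SketchInvalidInput, e.message
end

p read_header(StringIO.new("name;email\nAlice;a@example.test\n"), col_sep: ";")
```

Reading the header inside the constructor means a malformed or empty file is reported when the source is created rather than later, while rows are being iterated.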

Instance Method Details

#close ⇒ void

This method returns an undefined value.

Closes the underlying file handle if opened by this source.

Safe to call multiple times. Only closes handles that were opened by this source (not IO objects passed in).



# File 'lib/importu/sources/csv.rb', line 74

def close
  return unless @owns_handle && @infile && !@infile.closed?
  @infile.close
end
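
The ownership rule can be illustrated in isolation. The sketch below reproduces only the handle-tracking logic from the listings; `OwnedHandle` and `ownership_demo` are hypothetical names introduced for this example.

```ruby
require "stringio"
require "tempfile"

# Hypothetical stand-in that mirrors only the handle-ownership logic.
class OwnedHandle
  attr_reader :io

  def initialize(infile)
    # Anything that can already read lines is treated as a caller-owned IO;
    # everything else is assumed to be a path and opened here.
    @owns_handle = !infile.respond_to?(:readline)
    @io = @owns_handle ? File.open(infile, "rb") : infile
  end

  def close
    return unless @owns_handle && @io && !@io.closed?
    @io.close
  end
end

# Returns [closed? for the path-based handle, closed? for the passed-in IO].
def ownership_demo
  Tempfile.create("owned") do |tmp|
    from_path = OwnedHandle.new(tmp.path)
    from_io   = OwnedHandle.new(StringIO.new("a,b\n"))

    from_path.close
    from_path.close # second call is a no-op
    from_io.close   # does nothing: the caller still owns the StringIO

    [from_path.io.closed?, from_io.io.closed?]
  end
end

p ownership_demo # => [true, false]
```

A caller that passes its own IO object therefore stays responsible for closing it, typically in an `ensure` block.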

#rows ⇒ Enumerator<Hash>

Returns an enumerator that yields each row as a hash.

Returns:

  • (Enumerator<Hash>)

    rows with header names as keys



# File 'lib/importu/sources/csv.rb', line 82

def rows
  @infile.rewind
  reader = ::CSV.new(@infile, **@csv_options)
  Enumerator.new do |yielder|
    reader.each {|row| yielder.yield(row.to_hash) unless row.header_row? }
  end
end
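
The same pattern (rewind, build a fresh reader, skip the header row) can be tried with the stdlib alone. This is a sketch under assumptions; `each_row_hash` is a hypothetical helper, not an Importu method.

```ruby
require "csv"
require "stringio"

# Hypothetical helper mirroring the listing above: rewinds the IO, then
# lazily yields each data row as a hash, skipping the returned header row.
def each_row_hash(io, **csv_options)
  io.rewind
  reader = CSV.new(io, headers: true, return_headers: true, **csv_options)
  Enumerator.new do |yielder|
    reader.each { |row| yielder << row.to_h unless row.header_row? }
  end
end

io = StringIO.new("name,city\nAlice,Oslo\nBob,Lima\n")

each_row_hash(io).each { |row| puts "#{row["name"]} lives in #{row["city"]}" }

# A second call rewinds the IO, so the data can be read again from the start.
puts each_row_hash(io).count
```

Because the input is rewound on each call, this approach assumes a seekable IO such as a file or `StringIO`; a pipe or socket could not be re-read this way.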

#write_errors(summary, only_errors: false) ⇒ Tempfile?

Generates a CSV file with error information appended.

Creates a copy of the original data with an “_errors” column containing any validation errors for each row. Useful for returning to data providers.

Parameters:

  • summary (Importu::Summary)

    the import summary containing errors

  • only_errors (Boolean) (defaults to: false)

    if true, only include rows that had errors

Returns:

  • (Tempfile, nil)

    temp file with error data, or nil if no errors



# File 'lib/importu/sources/csv.rb', line 98

def write_errors(summary, only_errors: false)
  return unless summary.itemized_errors.any?

  header = @header.fields | ["_errors"]
  itemized_errors = summary.itemized_errors

  Tempfile.new("import").tap do |file|
    writer = ::CSV.new(file, **@csv_options)
    writer << header

    rows.each.with_index do |row, index|
      errors = itemized_errors.key?(index) \
        ? itemized_errors[index].join(", ")
        : nil

      if errors || !only_errors
        writer << row.merge("_errors" => errors).values_at(*header)
      end
    end

    file.rewind
  end
end
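
The error-annotation step can be sketched without Importu. In the example below, `annotate_errors` is a hypothetical helper, and `errors_by_index` stands in for `summary.itemized_errors`; it is assumed to map a zero-based data-row index to an array of messages, as the `with_index` call in the listing suggests.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: writes `rows` (an array of hashes) to a temp file
# with an extra "_errors" column. Returns nil when there are no errors.
def annotate_errors(rows, errors_by_index, only_errors: false)
  return if errors_by_index.empty?

  # Union keeps the original column order and adds "_errors" only once.
  header = rows.flat_map(&:keys).uniq | ["_errors"]

  Tempfile.new("import").tap do |file|
    writer = CSV.new(file)
    writer << header

    rows.each_with_index do |row, index|
      errors = errors_by_index[index]&.join(", ")
      next if errors.nil? && only_errors

      writer << row.merge("_errors" => errors).values_at(*header)
    end

    file.rewind # leave the file positioned at the start for the reader
  end
end

rows = [
  { "name" => "Alice", "age" => "x" },
  { "name" => "Bob",   "age" => "30" },
]
file = annotate_errors(rows, { 0 => ["age is not a number"] })
puts file.read
file.close! # the caller owns the temp file and removes it when done
```

Rows without errors get an empty `_errors` cell unless `only_errors: true` is given, in which case they are left out of the file entirely.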