Class: CSVDiff::CSVSource

Inherits:
Source
  • Object
Defined in:
lib/csv-diff/csv_source.rb

Overview

Represents a CSV input (i.e. the left/from or right/to input) to the diff process.

Instance Attribute Summary

Attributes inherited from Source

#case_sensitive, #child_field_indexes, #child_fields, #data, #dup_count, #field_names, #index, #key_field_indexes, #key_fields, #line_count, #lines, #parent_field_indexes, #parent_fields, #path, #skip_count, #trim_whitespace, #warnings

Instance Method Summary

Methods inherited from Source

#[], #index_source, #path?, #save_csv, #to_hash

Constructor Details

#initialize(source, options = {}) ⇒ CSVSource

Creates a new diff source.

A diff source must contain at least one field that will be used as the key to identify the same record in a different version of this file. If not specified via one of the options, the first field is assumed to be the unique key.

If multiple fields combine to form a unique key, the parent is assumed to be identified by all but the last field of the unique key. If finer control is required, use a combination of the :parent_fields and :child_fields options.

All key options can be specified either by field name, or by field index (0 based).
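
As a rough illustration of the key rules above, the following sketch resolves a mix of field names and 0-based indexes to column indexes, then splits a composite key into parent and child parts. The helpers `resolve_field_indexes` and `split_key` are hypothetical names introduced here for illustration; they are not part of the csv-diff API and may differ from how the library does this internally.

```ruby
# Hypothetical helper (not csv-diff API): resolve field names and/or
# 0-based indexes to column indexes.
def resolve_field_indexes(field_names, fields)
  fields.map do |field|
    if field.is_a?(Integer)
      unless field.between?(0, field_names.size - 1)
        raise ArgumentError, "no field at index #{field}"
      end
      field
    else
      field_names.index(field.to_s) ||
        raise(ArgumentError, "unknown field '#{field}'")
    end
  end
end

# Hypothetical helper (not csv-diff API): the parent is identified by all
# but the last key field, and the child by the last key field.
def split_key(key_indexes)
  return [[], key_indexes] if key_indexes.size < 2
  [key_indexes[0...-1], key_indexes.last(1)]
end

field_names = %w[Region Office Employee Salary]

# Names and indexes can be mixed; 'Office' is given here as index 1.
key_indexes = resolve_field_indexes(field_names, ['Region', 1, 'Employee'])
parent_indexes, child_indexes = split_key(key_indexes)

# With no key option, the first field is assumed to be the unique key.
default_parent, default_child = split_key(resolve_field_indexes(field_names, [0]))
```

Under these assumptions, `Region` and `Office` identify the parent and `Employee` identifies the child; a single-field key has no parent part.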

Parameters:

  • source (String|Array<Array>)

    Either a path to a CSV file, or an Array of Arrays containing CSV data. If the :field_names option is not specified, the first line must contain the names of the fields.

  • options (Hash) (defaults to: {})

    An options hash.
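
One way to read the header rule for source is sketched below: when no field names are supplied, the first row provides them; when names are supplied, the first row is data unless it is explicitly ignored. The helper `split_header` and its keyword arguments are hypothetical, chosen to mirror the :field_names and :ignore_header options; this is not the library's own code.

```ruby
# Hypothetical helper (not csv-diff API): decide which row supplies the
# field names and which rows are treated as data.
def split_header(rows, field_names: nil, ignore_header: false)
  if field_names.nil?
    # No names supplied: the first row must contain the field names.
    [rows.first, rows.drop(1)]
  elsif ignore_header
    # Names supplied, and the first row is a header to be discarded.
    [field_names, rows.drop(1)]
  else
    # Names supplied, and every row is data.
    [field_names, rows]
  end
end

rows = [%w[id name], %w[1 Ann], %w[2 Bob]]

names, data = split_header(rows)
renamed, renamed_data = split_header(rows, field_names: %w[key label],
                                           ignore_header: true)
```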

Options Hash (options):

  • :encoding (String)

    The encoding to use when opening the CSV file.

  • :csv_options (Hash)

    Any options you wish to pass to CSV.open, e.g. :col_sep.

  • :field_names (Array<String>)

    The names of each of the fields in source.

  • :ignore_header (Boolean)

    If true, and :field_names has been specified, then the first row of the file is ignored.

  • :key_field (String)

    The name of the field that uniquely identifies each row.

  • :key_fields (Array<String>)

    The names of the fields that together uniquely identify each row.

  • :parent_field (String)

    The name of the field(s) that identify a parent within which sibling order should be checked.

  • :child_field (String)

    The name of the field(s) that uniquely identify a child of a parent.

  • :case_sensitive (Boolean)

    If true (the default), keys are indexed as-is; if false, the index is built in upper-case for case-insensitive comparisons.

  • :include (Hash)

    A hash of field name(s) or index(es) to regular expression(s). Only source rows whose field values satisfy the regular expressions will be indexed and included in the diff process.

  • :exclude (Hash)

    A hash of field name(s) or index(es) to regular expression(s). Source rows with a field value that satisfies the regular expressions will be excluded from the diff process.
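
The filtering and case-sensitivity options can be pictured with a small sketch. It assumes that a row is kept only if every :include pattern matches and no :exclude pattern matches, and that a case-insensitive key is simply the upper-cased key values; the helpers `column_of`, `keep_row?`, and `index_key` are hypothetical and are not the csv-diff implementation.

```ruby
# Hypothetical helper: accept a field name or a 0-based index.
def column_of(field, field_names)
  field.is_a?(Integer) ? field : field_names.index(field.to_s)
end

# Hypothetical helper: assumes ALL include patterns must match and that
# ANY matching exclude pattern drops the row.
def keep_row?(row, field_names, include_filter: {}, exclude_filter: {})
  included = include_filter.all? do |field, pattern|
    pattern.match?(row[column_of(field, field_names)].to_s)
  end
  excluded = exclude_filter.any? do |field, pattern|
    pattern.match?(row[column_of(field, field_names)].to_s)
  end
  included && !excluded
end

# Hypothetical helper: build the key used for indexing a row, upper-casing
# it when comparisons should be case-insensitive.
def index_key(row, key_indexes, case_sensitive: true)
  key = key_indexes.map { |i| row[i].to_s }
  case_sensitive ? key : key.map(&:upcase)
end

field_names = %w[id name status]
rows = [%w[a1 Ann active], %w[b2 Bob retired], %w[c3 Cy active]]

kept = rows.select do |row|
  keep_row?(row, field_names,
            include_filter: { 'status' => /\Aactive\z/ }, # by field name
            exclude_filter: { 1 => /\AC/ })               # by field index
end
keys = kept.map { |row| index_key(row, [0], case_sensitive: false) }
```

Here only the `a1` row survives both filters, and its key is stored as `A1` when case sensitivity is turned off.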



# File 'lib/csv-diff/csv_source.rb', line 51

def initialize(source, options = {})
    super(options)
    if source.is_a?(String)
        require 'csv'
        mode_string = options[:encoding] ? "r:#{options[:encoding]}" : 'r'
        csv_options = options.fetch(:csv_options, {})
        @path = source
        # When you call CSV.open, it's best to pass in a block so that after it's yielded,
        # the underlying file handle is closed. Otherwise, you risk leaking the handle.
        # The options hash is splatted into keyword arguments, which CSV.open requires on Ruby 3.
        @data = CSV.open(@path, mode_string, **csv_options) do |csv|
            csv.readlines
        end
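    # Parenthesised so that a non-Enumerable source reaches the ArgumentError below
    # instead of raising NoMethodError on size or first.
    elsif source.is_a?(Enumerable) && (source.size == 0 || source.first.is_a?(Enumerable))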
        @data = source
    else
        raise ArgumentError, "source must be a path to a file or an Enumerable<Enumerable>"
    end
    index_source
end
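
The file-reading branch of the constructor can be tried in isolation using only the standard library. The sketch below shows how :encoding becomes part of the mode string and how :csv_options (here :col_sep) is passed through to CSV.open. The helper `read_csv_rows` and the sample data are hypothetical; the result has the same Array-of-Arrays shape that can be passed directly as source.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper mirroring the String branch of the constructor.
def read_csv_rows(path, encoding: nil, csv_options: {})
  mode_string = encoding ? "r:#{encoding}" : 'r'
  # The block form closes the file handle once the rows have been read.
  CSV.open(path, mode_string, **csv_options) { |csv| csv.readlines }
end

# Write a small semicolon-separated file, then read it back.
rows = Tempfile.create(['people', '.csv']) do |file|
  file.write("id;name\n1;Ann\n2;Bob\n")
  file.flush
  read_csv_rows(file.path, encoding: 'UTF-8', csv_options: { col_sep: ';' })
end
```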