Class: CSVDiff::CSVSource

Inherits:
Object
Defined in:
lib/csv-diff/csv_source.rb

Overview

Represents a CSV input (i.e. the left/from or right/to input) to the diff process.

Constructor Details

#initialize(source, options = {}) ⇒ CSVSource

Creates a new diff source.

A diff source must contain at least one field that will be used as the key to identify the same record in a different version of this file. If not specified via one of the options, the first field is assumed to be the unique key.

If multiple fields combine to form a unique key, the parent is assumed to be identified by all but the last field of the unique key. If finer control is required, use a combination of the :parent_fields and :child_fields options.

All key options can be specified either by field name or by field index (0-based).
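The defaulting rules above can be sketched as a small standalone function. This is a minimal illustration, not the library's API: `resolve_key_fields` is a hypothetical helper name, and it only reproduces the documented behaviour of splitting a key into parent and child fields.

```ruby
# Hypothetical helper (not part of csv-diff) that applies the documented
# defaulting rules for key, parent, and child fields.
def resolve_key_fields(options = {})
  explicit = %i[parent_field parent_fields child_field child_fields]
  key = options.fetch(:key_field, options[:key_fields])
  if (options.keys & explicit).empty? && key
    # Only a key was given: all but the last key field identify the parent.
    key_fields = [key].flatten
    { key: key_fields, parent: key_fields[0...-1], child: key_fields[-1..-1] }
  else
    # Explicit parent/child options win; the child defaults to the first field.
    parent = [options.fetch(:parent_field, options[:parent_fields]) || []].flatten
    child = [options.fetch(:child_field, options[:child_fields]) || [0]].flatten
    { key: parent + child, parent: parent, child: child }
  end
end

default_key = resolve_key_fields
compound_key = resolve_key_fields(key_fields: %w[Region Office Name])
explicit_key = resolve_key_fields(parent_field: 'Dept', child_field: 'Emp')
```

With no options the key falls back to field index 0; with a three-field key the first two fields form the parent and the last one the child.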

Parameters:

  • source (String|Array<Array>)

    Either a path to a CSV file, or an Array of Arrays containing CSV data. If the :field_names option is not specified, the first line must contain the names of the fields.

  • options (Hash) (defaults to: {})

    An options hash.

Options Hash (options):

  • :encoding (String)

    The encoding to use when opening the CSV file.

  • :csv_options (Hash)

    Any options you wish to pass to CSV.open, e.g. :col_sep.

  • :field_names (Array<String>)

    The names of each of the fields in source.

  • :ignore_header (Boolean)

    If true, and :field_names has been specified, then the first row of the file is ignored.

  • :key_field (String)

    The name of the field that uniquely identifies each row.

  • :key_fields (Array<String>)

    The names of the fields that uniquely identify each row.

  • :parent_field (String)

    The name of the field(s) that identify a parent within which sibling order should be checked.

  • :child_field (String)

    The name of the field(s) that uniquely identify a child of a parent.

  • :case_sensitive (Boolean)

    If true (the default), keys are indexed as-is; if false, the index is built in upper-case for case-insensitive comparisons.

  • :include (Hash)

    A hash of field name(s) or index(es) to regular expression(s). Only source rows whose field values satisfy the regular expressions will be indexed and included in the diff process.

  • :exclude (Hash)

    A hash of field name(s) or index(es) to regular expression(s). Source rows with a field value that satisfies the regular expressions will be excluded from the diff process.
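As a rough illustration of the :include and :exclude options, the sketch below filters plain Array rows with the same kind of hash. `row_selected?` is a hypothetical helper, the filter keys are assumed to be 0-based field indexes, and the choice that every :include pattern must match while any :exclude pattern rejects a row is an assumption about the semantics, not something confirmed by the library.

```ruby
# Hypothetical helper (not part of csv-diff): true if a row passes the filters.
# Assumption: all include patterns must match; any exclude match rejects the row.
def row_selected?(row, include_filter = {}, exclude_filter = {})
  included = include_filter.all? { |idx, pattern| pattern.match?(row[idx].to_s) }
  excluded = exclude_filter.any? { |idx, pattern| pattern.match?(row[idx].to_s) }
  included && !excluded
end

rows = [
  %w[1 Alice Sales],
  %w[2 Bob Engineering],
  %w[3 Carol Sales]
]

# Keep rows whose third field is "Sales", except the row for Carol.
kept = rows.select do |row|
  row_selected?(row, { 2 => /\ASales\z/ }, { 1 => /\ACarol\z/ })
end
```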



# File 'lib/csv-diff/csv_source.rb', line 101

def initialize(source, options = {})
    if source.is_a?(String)
        require 'csv'
        mode_string = options[:encoding] ? "r:#{options[:encoding]}" : 'r'
        csv_options = options.fetch(:csv_options, {})
        @path = source
        source = CSV.open(@path, mode_string, csv_options).readlines
    elsif !source.is_a?(Enumerable) || (source.is_a?(Enumerable) && source.size > 0 &&
                                        !source.first.is_a?(Enumerable))
        raise ArgumentError, "source must be a path to a file or an Enumerable<Enumerable>"
    end
    if (options.keys & [:parent_field, :parent_fields, :child_field, :child_fields]).empty? &&
       (kf = options.fetch(:key_field, options[:key_fields]))
        @key_fields = [kf].flatten
        @parent_fields = @key_fields[0...-1]
        @child_fields = @key_fields[-1..-1]
    else
        @parent_fields = [options.fetch(:parent_field, options[:parent_fields]) || []].flatten
        @child_fields = [options.fetch(:child_field, options[:child_fields]) || [0]].flatten
        @key_fields = @parent_fields + @child_fields
    end
    @field_names = options[:field_names]
    @warnings = []
    index_source(source, options)
end
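The file-reading branch of the constructor can be tried in isolation using only the standard library. The sketch below is an assumption-laden variant, not the library's code: `read_rows` is a hypothetical helper, it passes the CSV options with a double splat (which recent Ruby versions expect for keyword options), and it uses the block form of CSV.open so the file is closed after reading.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper (not part of csv-diff): read every row of a CSV file,
# honouring the :encoding and :csv_options entries of an options hash.
def read_rows(path, options = {})
  mode = options[:encoding] ? "r:#{options[:encoding]}" : 'r'
  csv_options = options.fetch(:csv_options, {})
  # The block form closes the file once the rows have been read.
  CSV.open(path, mode, **csv_options) { |csv| csv.readlines }
end

# Write a small semicolon-separated file to read back.
file = Tempfile.new(['people', '.csv'])
file.write("id;name\n1;Alice\n2;Bob\n")
file.close

rows = read_rows(file.path, encoding: 'UTF-8', csv_options: { col_sep: ';' })
file.unlink
```

The first element of `rows` is the header row, which is what the constructor expects when :field_names is not supplied.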

Instance Attribute Details

#case_sensitive ⇒ Boolean (readonly) Also known as: case_sensitive?

Returns True if the source has been indexed with case-sensitive keys, or false if it has been indexed using upper-case key values.

Returns:

  • (Boolean)

    True if the source has been indexed with case-sensitive keys, or false if it has been indexed using upper-case key values.



# File 'lib/csv-diff/csv_source.rb', line 35

def case_sensitive
  @case_sensitive
end
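To make the effect of case-insensitive indexing concrete, here is a minimal sketch of how a lookup key might be built from key field values. `build_key` is a hypothetical helper, and the "~" separator between key values is an assumption made for illustration.

```ruby
# Hypothetical helper (not part of csv-diff): join key field values into a
# single lookup key, upper-casing it when case-insensitive matching is wanted.
def build_key(values, case_sensitive: true, separator: '~')
  key = values.map(&:to_s).join(separator)
  case_sensitive ? key : key.upcase
end

exact = build_key(%w[sales Alice])
folded = build_key(%w[sales Alice], case_sensitive: false)
```

With upper-cased keys, "alice" and "Alice" in two versions of a file resolve to the same record.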

#child_field_indexes ⇒ Array<Fixnum> (readonly)

Returns The indexes of the child fields in the source file.

Returns:

  • (Array<Fixnum>)

    The indexes of the child fields in the source file.



# File 'lib/csv-diff/csv_source.rb', line 30

def child_field_indexes
  @child_field_indexes
end

#child_fields ⇒ Array<String> (readonly)

Returns The names of the field(s) that distinguish a child of a parent record.

Returns:

  • (Array<String>)

    The names of the field(s) that distinguish a child of a parent record.



# File 'lib/csv-diff/csv_source.rb', line 20

def child_fields
  @child_fields
end

#field_names ⇒ Array<String> (readonly)

Returns The names of the fields in the source file.

Returns:

  • (Array<String>)

    The names of the fields in the source file.



# File 'lib/csv-diff/csv_source.rb', line 11

def field_names
  @field_names
end

#index ⇒ Hash<String,Array<String>> (readonly)

Returns A hash containing each parent key, and an Array of the child keys it is a parent of.

Returns:

  • (Hash<String,Array<String>>)

    A hash containing each parent key, and an Array of the child keys it is a parent of.



# File 'lib/csv-diff/csv_source.rb', line 45

def index
  @index
end
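The shape of this hash can be illustrated with a short standalone sketch. The field positions, the sample rows, and the "~" separator used to join key values are all assumptions chosen for the example.

```ruby
# Hypothetical data: department is the parent field, name is the child field.
rows = [
  %w[Sales Alice],
  %w[Sales Bob],
  %w[Engineering Carol]
]
parent_idx = [0]   # assumed parent field indexes
key_idx = [0, 1]   # assumed key field indexes (parent fields + child fields)

# Map each parent key to the ordered list of child keys found under it.
index = Hash.new { |hash, parent| hash[parent] = [] }
rows.each do |row|
  parent_key = row.values_at(*parent_idx).join('~')
  key = row.values_at(*key_idx).join('~')
  index[parent_key] << key
end
```

Keeping the child keys in source order is what makes it possible to compare sibling order between two versions of a file.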

#key_field_indexes ⇒ Array<Fixnum> (readonly)

Returns The indexes of the key fields in the source file.

Returns:

  • (Array<Fixnum>)

    The indexes of the key fields in the source file.



# File 'lib/csv-diff/csv_source.rb', line 24

def key_field_indexes
  @key_field_indexes
end

#key_fields ⇒ Array<String> (readonly)

Returns The names of the field(s) that uniquely identify each row.

Returns:

  • (Array<String>)

    The names of the field(s) that uniquely identify each row.



# File 'lib/csv-diff/csv_source.rb', line 14

def key_fields
  @key_fields
end

#line_count ⇒ Fixnum (readonly)

Returns A count of the lines processed from this source. Excludes any header and duplicate records identified during indexing.

Returns:

  • (Fixnum)

    A count of the lines processed from this source. Excludes any header and duplicate records identified during indexing.



# File 'lib/csv-diff/csv_source.rb', line 51

def line_count
  @line_count
end

#lines ⇒ Hash<String,Hash> (readonly)

Returns A hash containing each line of the source, keyed on the values of the key_fields.

Returns:

  • (Hash<String,Hash>)

    A hash containing each line of the source, keyed on the values of the key_fields.



# File 'lib/csv-diff/csv_source.rb', line 42

def lines
  @lines
end

#parent_field_indexes ⇒ Array<Fixnum> (readonly)

Returns The indexes of the parent fields in the source file.

Returns:

  • (Array<Fixnum>)

    The indexes of the parent fields in the source file.



# File 'lib/csv-diff/csv_source.rb', line 27

def parent_field_indexes
  @parent_field_indexes
end

#parent_fields ⇒ Array<String> (readonly)

Returns The names of the field(s) that identify a common parent of child records.

Returns:

  • (Array<String>)

    The names of the field(s) that identify a common parent of child records.



# File 'lib/csv-diff/csv_source.rb', line 17

def parent_fields
  @parent_fields
end

#path ⇒ String

Returns the path to the source file.

Returns:

  • (String)

    The path to the source file.



# File 'lib/csv-diff/csv_source.rb', line 8

def path
  @path
end

#skip_count ⇒ Fixnum (readonly)

Returns A count of the lines from this source that were skipped, due either to duplicate keys or filter conditions.

Returns:

  • (Fixnum)

    A count of the lines from this source that were skipped, due either to duplicate keys or filter conditions.



# File 'lib/csv-diff/csv_source.rb', line 54

def skip_count
  @skip_count
end
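The relationship between the processed and skipped counts can be sketched with a simple indexing loop. This is an illustration under assumptions: the first field is taken as the key, the first occurrence of a duplicated key is kept, and the warning text is invented for the example.

```ruby
# Hypothetical rows (header already removed); the third row repeats key "1".
rows = [%w[1 Alice], %w[2 Bob], %w[1 Alicia]]

lines = {}
line_count = 0
skip_count = 0
warnings = []

rows.each_with_index do |row, i|
  key = row[0]
  if lines.key?(key)
    # Assumption: the first occurrence wins and later duplicates are skipped.
    skip_count += 1
    warnings << "Duplicate key '#{key}' at row #{i + 1}"
  else
    lines[key] = { 'id' => row[0], 'name' => row[1] }
    line_count += 1
  end
end
```

In this sketch every input row ends up in exactly one of the two counts.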

#trim_whitespace ⇒ Boolean (readonly)

Returns True if leading/trailing whitespace should be stripped from fields.

Returns:

  • (Boolean)

    True if leading/trailing whitespace should be stripped from fields.



# File 'lib/csv-diff/csv_source.rb', line 39

def trim_whitespace
  @trim_whitespace
end

#warnings ⇒ Array<String> (readonly)

Returns An array of any warnings encountered while processing the source.

Returns:

  • (Array<String>)

    An array of any warnings encountered while processing the source.



# File 'lib/csv-diff/csv_source.rb', line 48

def warnings
  @warnings
end

Instance Method Details

#[](key) ⇒ Hash

Returns the row in the CSV source corresponding to the supplied key.

Parameters:

  • key (String)

    The unique key to use to look up the row.

Returns:

  • (Hash)

    The fields for the line corresponding to key, or nil if the key is not recognised.



# File 'lib/csv-diff/csv_source.rb', line 133

def [](key)
    @lines[key]
end
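The lookup contract (a Hash of fields for a known key, nil otherwise) can be demonstrated with a tiny stand-in class. `TinySource` is hypothetical and is not the library's class; it assumes the first field of each row is the key.

```ruby
# Hypothetical stand-in for a source: rows are indexed by their first field.
class TinySource
  def initialize(rows, field_names)
    @lines = rows.each_with_object({}) do |row, lines|
      lines[row.first] = field_names.zip(row).to_h
    end
  end

  # Returns the row Hash for the key, or nil if the key is unknown.
  def [](key)
    @lines[key]
  end
end

source = TinySource.new([%w[1 Alice], %w[2 Bob]], %w[id name])
known = source['2']
unknown = source['9']
```

Because an unknown key yields nil rather than raising, callers comparing two sources can test for added or removed rows with a simple nil check.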