Class: CSVDiff

Inherits:
Object
Includes:
Algorithm
Defined in:
lib/csv-diff/csv_diff.rb,
lib/csv-diff/source.rb,
lib/csv-diff/algorithm.rb,
lib/csv-diff/csv_source.rb,
lib/csv-diff/xml_source.rb

Overview

This library performs diffs of flat file content that contains structured data in fields, with rows provided in a parent-child format.

Parent-child data does not lend itself well to standard text diffs, as small changes in the organisation of the tree at an upper level (e.g. re-ordering of two ancestor nodes) can lead to big movements in the position of descendant records - particularly when the parent-child data is generated by a hierarchy traversal.

Additionally, simple line-based diffs can identify that a line has changed, but not which field(s) in the line have changed.

Data may be supplied in the form of CSV files, or as an array of arrays. The diff process provides a fine level of control over what to diff, and can optionally ignore certain types of changes (e.g. changes in order).
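The difference between a line-based diff and a key-based, field-level diff can be illustrated with a minimal sketch. This is not CSVDiff's implementation: `keyed_field_diff` is a hypothetical helper, the action labels are assumed, and both inputs are assumed to share the header row of the left input.

```ruby
# Hypothetical helper, not part of CSVDiff: index rows by a key column,
# then compare matching rows field by field.
def keyed_field_diff(left, right, key)
  header = left.first                      # assumes both inputs share this header
  key_idx = header.index(key)
  index_rows = lambda do |rows|
    rows.drop(1).each_with_object({}) do |row, idx|
      idx[row[key_idx]] = header.zip(row).to_h
    end
  end
  l = index_rows.call(left)
  r = index_rows.call(right)

  diffs = {}
  (l.keys | r.keys).each do |k|
    if !l.key?(k)
      diffs[k] = { action: 'Add' }
    elsif !r.key?(k)
      diffs[k] = { action: 'Delete' }
    else
      changed = header.reject { |f| l[k][f] == r[k][f] }
      diffs[k] = { action: 'Update', fields: changed } unless changed.empty?
    end
  end
  diffs
end

left  = [%w[id name title], %w[1 Ann Engineer], %w[2 Bob Analyst]]
right = [%w[id name title], %w[2 Bob Manager], %w[1 Ann Engineer], %w[3 Cy Intern]]

result = keyed_field_diff(left, right, 'id')
```

Because rows are matched on `id` rather than on position, the re-ordering of rows 1 and 2 produces no difference, while the change to Bob's `title` is attributed to that specific field.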

Defined Under Namespace

Modules: Algorithm
Classes: CSVSource, Source, XMLSource

Methods included from Algorithm

#diff_row, #diff_sources

Constructor Details

#initialize(left, right, options = {}) ⇒ CSVDiff

Generates a diff between two hierarchical tree structures, provided as left and right, each of which consists of an array of lines in CSV format. An array of field indexes can also be specified as key_fields, and at least one field index must be given. The last index is the child id; any remaining indexes are the parent field(s) that uniquely qualify the child instance.
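The parent/child split of the key fields described above can be sketched as follows. The helper `split_key` and the sample row are hypothetical and only illustrate the convention that the last key index names the child id.

```ruby
# Hypothetical helper: split a row's key into parent value(s) and child id,
# following the convention that the last key index is the child id.
def split_key(row, key_fields)
  *parent_idx, child_idx = key_fields
  { parent: row.values_at(*parent_idx), child: row[child_idx] }
end

row = %w[EMEA Sales E42 Ann]

# Two parent columns (0 and 1) qualify the child id in column 2.
nested = split_key(row, [0, 1, 2])

# With a single key index there are no parent fields at all.
flat = split_key(row, [2])
```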

Parameters:

  • left (Array|String|CSVSource)

    An Array of lines, each of which is an Array of fields, or a String specifying a path to a CSV file, or a CSVSource object.

  • right (Array|String|CSVSource)

    An Array of lines, each of which is an Array of fields, or a String specifying a path to a CSV file, or a CSVSource object.

  • options (Hash) (defaults to: {})

    A hash containing options.

Options Hash (options):

  • :encoding (String)

    The encoding to use when opening the CSV files.

  • :field_names (Array<String>)

    An Array of field names for each field in left and right. If not provided, the first row is assumed to contain field names.

  • :ignore_header (Boolean)

    If true, the first line of each file is ignored. This option can only be true if :field_names is specified.

  • :key_field (String)

    The name of the field that uniquely identifies each row.

  • :key_fields (Array<String>)

    The names of the fields that uniquely identify each row.

  • :parent_field (String)

    The name of the field that identifies a parent within which sibling order should be checked.

  • :child_field (String)

    The name of the field that uniquely identifies a child of a parent.

  • :ignore_adds (Boolean)

    If true, records that appear in the right/to file but not in the left/from file are not reported.

  • :ignore_updates (Boolean)

    If true, records that have been updated are not reported.

  • :ignore_moves (Boolean)

    If true, changes in row position amongst sibling rows are not reported.

  • :ignore_deletes (Boolean)

    If true, records that appear in the left/from file but not in the right/to file are not reported.
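The effect of the four `:ignore_*` options can be pictured as a filter over the reported differences. The sketch below is an illustration only: the action labels, the `IGNORE_OPTIONS` table, and `filter_diffs` are assumptions, and the library may well suppress these differences during the diff itself rather than afterwards.

```ruby
# Assumed mapping from each ignore option to the action label it suppresses.
IGNORE_OPTIONS = {
  ignore_adds: 'Add',
  ignore_deletes: 'Delete',
  ignore_updates: 'Update',
  ignore_moves: 'Move'
}.freeze

# Hypothetical helper: drop every diff whose action has been switched off.
def filter_diffs(diffs, options = {})
  ignored = IGNORE_OPTIONS.select { |opt, _action| options[opt] }.values
  diffs.reject { |_key, diff| ignored.include?(diff[:action]) }
end

diffs = {
  '1' => { action: 'Move' },
  '2' => { action: 'Update' },
  '3' => { action: 'Add' }
}

reported = filter_diffs(diffs, ignore_moves: true, ignore_adds: true)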



# File 'lib/csv-diff/csv_diff.rb', line 83

def initialize(left, right, options = {})
    @left = left.is_a?(Source) ? left : CSVSource.new(left, options)
    @left.index_source if @left.lines.nil?
    raise "No field names found in left (from) source" unless @left.field_names && @left.field_names.size > 0
    @right = right.is_a?(Source) ? right : CSVSource.new(right, options)
    @right.index_source if @right.lines.nil?
    raise "No field names found in right (to) source" unless @right.field_names && @right.field_names.size > 0
    @warnings = []
    @diff_fields = get_diff_fields(@left.field_names, @right.field_names, options)
    @key_fields = @left.key_fields
    diff(options)
end
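A call to the constructor might look like the following sketch. It assumes the gem is loaded with `require 'csv-diff'` and that array-of-arrays input takes its field names from the first row, as the `:field_names` option describes. The sample data is invented, and the call is skipped if the gem is not installed.

```ruby
# Invented sample data: the first row of each source supplies the field names.
left = [
  %w[id name title],
  %w[1 Ann Engineer],
  %w[2 Bob Analyst]
]
right = [
  %w[id name title],
  %w[2 Bob Manager],
  %w[1 Ann Engineer]
]

# Match rows on the id field and do not report changes in row order.
options = { key_field: 'id', ignore_moves: true }

begin
  require 'csv-diff' # assumed require name for the gem
  diff = CSVDiff.new(left, right, options)
  puts diff.summary.inspect
  diff.warnings.each { |message| warn message }
rescue LoadError
  warn 'csv-diff is not installed; skipping the diff'
end
```

Since `#initialize` calls `#diff` itself, the results are available as soon as the object has been constructed.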

Instance Attribute Details

#child_fields ⇒ Array<String> (readonly)

Returns an array of field names for the child field(s).

Returns:

  • (Array<String>)

    An array of field names for the child field(s).



# File 'lib/csv-diff/csv_diff.rb', line 37

def child_fields
  @child_fields
end

#diff_fields ⇒ Array<String> (readonly)

Returns an array of field names that are compared in the diff process.

Returns:

  • (Array<String>)

    An array of field names that are compared in the diff process.



# File 'lib/csv-diff/csv_diff.rb', line 30

def diff_fields
  @diff_fields
end

#diffs ⇒ Array<Hash> (readonly)

Returns an array of differences.

Returns:

  • (Array<Hash>)

    An array of differences.



# File 'lib/csv-diff/csv_diff.rb', line 27

def diffs
  @diffs
end

#key_fields ⇒ Array<String> (readonly)

Returns an array of field names of the key fields that uniquely identify each row.

Returns:

  • (Array<String>)

    An array of field names of the key fields that uniquely identify each row.



# File 'lib/csv-diff/csv_diff.rb', line 33

def key_fields
  @key_fields
end

#left ⇒ CSVSource (readonly)

Also known as: from

Returns the CSVSource object containing details of the left/from input.

Returns:

  • (CSVSource)

    CSVSource object containing details of the left/from input.



# File 'lib/csv-diff/csv_diff.rb', line 20

def left
  @left
end

#options ⇒ Hash (readonly)

Returns the options hash used for the diff.

Returns:

  • (Hash)

    The options hash used for the diff.



# File 'lib/csv-diff/csv_diff.rb', line 39

def options
  @options
end

#parent_fields ⇒ Array<String> (readonly)

Returns an array of field names for the parent field(s).

Returns:

  • (Array<String>)

    An array of field names for the parent field(s).



# File 'lib/csv-diff/csv_diff.rb', line 35

def parent_fields
  @parent_fields
end

#right ⇒ CSVSource (readonly)

Also known as: to

Returns the CSVSource object containing details of the right/to input.

Returns:

  • (CSVSource)

    CSVSource object containing details of the right/to input.



# File 'lib/csv-diff/csv_diff.rb', line 24

def right
  @right
end

Instance Method Details

#diff(options = {}) ⇒ Object

Performs a diff with the specified options.



# File 'lib/csv-diff/csv_diff.rb', line 98

def diff(options = {})
    @summary = nil
    @options = options
    @diffs = diff_sources(@left, @right, @key_fields, @diff_fields, options)
end

#diff_warnings ⇒ Array<String>

Returns an array of warning messages from the diff process.

Returns:

  • (Array<String>)

    an array of warning messages from the diff process.



# File 'lib/csv-diff/csv_diff.rb', line 132

def diff_warnings
    @warnings
end

#summary ⇒ Object

Returns a summary of the number of adds, deletes, moves, and updates.



# File 'lib/csv-diff/csv_diff.rb', line 106

def summary
    unless @summary
        @summary = Hash.new{ |h, k| h[k] = 0 }
        @diffs.each{ |k, v| @summary[v[:action]] += 1 }
        @summary['Warning'] = warnings.size if warnings.size > 0
    end
    @summary
end
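The counting idiom used by `#summary` can be tried in isolation. The sketch below uses an invented `diffs` hash keyed by row key, on the assumption (suggested by the `|k, v|` block arguments above) that each entry pairs a key with a hash holding an `:action`.

```ruby
# Invented sample: each value carries the action recorded for that row.
diffs = {
  '1' => { action: 'Update' },
  '2' => { action: 'Add' },
  '3' => { action: 'Update' }
}

# The default block stores 0 the first time an action is seen,
# so += 1 works without an explicit existence check.
summary = Hash.new { |hash, key| hash[key] = 0 }
diffs.each { |_key, diff| summary[diff[:action]] += 1 }

# fetch with a default reads a count without triggering the default block,
# so no 'Move' entry is added to the summary.
moves = summary.fetch('Move', 0)
```

One consequence of this default block is that reading a missing action with `summary['Move']` would insert a zero entry into the hash; `fetch` with a default avoids that side effect.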

#warnings ⇒ Array<String>

Returns an array of warning messages generated from the sources and the diff process.

Returns:

  • (Array<String>)

    an array of warning messages generated from the sources and the diff process.



# File 'lib/csv-diff/csv_diff.rb', line 126

def warnings
    @left.warnings + @right.warnings + @warnings
end