Class: CSVDiff

Inherits:
Object
Includes:
Algorithm
Defined in:
lib/csv-diff/csv_diff.rb,
lib/csv-diff/algorithm.rb,
lib/csv-diff/csv_source.rb

Overview

This library performs diffs of flat file content that contains structured data in fields, with rows provided in a parent-child format.

Parent-child data does not lend itself well to standard text diffs, as small changes in the organisation of the tree at an upper level (e.g. re-ordering of two ancestor nodes) can lead to big movements in the position of descendant records - particularly when the parent-child data is generated by a hierarchy traversal.

Additionally, simple line-based diffs can identify that a line has changed, but not which field(s) in the line have changed.

Data may be supplied in the form of CSV files, or as an array of arrays. The diff process provides a fine level of control over what to diff, and can optionally ignore certain types of changes (e.g. changes in order).
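To illustrate the difference between a line-based diff and a keyed, field-level diff, here is a minimal sketch using only core Ruby. It is not the library's implementation: the helper name keyed_diff, the sample data, and the action labels 'Add', 'Delete' and 'Update' are assumptions made for the example.

```ruby
# Hypothetical helper (not part of CSVDiff): index each side by a key field,
# then compare matching rows field by field.
def keyed_diff(left_rows, right_rows, field_names, key_field)
  key_idx = field_names.index(key_field)
  left  = left_rows.to_h  { |row| [row[key_idx], row] }
  right = right_rows.to_h { |row| [row[key_idx], row] }

  diffs = {}
  (left.keys | right.keys).each do |key|
    l = left[key]
    r = right[key]
    if l.nil?
      diffs[key] = { action: 'Add' }
    elsif r.nil?
      diffs[key] = { action: 'Delete' }
    else
      # Record only the fields whose values differ between the two sides.
      changed = field_names.each_index.select { |i| l[i] != r[i] }
      next if changed.empty?
      diffs[key] = {
        action: 'Update',
        fields: changed.to_h { |i| [field_names[i], [l[i], r[i]]] }
      }
    end
  end
  diffs
end

fields = %w[id name role]
left   = [%w[1 Ann Dev], %w[2 Bob Ops], %w[3 Cy QA]]
right  = [%w[3 Cy QA], %w[1 Ann Lead], %w[4 Dee Dev]]

result = keyed_diff(left, right, fields, 'id')
# result.keys          # => ["1", "2", "4"]
# result['1'][:fields] # => {"role"=>["Dev", "Lead"]}
```

Row 3 moves from last to first position but produces no entry, because rows are matched by key rather than by line position. Row 1 is reported with the one field that changed rather than as a whole changed line.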

Defined Under Namespace

Modules: Algorithm
Classes: CSVSource

Instance Attribute Summary

Instance Method Summary

Methods included from Algorithm

#diff_row, #diff_sources

Constructor Details

#initialize(left, right, options = {}) ⇒ CSVDiff

Generates a diff between two hierarchical tree structures, provided as left and right, each of which is either a CSVSource or an array of lines in CSV format. An array of field indexes can also be specified as key_fields; at least one field index must be given. The last index is the child id, and any preceding fields are the parent field(s) that uniquely qualify the child instance.
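The key-field convention described above (last entry is the child id, any earlier entries are parent fields) can be sketched in a few lines of plain Ruby. The helper name split_key_fields and the sample field names are hypothetical and are not part of the CSVDiff API.

```ruby
# Hypothetical helper: split a key_fields list into its parent fields and the
# trailing child field, following the convention described above.
def split_key_fields(key_fields)
  raise ArgumentError, 'at least one key field is required' if key_fields.empty?

  # A leading splat collects everything except the last element.
  *parent_fields, child_field = key_fields
  [parent_fields, child_field]
end

parents, child = split_key_fields(%w[Region Office Employee])
# parents # => ["Region", "Office"]
# child   # => "Employee"

# With a single key field there are no parent fields.
no_parents, only_child = split_key_fields(%w[id])
# no_parents # => []
# only_child # => "id"
```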

Options Hash (options):

  • :encoding (String)

    The encoding to use when opening the CSV files.

  • :field_names (Array<String>)

    An Array of field names for each field in left and right. If not provided, the first row is assumed to contain field names.

  • :ignore_header (Boolean)

    If true, the first line of each file is ignored. This option can only be true if :field_names is specified.

  • :key_field (String)

    The name of the field that uniquely identifies each row.

  • :key_fields (Array<String>)

    The names of the fields that uniquely identify each row.

  • :parent_field (String)

    The name of the field that identifies a parent within which sibling order should be checked.

  • :child_field (String)

    The name of the field that uniquely identifies a child of a parent.

  • :ignore_adds (Boolean)

    If true, records that appear in the right/to file but not in the left/from file are not reported.

  • :ignore_updates (Boolean)

    If true, records that have been updated are not reported.

  • :ignore_moves (Boolean)

    If true, changes in row position amongst sibling rows are not reported.

  • :ignore_deletes (Boolean)

    If true, records that appear in the left/from file but not in the right/to file are not reported.
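The effect of the four :ignore_* options can be pictured as suppressing one kind of change each. The sketch below applies that idea as a filter over a set of already-computed diffs. It is an illustration only: the helper reportable, the action labels, and the filtering approach are assumptions, and the library may instead skip detecting these changes in the first place.

```ruby
# Hypothetical helper: drop diffs whose action is suppressed by an :ignore_* option.
def reportable(diffs, options = {})
  # Assumed mapping from option name to the action label it suppresses.
  suppresses = {
    ignore_adds: 'Add',
    ignore_deletes: 'Delete',
    ignore_moves: 'Move',
    ignore_updates: 'Update'
  }
  ignored = suppresses.select { |opt, _action| options[opt] }.values
  diffs.reject { |_key, diff| ignored.include?(diff[:action]) }
end

diffs = {
  'A' => { action: 'Add' },
  'B' => { action: 'Move' },
  'C' => { action: 'Update' }
}

visible = reportable(diffs, ignore_moves: true)
# visible.keys # => ["A", "C"]
```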



# File 'lib/csv-diff/csv_diff.rb', line 83

def initialize(left, right, options = {})
    @left = left.is_a?(CSVSource) ? left : CSVSource.new(left, options)
    raise "No field names found in left (from) source" unless @left.field_names && @left.field_names.size > 0
    @right = right.is_a?(CSVSource) ? right : CSVSource.new(right, options)
    raise "No field names found in right (to) source" unless @right.field_names && @right.field_names.size > 0
    @warnings = []
    @diff_fields = get_diff_fields(@left.field_names, @right.field_names, options)
    @key_fields = @left.key_fields
    diff(options)
end

Instance Attribute Details

#child_fields ⇒ Array<String> (readonly)



# File 'lib/csv-diff/csv_diff.rb', line 37

def child_fields
  @child_fields
end

#diff_fields ⇒ Array<String> (readonly)



# File 'lib/csv-diff/csv_diff.rb', line 30

def diff_fields
  @diff_fields
end

#diffs ⇒ Array<Hash> (readonly)



# File 'lib/csv-diff/csv_diff.rb', line 27

def diffs
  @diffs
end

#key_fields ⇒ Array<String> (readonly)



# File 'lib/csv-diff/csv_diff.rb', line 33

def key_fields
  @key_fields
end

#left ⇒ CSVSource (readonly)

Also known as: from



# File 'lib/csv-diff/csv_diff.rb', line 20

def left
  @left
end

#options ⇒ Hash (readonly)



# File 'lib/csv-diff/csv_diff.rb', line 39

def options
  @options
end

#parent_fields ⇒ Array<String> (readonly)



# File 'lib/csv-diff/csv_diff.rb', line 35

def parent_fields
  @parent_fields
end

#right ⇒ CSVSource (readonly)

Also known as: to



# File 'lib/csv-diff/csv_diff.rb', line 24

def right
  @right
end

Instance Method Details

#diff(options = {}) ⇒ Object

Performs a diff with the specified options.



# File 'lib/csv-diff/csv_diff.rb', line 96

def diff(options = {})
    @summary = nil
    @options = options
    @diffs = diff_sources(@left, @right, @key_fields, @diff_fields, options)
end

#diff_warnings ⇒ Array<String>



# File 'lib/csv-diff/csv_diff.rb', line 130

def diff_warnings
    @warnings
end

#summary ⇒ Object

Returns a summary of the number of adds, deletes, moves, and updates.



# File 'lib/csv-diff/csv_diff.rb', line 104

def summary
    unless @summary
        @summary = Hash.new{ |h, k| h[k] = 0 }
        @diffs.each{ |k, v| @summary[v[:action]] += 1 }
        @summary['Warning'] = warnings.size if warnings.size > 0
    end
    @summary
end
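The counting idiom in the listing above can be tried in isolation. The sketch below reproduces it with stand-in data. The diff entries, the action labels, and the warning text are made up for the example, and local variables replace the instance state used by the real method.

```ruby
# Stand-in data: each diff is keyed by row key and carries an :action label.
diffs = {
  '1' => { action: 'Update' },
  '2' => { action: 'Delete' },
  '4' => { action: 'Add' },
  '5' => { action: 'Add' }
}
warnings = ['example warning']

# A default block makes unseen actions start at 0, so += 1 needs no nil check.
summary = Hash.new { |h, k| h[k] = 0 }
diffs.each { |_key, diff| summary[diff[:action]] += 1 }
summary['Warning'] = warnings.size if warnings.size > 0

# summary # => {"Update"=>1, "Delete"=>1, "Add"=>2, "Warning"=>1}
```

One consequence of this default block is that reading an absent key, such as summary['Move'], returns 0 and also stores that key in the hash. Use summary.fetch('Move', 0) when a read must not modify the summary.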

#warnings ⇒ Array<String>



# File 'lib/csv-diff/csv_diff.rb', line 124

def warnings
    @left.warnings + @right.warnings + @warnings
end