Class: CSVDiff::CSVSource

Inherits:
Object
Defined in:
lib/csv-diff/csv_source.rb

Overview

Represents a CSV input (i.e. the left/from or right/to input) to the diff process.

Constructor Details

#initialize(source, options = {}) ⇒ CSVSource

Creates a new diff source.

A diff source must contain at least one field that will be used as the key to identify the same record in a different version of this file. If not specified via one of the options, the first field is assumed to be the unique key.

If multiple fields combine to form a unique key, the parent is assumed to be identified by all but the last field of the unique key. If finer control is required, use a combination of the :parent_fields and :child_fields options.

All key options can be specified either by field name or by 0-based field index.
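The key-resolution rules above can be sketched as a standalone method. This is a minimal illustration that mirrors the branch in the constructor source listed further down; the name resolve_key_fields and the returned Hash layout are assumptions made for this example and are not part of the CSVDiff API.

```ruby
# Hypothetical helper, not part of CSVDiff: resolves the key, parent and
# child fields from an options hash in the same way the constructor does.
def resolve_key_fields(options = {})
  if (kf = options.fetch(:key_field, options[:key_fields]))
    key_fields = [kf].flatten
    # All but the last key field identify the parent; the last one the child.
    parent_fields = key_fields[0...-1]
    child_fields = key_fields[-1..-1]
  else
    parent_fields = [options.fetch(:parent_field, options[:parent_fields]) || []].flatten
    # With no key options at all, the first field (index 0) becomes the key.
    child_fields = [options.fetch(:child_field, options[:child_fields]) || [0]].flatten
    key_fields = parent_fields + child_fields
  end
  { key: key_fields, parent: parent_fields, child: child_fields }
end

# No options: the first field is the unique key and there is no parent.
p resolve_key_fields

# A compound key: the parent is every key field except the last.
p resolve_key_fields(key_fields: %w[Region Store SKU])

# Finer control, mixing a field name with a 0-based field index.
p resolve_key_fields(parent_fields: ['Region'], child_fields: [1])
```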

Parameters:

  • source (String|Array<Array>)

    Either a path to a CSV file, or an Array of Arrays containing CSV data. If the :field_names option is not specified, the first line must contain the names of the fields.

  • options (Hash) (defaults to: {})

    An options hash.

Options Hash (options):

  • :encoding (String)

    The encoding to use when opening the CSV file.

  • :csv_options (Hash)

    Any options you wish to pass to CSV.open, e.g. :col_sep.

  • :field_names (Array<String>)

    The names of each of the fields in source.

  • :ignore_header (Boolean)

    If true, and :field_names has been specified, then the first row of the file is ignored.

  • :key_field (String)

    The name of the field that uniquely identifies each row.

  • :key_fields (Array<String>)

    The names of the fields that together uniquely identify each row.

  • :parent_field (String)

    The name of the field(s) that identify a parent within which sibling order should be checked. Also accepted as :parent_fields.

  • :child_field (String)

    The name of the field(s) that uniquely identify a child of a parent. Also accepted as :child_fields.

  • :case_sensitive (Boolean)

    If true (the default), keys are indexed as-is; if false, the index is built in upper-case for case-insensitive comparisons.
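As an illustration of how :encoding and :csv_options reach the CSV library, the following sketch reads a semicolon-separated file using only the Ruby standard library. The helper names read_rows and demo_rows and the sample data are assumptions made for this example. Note that the sketch passes the options with a double splat, because Ruby 3 no longer converts a trailing Hash into keyword arguments.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper, not part of CSVDiff: turns the :encoding and
# :csv_options entries of an options hash into arguments for CSV.open.
def read_rows(path, options = {})
  mode_string = options[:encoding] ? "r:#{options[:encoding]}" : 'r'
  csv_options = options.fetch(:csv_options, {})
  # The double splat passes the hash entries as keyword arguments.
  CSV.open(path, mode_string, **csv_options) { |csv| csv.readlines }
end

# Writes a small semicolon-separated file and reads it back as rows.
def demo_rows
  Tempfile.create(['demo', '.csv']) do |file|
    file.write("ID;Name\n1;Widget\n")
    file.flush
    read_rows(file.path, encoding: 'UTF-8', csv_options: { col_sep: ';' })
  end
end

p demo_rows
```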



# File 'lib/csv-diff/csv_source.rb', line 77

def initialize(source, options = {})
    if source.is_a?(String)
        require 'csv'
        mode_string = options[:encoding] ? "r:#{options[:encoding]}" : 'r'
        csv_options = options.fetch(:csv_options, {})
        @path = source
        source = CSV.open(@path, mode_string, csv_options).readlines
    end
    if kf = options.fetch(:key_field, options[:key_fields])
        @key_fields = [kf].flatten
        @parent_fields = @key_fields[0...-1]
        @child_fields = @key_fields[-1..-1]
    else
        @parent_fields = [options.fetch(:parent_field, options[:parent_fields]) || []].flatten
        @child_fields = [options.fetch(:child_field, options[:child_fields]) || [0]].flatten
        @key_fields = @parent_fields + @child_fields
    end
    @field_names = options[:field_names]
    @warnings = []
    index_source(source, options)
end

Instance Attribute Details

#case_sensitive ⇒ Boolean (readonly) Also known as: case_sensitive?

Returns true if the source has been indexed with case-sensitive keys, or false if it has been indexed using upper-case key values.

Returns:

  • (Boolean)

    True if the source has been indexed with case-sensitive keys, or false if it has been indexed using upper-case key values.



# File 'lib/csv-diff/csv_source.rb', line 23

def case_sensitive
  @case_sensitive
end
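The effect of upper-case indexing can be sketched with a plain Hash. This is a simplified illustration, not the library's indexing code; the names index_key and build_lookup and the sample rows are assumptions made for this example.

```ruby
# Hypothetical helper: normalises a key value for the index. When keys are
# not case-sensitive, the value is stored in upper-case.
def index_key(value, case_sensitive = true)
  case_sensitive ? value.to_s : value.to_s.upcase
end

# Builds a key => name lookup from [id, name] pairs.
def build_lookup(rows, case_sensitive)
  rows.each_with_object({}) do |(id, name), lookup|
    lookup[index_key(id, case_sensitive)] = name
  end
end

rows = [%w[ab12 Widget], %w[AB12 Gadget]]

# Case-sensitive: the two ids are distinct keys.
p build_lookup(rows, true).keys

# Case-insensitive: both ids collapse to the same upper-case key.
p build_lookup(rows, false).keys
```

In the case-insensitive run the second row overwrites the first in this sketch, which is why sources that differ only in key casing need care.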

#child_fields ⇒ Array<String> (readonly)

Returns The names of the field(s) that distinguish a child of a parent record.

Returns:

  • (Array<String>)

    The names of the field(s) that distinguish a child of a parent record.



# File 'lib/csv-diff/csv_source.rb', line 19

def child_fields
  @child_fields
end

#field_names ⇒ Array<String> (readonly)

Returns The names of the fields in the source file.

Returns:

  • (Array<String>)

    The names of the fields in the source file.



# File 'lib/csv-diff/csv_source.rb', line 10

def field_names
  @field_names
end

#index ⇒ Hash<String,Array<String>> (readonly)

Returns A hash containing each parent key, and an Array of the child keys it is a parent of.

Returns:

  • (Hash<String,Array<String>>)

    A hash containing each parent key, and an Array of the child keys it is a parent of.



# File 'lib/csv-diff/csv_source.rb', line 33

def index
  @index
end
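The shape of such a parent-to-children index can be sketched as follows. This is an illustration only: the method name build_index, the use of "~" to join key values, and the choice to store the full row key as the child key are all assumptions made for this example, not documented behaviour of the library.

```ruby
# Hypothetical sketch: groups row keys under the key of their parent.
# parent_cols and child_cols are 0-based column indexes.
def build_index(rows, parent_cols, child_cols)
  rows.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |row, index|
    parent_key = row.values_at(*parent_cols).join('~')
    # Assumption: a child key is the full key (parent plus child values).
    child_key = row.values_at(*(parent_cols + child_cols)).join('~')
    index[parent_key] << child_key
  end
end

rows = [
  %w[North S1],
  %w[North S2],
  %w[South S3]
]

# Each parent key maps to its child keys, kept in source order so that
# sibling order can be compared between two sources.
p build_index(rows, [0], [1])
```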

#key_fields ⇒ Array<String> (readonly)

Returns The names of the field(s) that uniquely identify each row.

Returns:

  • (Array<String>)

    The names of the field(s) that uniquely identify each row.



# File 'lib/csv-diff/csv_source.rb', line 13

def key_fields
  @key_fields
end

#lines ⇒ Hash<String,Hash> (readonly)

Returns A hash containing each line of the source, keyed on the values of the key_fields.

Returns:

  • (Hash<String,Hash>)

    A hash containing each line of the source, keyed on the values of the key_fields.



# File 'lib/csv-diff/csv_source.rb', line 30

def lines
  @lines
end
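A hash of this shape, and the key lookup it supports, can be sketched with plain Ruby. The method name build_lines, the "~" separator for compound keys, and the field-name-to-value layout of each row Hash are assumptions made for this example.

```ruby
# Hypothetical sketch: indexes data rows by the values of their key columns.
# key_cols holds 0-based column indexes.
def build_lines(field_names, rows, key_cols)
  rows.each_with_object({}) do |row, lines|
    key = row.values_at(*key_cols).join('~')
    # Each row is stored as a Hash of field name => value.
    lines[key] = field_names.zip(row).to_h
  end
end

field_names = %w[ID Name]
lines = build_lines(field_names, [%w[1 Widget], %w[2 Gadget]], [0])

p lines['1'] # the Hash for the row whose ID is "1"
p lines['9'] # nil, because no row has that key
```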

#parent_fields ⇒ Array<String> (readonly)

Returns The names of the field(s) that identify a common parent of child records.

Returns:

  • (Array<String>)

    The names of the field(s) that identify a common parent of child records.



# File 'lib/csv-diff/csv_source.rb', line 16

def parent_fields
  @parent_fields
end

#path ⇒ String

Returns the path to the source file.

Returns:

  • (String)

    The path to the source file.



# File 'lib/csv-diff/csv_source.rb', line 8

def path
  @path
end

#trim_whitespace ⇒ Boolean (readonly)

Returns True if leading/trailing whitespace should be stripped from fields.

Returns:

  • (Boolean)

    True if leading/trailing whitespace should be stripped from fields.



# File 'lib/csv-diff/csv_source.rb', line 27

def trim_whitespace
  @trim_whitespace
end
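What stripping leading and trailing whitespace from fields means in practice can be sketched as below. The helper name clean_row is an assumption made for this example, and the sketch does not claim to match how the library applies the setting internally.

```ruby
# Hypothetical helper: strips leading/trailing whitespace from each String
# field when trim_whitespace is true, and leaves the row untouched otherwise.
def clean_row(row, trim_whitespace)
  return row unless trim_whitespace

  # Non-String values, such as nil for an empty field, pass through as-is.
  row.map { |value| value.is_a?(String) ? value.strip : value }
end

row = [' 1', 'Widget  ', nil]
p clean_row(row, true)
p clean_row(row, false)
```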

#warnings ⇒ Array<String> (readonly)

Returns An array of any warnings encountered while processing the source.

Returns:

  • (Array<String>)

    An array of any warnings encountered while processing the source.



# File 'lib/csv-diff/csv_source.rb', line 36

def warnings
  @warnings
end

Instance Method Details

#[](key) ⇒ Hash

Returns the row in the CSV source corresponding to the supplied key.

Parameters:

  • key (String)

    The unique key to use to look up the row.

Returns:

  • (Hash)

    The fields for the line corresponding to key, or nil if the key is not recognised.



# File 'lib/csv-diff/csv_source.rb', line 105

def [](key)
    @lines[key]
end