Class: CSVDiff::Source
- Inherits: Object
  - Object
  - CSVDiff::Source
- Defined in: lib/csv-diff/source.rb
Overview
Represents an input (i.e. the left/from or right/to input) to the diff process.
Instance Attribute Summary
-
#case_sensitive ⇒ Boolean
(also: #case_sensitive?)
readonly
True if the source has been indexed with case-sensitive keys, or false if it has been indexed using upper-case key values.
-
#child_field_indexes ⇒ Array<Fixnum>
readonly
The indexes of the child fields in the source file.
-
#child_fields ⇒ Array<String>
readonly
The names of the field(s) that distinguish a child of a parent record.
-
#data ⇒ Array<Array>
readonly
The data for this source.
-
#dup_count ⇒ Fixnum
readonly
A count of the lines from this source that had the same key value as another line.
-
#field_names ⇒ Array<String>
readonly
The names of the fields in the source file.
-
#index ⇒ Hash<String,Array<String>>
readonly
A hash containing each parent key, and an Array of the child keys it is a parent of.
-
#key_field_indexes ⇒ Array<Fixnum>
readonly
The indexes of the key fields in the source file.
-
#key_fields ⇒ Array<String>
readonly
The names of the field(s) that uniquely identify each row.
-
#line_count ⇒ Fixnum
readonly
A count of the lines processed from this source.
-
#lines ⇒ Hash<String,Hash>
readonly
A hash containing each line of the source, keyed on the values of the key_fields.
-
#parent_field_indexes ⇒ Array<Fixnum>
readonly
The indexes of the parent fields in the source file.
-
#parent_fields ⇒ Array<String>
readonly
The names of the field(s) that identify a common parent of child records.
-
#path ⇒ String
The path to the source file.
-
#skip_count ⇒ Fixnum
readonly
A count of the lines from this source that were skipped due to filter conditions.
-
#trim_whitespace ⇒ Boolean
readonly
True if leading/trailing whitespace should be stripped from fields.
-
#warnings ⇒ Array<String>
readonly
An array of any warnings encountered while processing the source.
Instance Method Summary
-
#[](key) ⇒ Hash
Returns the row in the CSV source corresponding to the supplied key.
-
#index_source ⇒ Object
Given an array of lines, where each line is an array of fields, indexes the array contents so that it can be looked up by key.
-
#initialize(options = {}) ⇒ Source
constructor
Creates a new diff source.
- #path? ⇒ Boolean
-
#save_csv(file_path, options = {}) ⇒ Object
Save the data in this Source as a CSV at file_path.
-
#to_hash ⇒ Object
Convert the data in this source to Array<Hash> using the field names as keys for the Hash in each row.
Constructor Details
#initialize(options = {}) ⇒ Source
Creates a new diff source.
A diff source must contain at least one field that will be used as the key to identify the same record in a different version of this file. If not specified via one of the options, the first field is assumed to be the unique key.
If multiple fields combine to form a unique key, the combined fields are considered as a single unique identifier. If your key represents data that can be represented as a tree, you can instead break your key fields into :parent_fields and :child_fields. By doing this, if a child key is deleted from one parent, and added to another, that will be reported as an update, with a change to the parent key part(s) of the record.
All key options can be specified either by field name, or by field index (0 based).
# File 'lib/csv-diff/source.rb', line 102
def initialize(options = {})
  if (options.keys & [:parent_field, :parent_fields, :child_field, :child_fields]).empty? &&
     (kf = options.fetch(:key_field, options[:key_fields]))
    @key_fields = [kf].flatten
    @parent_fields = []
    @child_fields = @key_fields
  else
    @parent_fields = [options.fetch(:parent_field, options[:parent_fields]) || []].flatten
    @child_fields = [options.fetch(:child_field, options[:child_fields]) || [0]].flatten
    @key_fields = @parent_fields + @child_fields
  end
  @field_names = options[:field_names]
  @case_sensitive = options.fetch(:case_sensitive, true)
  @trim_whitespace = options.fetch(:trim_whitespace, false)
  @ignore_header = options[:ignore_header]
  @include = options[:include]
  @exclude = options[:exclude]
  @path = options.fetch(:path, 'NA') unless @path
  @warnings = []
end
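The key-resolution branch of the constructor can be tried in isolation. The sketch below copies that logic into a stand-alone helper, resolve_keys, which is a hypothetical name and not part of CSVDiff, as are the sample field names. It shows which fields end up as parent, child and combined key fields for three option sets.

```ruby
# Hypothetical helper (not part of CSVDiff) mirroring the key-resolution
# logic of Source#initialize.
def resolve_keys(options = {})
  tree_opts = [:parent_field, :parent_fields, :child_field, :child_fields]
  if (options.keys & tree_opts).empty? &&
     (kf = options.fetch(:key_field, options[:key_fields]))
    # Plain key: all key fields are treated as child fields, with no parent
    { parent: [], child: [kf].flatten, key: [kf].flatten }
  else
    parent = [options.fetch(:parent_field, options[:parent_fields]) || []].flatten
    # With no key options at all, the first field (index 0) becomes the key
    child = [options.fetch(:child_field, options[:child_fields]) || [0]].flatten
    { parent: parent, child: child, key: parent + child }
  end
end

p resolve_keys({})                                  # key is field index 0
p resolve_keys(key_fields: ['Region', 'Id'])        # composite key by name
p resolve_keys(parent_field: 'Dept', child_field: 'EmpId')
```

In the last call the combined key is the parent field followed by the child field, which is the arrangement that lets a child moved between parents be reported as an update.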
Instance Attribute Details
#case_sensitive ⇒ Boolean (readonly) Also known as: case_sensitive?
Returns true if the source has been indexed with case-sensitive keys, or false if it has been indexed using upper-case key values.
# File 'lib/csv-diff/source.rb', line 37
def case_sensitive
  @case_sensitive
end
#child_field_indexes ⇒ Array<Fixnum> (readonly)
Returns the indexes of the child fields in the source file.
# File 'lib/csv-diff/source.rb', line 32
def child_field_indexes
  @child_field_indexes
end
#child_fields ⇒ Array<String> (readonly)
Returns the names of the field(s) that distinguish a child of a parent record.
# File 'lib/csv-diff/source.rb', line 22
def child_fields
  @child_fields
end
#data ⇒ Array<Array> (readonly)
Returns the data for this source.
# File 'lib/csv-diff/source.rb', line 10
def data
  @data
end
#dup_count ⇒ Fixnum (readonly)
Returns a count of the lines from this source that had the same key value as another line.
# File 'lib/csv-diff/source.rb', line 59
def dup_count
  @dup_count
end
#field_names ⇒ Array<String> (readonly)
Returns the names of the fields in the source file.
# File 'lib/csv-diff/source.rb', line 13
def field_names
  @field_names
end
#index ⇒ Hash<String,Array<String>> (readonly)
Returns a hash containing each parent key, and an Array of the child keys it is a parent of.
# File 'lib/csv-diff/source.rb', line 47
def index
  @index
end
#key_field_indexes ⇒ Array<Fixnum> (readonly)
Returns the indexes of the key fields in the source file.
# File 'lib/csv-diff/source.rb', line 26
def key_field_indexes
  @key_field_indexes
end
#key_fields ⇒ Array<String> (readonly)
Returns the names of the field(s) that uniquely identify each row.
# File 'lib/csv-diff/source.rb', line 16
def key_fields
  @key_fields
end
#line_count ⇒ Fixnum (readonly)
Returns a count of the lines processed from this source. Excludes any header and duplicate records identified during indexing.
# File 'lib/csv-diff/source.rb', line 53
def line_count
  @line_count
end
#lines ⇒ Hash<String,Hash> (readonly)
Returns a hash containing each line of the source, keyed on the values of the key_fields.
# File 'lib/csv-diff/source.rb', line 44
def lines
  @lines
end
#parent_field_indexes ⇒ Array<Fixnum> (readonly)
Returns the indexes of the parent fields in the source file.
# File 'lib/csv-diff/source.rb', line 29
def parent_field_indexes
  @parent_field_indexes
end
#parent_fields ⇒ Array<String> (readonly)
Returns the names of the field(s) that identify a common parent of child records.
# File 'lib/csv-diff/source.rb', line 19
def parent_fields
  @parent_fields
end
#path ⇒ String
Returns the path to the source file.
# File 'lib/csv-diff/source.rb', line 8
def path
  @path
end
#skip_count ⇒ Fixnum (readonly)
Returns a count of the lines from this source that were skipped due to filter conditions.
# File 'lib/csv-diff/source.rb', line 56
def skip_count
  @skip_count
end
#trim_whitespace ⇒ Boolean (readonly)
Returns true if leading/trailing whitespace should be stripped from fields.
# File 'lib/csv-diff/source.rb', line 41
def trim_whitespace
  @trim_whitespace
end
#warnings ⇒ Array<String> (readonly)
Returns an array of any warnings encountered while processing the source.
# File 'lib/csv-diff/source.rb', line 50
def warnings
  @warnings
end
Instance Method Details
#[](key) ⇒ Hash
Returns the row in the CSV source corresponding to the supplied key.
# File 'lib/csv-diff/source.rb', line 134
def [](key)
  @lines[key]
end
#index_source ⇒ Object
Given an array of lines, where each line is an array of fields, indexes the array contents so that it can be looked up by key.
# File 'lib/csv-diff/source.rb', line 141
def index_source
  @lines = {}
  @index = Hash.new{ |h, k| h[k] = [] }
  if @field_names
    index_fields
    include_filter = convert_filter(@include, @field_names)
    exclude_filter = convert_filter(@exclude, @field_names)
  end
  @line_count = 0
  @skip_count = 0
  @dup_count = 0
  line_num = 0
  @data.each do |row|
    line_num += 1
    next if line_num == 1 && @field_names && @ignore_header
    unless @field_names
      if row.class.name == 'CSV::Row'
        @field_names = row.headers.each_with_index.map{ |f, i| f || i.to_s }
      else
        @field_names = row.each_with_index.map{ |f, i| f || i.to_s }
      end
      index_fields
      include_filter = convert_filter(@include, @field_names)
      exclude_filter = convert_filter(@exclude, @field_names)
      next
    end
    field_vals = row
    line = {}
    filter = false
    @field_names.each_with_index do |field, i|
      val = field_vals[i]
      val = val.to_s.strip if val && @trim_whitespace
      line[field] = val
      if include_filter && f = include_filter[i]
        filter = !check_filter(f, line[field])
      end
      if exclude_filter && f = exclude_filter[i]
        filter = check_filter(f, line[field])
      end
      break if filter
    end
    if filter
      @skip_count += 1
      next
    end
    key_values = @key_field_indexes.map{ |kf| @case_sensitive ?
      field_vals[kf].to_s : field_vals[kf].to_s.upcase }
    key = key_values.join('~')
    parent_key = key_values[0...(@parent_fields.length)].join('~')
    if @lines[key]
      @warnings << "Duplicate key '#{key}' encountered at line #{line_num}"
      @dup_count += 1
      key += "[#{@dup_count}]"
    end
    @index[parent_key] << key
    @lines[key] = line
    @line_count += 1
  end
end
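To make the key handling concrete, here is a minimal sketch of only the keying steps from the method above: key values are joined with '~', upper-cased when the source is not case-sensitive, and a repeated key is stored under a "[n]" suffix with a warning. The build_index helper, its parameters and the sample rows are illustrative assumptions, not part of CSVDiff. Filtering, whitespace trimming and header handling are left out, and raw rows are stored instead of field-name hashes.

```ruby
# Hypothetical stand-alone sketch (not part of CSVDiff) of the keying steps
# in Source#index_source.
def build_index(rows, key_idx, parent_count, case_sensitive: true)
  lines = {}
  index = Hash.new { |h, k| h[k] = [] }
  warnings = []
  dup_count = 0
  rows.each_with_index do |row, i|
    key_values = key_idx.map do |kf|
      case_sensitive ? row[kf].to_s : row[kf].to_s.upcase
    end
    key = key_values.join('~')
    # The leading parent_count key values form the parent key
    parent_key = key_values[0...parent_count].join('~')
    if lines[key]
      warnings << "Duplicate key '#{key}' encountered at line #{i + 1}"
      dup_count += 1
      key += "[#{dup_count}]" # keep the duplicate under a distinct key
    end
    index[parent_key] << key
    lines[key] = row
  end
  [lines, index, warnings]
end

rows = [
  ['Sales', 'e1', 'Ann'],
  ['Sales', 'e2', 'Bob'],
  ['sales', 'E1', 'Cat'] # same key as the first row once upper-cased
]
lines, index, warnings = build_index(rows, [0, 1], 1, case_sensitive: false)
p lines.keys
p index
p warnings
```

With these rows the third record collides with the first, so it is kept under the key 'SALES~E1[1]' and one warning is recorded. A lookup through #[] would therefore need the same joined, upper-cased form of the key.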
#path? ⇒ Boolean
# File 'lib/csv-diff/source.rb', line 124
def path?
  @path != 'NA'
end
#save_csv(file_path, options = {}) ⇒ Object
Save the data in this Source as a CSV at file_path.
# File 'lib/csv-diff/source.rb', line 208
def save_csv(file_path, options = {})
  require 'csv'
  default_opts = {
    headers: @field_name, write_headers: true
  }
  CSV.open(file_path, 'wb', default_opts.merge(options)) do |csv|
    @data.each{ |rec| csv << rec }
  end
end
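As a rough illustration of the headers and write_headers options used above, the following stand-alone sketch writes a small array of rows to a temporary file with Ruby's csv library. The field names and data are made up, and the options are passed with ** as keyword arguments, which is an assumption about how a current csv version expects them rather than a statement about how CSVDiff calls it.

```ruby
require 'csv'
require 'tempfile'

# Made-up sample data standing in for a source's field names and rows
field_names = ['Id', 'Name']
data = [['1', 'Ann'], ['2', 'Bob']]

# headers plus write_headers makes CSV emit a header row before the data
opts = { headers: field_names, write_headers: true }

written = Tempfile.create(['source', '.csv']) do |f|
  f.close
  CSV.open(f.path, 'wb', **opts) do |csv|
    data.each { |rec| csv << rec }
  end
  File.read(f.path) # returned as the value of the Tempfile.create block
end
puts written
```

Any options supplied by the caller would be merged over these defaults, so passing write_headers: false, for example, would suppress the header row.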
#to_hash ⇒ Object
Convert the data in this source to Array<Hash> using the field names as keys for the Hash in each row.
# File 'lib/csv-diff/source.rb', line 221
def to_hash
  @data.map do |row|
    hsh = {}
    @field_names.each_with_index.map{ |fld, i| hsh[fld] = row[i] }
    hsh
  end
end
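The conversion pairs each field name with the value at the same position in a row. A minimal stand-alone sketch of that idea, using made-up field names and rows in place of a real Source:

```ruby
# Made-up sample data; in a Source these would be @field_names and @data
field_names = ['Id', 'Name', 'Email']
data = [
  ['1', 'Ann', 'ann@example.com'],
  ['2', 'Bob', nil]
]

hashes = data.map do |row|
  # Pair each field name with the value at the same index in the row
  field_names.each_with_index.to_h { |fld, i| [fld, row[i]] }
end
p hashes
```

Missing values stay as nil in the resulting hashes, so every row hash has the same set of keys.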