Class: Moab::FileGroupDifference
- Inherits: Serializer::Serializable (Object > Serializer::Serializable > Moab::FileGroupDifference)
- Includes: HappyMapper
- Defined in: lib/moab/file_group_difference.rb
Overview
Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.
Performs analysis and reports the differences between two matching FileGroup objects. The descendant elements of the report hold a detailed breakdown of file-level differences, organized by change type. This stanza is a child element of FileInventoryDifference, the documentation of which contains a full example.
In order to determine the detailed nature of the differences that are present between the two manifests, this algorithm first compares the sets of file signatures present in the groups being compared, then uses the result of that operation for subsequent analysis of filename correspondences.
For the first step, a Ruby Hash is extracted from each of the two groups, with FileSignature objects used as the hash keys and the corresponding FileInstance arrays as the hash values. The set of keys from the basis hash can then be compared against the keys from the other hash using Array operators:
- matching = basis_array & other_array
- basis_only = basis_array - other_array
- other_only = other_array - basis_array
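The three operators can be tried on plain hashes. This is a minimal sketch, assuming short strings stand in for FileSignature keys and arrays of paths stand in for FileInstance values; `partition_keys` is a hypothetical helper name used only for illustration and is not part of the library.

```ruby
# Split the keys of two hashes into shared, basis-only, and other-only sets.
# `partition_keys` is a hypothetical helper, not a Moab method.
def partition_keys(basis_hash, other_hash)
  basis_array = basis_hash.keys
  other_array = other_hash.keys
  {
    matching: basis_array & other_array,   # keys present in both hashes
    basis_only: basis_array - other_array, # keys present only in the basis hash
    other_only: other_array - basis_array  # keys present only in the other hash
  }
end

# Placeholder data: string "signatures" mapped to arrays of paths.
basis = { 'sig-a' => ['page-1.jpg'], 'sig-b' => ['page-2.jpg'] }
other = { 'sig-a' => ['page-1.jpg'], 'sig-c' => ['page-3.jpg'] }
p partition_keys(basis, other)
```

Because Hash keys are compared with `eql?` and `hash`, this approach relies on the key objects implementing value equality, which plain strings do.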
For the second step of the comparison, the matching and non-matching sets of hash entries are further categorized as follows:
- identical = signature and file path are the same in both the basis and other file groups
- renamed = signature is unchanged, but the path has changed
- copyadded = duplicate copy of file was added
- copydeleted = duplicate copy of file was deleted
- modified = path is the same in both groups, but the signature has changed
- added = signature and path are only in the other inventory
- deleted = signature and path are only in the basis inventory
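The categories above can be sketched end to end on simplified data. The following is an illustration only, assuming each file group is reduced to a Hash of path => signature string; `paths_by_signature` and `classify_changes` are hypothetical names, and the real class works with FileSignature and FileInstance objects instead.

```ruby
# Invert a path => signature hash into signature => [paths].
def paths_by_signature(group)
  group.each_with_object(Hash.new { |h, k| h[k] = [] }) { |(path, sig), h| h[sig] << path }
end

# Classify differences between two simplified groups (path => signature).
def classify_changes(basis, other)
  changes = Hash.new { |hash, key| hash[key] = [] }
  basis_sigs = paths_by_signature(basis)
  other_sigs = paths_by_signature(other)

  # Step 1: signatures present in both groups.
  (basis_sigs.keys & other_sigs.keys).each do |sig|
    basis_paths = basis_sigs[sig]
    other_paths = other_sigs[sig]
    (basis_paths & other_paths).each { |path| changes[:identical] << path }
    basis_only = basis_paths - other_paths
    other_only = other_paths - basis_paths
    [basis_only.size, other_only.size].max.times do |n|
      if basis_only[n].nil?
        changes[:copyadded] << [basis_paths[0], other_only[n]]
      elsif other_only[n].nil?
        changes[:copydeleted] << basis_only[n]
      else
        changes[:renamed] << [basis_only[n], other_only[n]]
      end
    end
  end

  # Step 2: signatures present in only one group, compared by path.
  basis_rest = basis.reject { |_path, sig| other_sigs.key?(sig) }
  other_rest = other.reject { |_path, sig| basis_sigs.key?(sig) }
  (basis_rest.keys & other_rest.keys).each { |path| changes[:modified] << path }
  (other_rest.keys - basis_rest.keys).each { |path| changes[:added] << path }
  (basis_rest.keys - other_rest.keys).each { |path| changes[:deleted] << path }
  changes
end

basis = { 'a.txt' => 's1', 'b.txt' => 's2', 'c.txt' => 's3' }
other = { 'a.txt' => 's1', 'b2.txt' => 's2', 'c.txt' => 's3x', 'd.txt' => 's4' }
classify_changes(basis, other).each { |change, files| puts "#{change}: #{files.inspect}" }
```

Running the signature comparison first means a path match is only consulted for content that has no counterpart in the other group, which is what separates a modified file from a renamed one.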
Data Model
- FileInventoryDifference = compares two FileInventory instances based on file signatures and pathnames
  - FileGroupDifference [1..*] = performs analysis and reports differences between two matching FileGroup objects
    - FileGroupDifferenceSubset [1..5] = collects a set of file-level differences of a given change type
      - FileInstanceDifference [1..*] = contains difference information at the file level
        - FileSignature [1..2] = contains the file signature(s) of two file instances being compared
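The containment described by the data model can be pictured with plain Structs. This is only a sketch of the nesting: the Struct names are invented for illustration, and the `group_differences` and `checksum` fields are assumptions rather than the library's confirmed attribute names. The real classes are HappyMapper-based and carry more fields.

```ruby
# Illustrative stand-ins only: these Structs mirror the nesting, not the real classes.
SignatureSketch     = Struct.new(:checksum)
InstanceDiffSketch  = Struct.new(:change, :basis_path, :other_path, :signatures)
SubsetSketch        = Struct.new(:change, :files)
GroupDiffSketch     = Struct.new(:group_id, :subsets)
InventoryDiffSketch = Struct.new(:group_differences)

# Build one 'renamed' difference nested inside a subset, a group, and an inventory report.
def sample_inventory_difference
  signature = SignatureSketch.new('abc123')
  instance  = InstanceDiffSketch.new('renamed', 'page-1.jpg', 'page-001.jpg', [signature])
  subset    = SubsetSketch.new('renamed', [instance])
  group     = GroupDiffSketch.new('content', [subset])
  InventoryDiffSketch.new([group])
end

# Walk the hierarchy from the inventory level down to individual files.
report = sample_inventory_difference
report.group_differences.each do |group|
  group.subsets.each do |subset|
    subset.files.each do |file|
      puts "#{group.group_id}: #{subset.change} #{file.basis_path} -> #{file.other_path}"
    end
  end
end
```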
Instance Attribute Summary
- #added ⇒ Integer
  How many files were added.
- #copyadded ⇒ Integer
  How many duplicate copies of files were added.
- #copydeleted ⇒ Integer
  How many duplicate copies of files were deleted.
- #deleted ⇒ Integer
  How many files were deleted.
- #difference_count ⇒ Integer
  The total number of differences found between the two inventories that were compared (dynamically calculated).
- #group_id ⇒ String
  The name of the file group.
- #identical ⇒ Integer
  How many files were unchanged.
- #modified ⇒ Integer
  How many files were modified.
- #renamed ⇒ Integer
  How many files were renamed.
- #subset_hash ⇒ Hash<Symbol,FileGroupDifferenceSubset>
  A set of containers (one for each change type), each of which contains a collection of file-level differences having that change type.
- #subsets ⇒ Array<FileGroupDifferenceSubset>
  A set of Arrays (one for each change type), each of which contains a collection of file-level differences having that change type.
Instance Method Summary
- #basis_only_keys(basis_hash, other_hash) ⇒ Array
  Compare the keys of two hashes and return the keys unique to the first hash.
- #compare_file_groups(basis_group, other_group) ⇒ FileGroupDifference
  Compare two file groups and return a differences report.
- #compare_matching_signatures(basis_group, other_group) ⇒ FileGroupDifference
  For signatures that are present in both groups, report which file instances are identical or renamed.
- #compare_non_matching_signatures(basis_group, other_group) ⇒ FileGroupDifference
  For signatures that are present in only one or the other group, report which file instances are modified, deleted, or added.
- #file_deltas ⇒ Hash<Symbol,Array>
  Sets of filenames grouped by change type for use in performing file or metadata operations.
- #initialize(opts = {}) ⇒ FileGroupDifference (constructor)
  A new instance of FileGroupDifference.
- #matching_keys(basis_hash, other_hash) ⇒ Array
  Compare the keys of two hashes and return the intersection.
- #other_only_keys(basis_hash, other_hash) ⇒ Array
  Compare the keys of two hashes and return the keys unique to the second hash.
- #rename_require_temp_files(filepairs) ⇒ Boolean
  Test whether any of the new names are the same as one of the old names, such as would be true for insertion of a new file into a page sequence, or a circular rename.
- #rename_tempfile_triplets(filepairs) ⇒ Array<Array<String>>
  A set of file triplets, each containing oldname, newname, tempname.
- #subset(change) ⇒ FileGroupDifferenceSubset
  Find a specified subset of changes.
- #summary ⇒ FileGroupDifference
  Clone just this element for inclusion in a versionMetadata structure.
- #summary_fields ⇒ Array<String>
  The data fields to include in summary reports.
- #tabulate_added_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference
  Container for reporting the set of file-level differences of type ‘added’.
- #tabulate_deleted_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference
  Container for reporting the set of file-level differences of type ‘deleted’.
- #tabulate_modified_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference
  Container for reporting the set of file-level differences of type ‘modified’.
- #tabulate_renamed_files(matching_signatures, basis_signature_hash, other_signature_hash) ⇒ FileGroupDifference
  Container for reporting the set of file-level differences of type ‘renamed’, ‘copyadded’, or ‘copydeleted’.
- #tabulate_unchanged_files(matching_signatures, basis_signature_hash, other_signature_hash) ⇒ FileGroupDifference
  Container for reporting the set of file-level differences of type ‘identical’.
Methods inherited from Serializer::Serializable
#array_to_hash, deep_diff, #diff, #key, #key_name, #to_hash, #to_json, #to_yaml, #variable_names, #variables
Constructor Details
#initialize(opts = {}) ⇒ FileGroupDifference
Returns a new instance of FileGroupDifference.
# File 'lib/moab/file_group_difference.rb', line 53
def initialize(opts = {})
  @subset_hash = Hash.new { |hash, key| hash[key] = FileGroupDifferenceSubset.new(:change => key.to_s) }
  super(opts)
end
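The constructor relies on Ruby's `Hash.new` block form, which builds and stores a value the first time a key is read. A minimal sketch of that behavior, assuming a Struct named `SubsetStub` as a stand-in for FileGroupDifferenceSubset and a hypothetical helper `new_subset_hash`:

```ruby
# Stand-in for FileGroupDifferenceSubset, holding only what this sketch needs.
SubsetStub = Struct.new(:change, :files)

# Build a hash that lazily creates one subset per change type on first access.
def new_subset_hash
  Hash.new { |hash, key| hash[key] = SubsetStub.new(key.to_s, []) }
end

subset_hash = new_subset_hash
subset_hash[:added].files << 'new-file.txt' # first access creates and stores the subset
puts subset_hash.keys.inspect
puts subset_hash[:added].change
```

Since the block assigns into the hash, later reads of the same key return the same stored object, so callers can append to a subset without checking whether it exists yet.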
Instance Attribute Details
#added ⇒ Integer
Returns how many files were added.
# File 'lib/moab/file_group_difference.rb', line 112
attribute :added, Integer, :on_save => proc { |n| n.to_s }
#copyadded ⇒ Integer
Returns how many duplicate copies of files were added.
# File 'lib/moab/file_group_difference.rb', line 84
attribute :copyadded, Integer, :on_save => proc { |n| n.to_s }
#copydeleted ⇒ Integer
Returns how many duplicate copies of files were deleted.
# File 'lib/moab/file_group_difference.rb', line 91
attribute :copydeleted, Integer, :on_save => proc { |n| n.to_s }
#deleted ⇒ Integer
Returns how many files were deleted.
# File 'lib/moab/file_group_difference.rb', line 119
attribute :deleted, Integer, :on_save => proc { |n| n.to_s }
#difference_count ⇒ Integer
Returns the total number of differences found between the two inventories that were compared (dynamically calculated).
# File 'lib/moab/file_group_difference.rb', line 65
attribute :difference_count, Integer, :tag => 'differenceCount', :on_save => proc { |i| i.to_s }
#group_id ⇒ String
Returns the name of the file group.
# File 'lib/moab/file_group_difference.rb', line 60
attribute :group_id, String, :tag => 'groupId', :key => true
#identical ⇒ Integer
Returns how many files were unchanged.
# File 'lib/moab/file_group_difference.rb', line 77
attribute :identical, Integer, :on_save => proc { |n| n.to_s }
#modified ⇒ Integer
Returns how many files were modified.
# File 'lib/moab/file_group_difference.rb', line 105
attribute :modified, Integer, :on_save => proc { |n| n.to_s }
#renamed ⇒ Integer
Returns how many files were renamed.
# File 'lib/moab/file_group_difference.rb', line 98
attribute :renamed, Integer, :on_save => proc { |n| n.to_s }
#subset_hash ⇒ Hash<Symbol,FileGroupDifferenceSubset>
Returns a set of containers (one for each change type), each of which contains a collection of file-level differences having that change type.
# File 'lib/moab/file_group_difference.rb', line 44
def subset_hash
  @subset_hash
end
#subsets ⇒ Array<FileGroupDifferenceSubset>
Returns a set of Arrays (one for each change type), each of which contains a collection of file-level differences having that change type.
# File 'lib/moab/file_group_difference.rb', line 127
has_many :subsets, FileGroupDifferenceSubset, :tag => 'subset'
Instance Method Details
#basis_only_keys(basis_hash, other_hash) ⇒ Array
Compares the keys of two hashes and returns the keys unique to the first hash.
# File 'lib/moab/file_group_difference.rb', line 169
def basis_only_keys(basis_hash, other_hash)
  basis_hash.keys - other_hash.keys
end
#compare_file_groups(basis_group, other_group) ⇒ FileGroupDifference
Compares two file groups and returns a differences report.
# File 'lib/moab/file_group_difference.rb', line 184
def compare_file_groups(basis_group, other_group)
  @group_id = basis_group.group_id
  compare_matching_signatures(basis_group, other_group)
  compare_non_matching_signatures(basis_group, other_group)
  self
end
#compare_matching_signatures(basis_group, other_group) ⇒ FileGroupDifference
For signatures that are present in both groups, reports which file instances are identical or renamed.
# File 'lib/moab/file_group_difference.rb', line 195
def compare_matching_signatures(basis_group, other_group)
  matching_signatures = matching_keys(basis_group.signature_hash, other_group.signature_hash)
  tabulate_unchanged_files(matching_signatures, basis_group.signature_hash, other_group.signature_hash)
  tabulate_renamed_files(matching_signatures, basis_group.signature_hash, other_group.signature_hash)
  self
end
#compare_non_matching_signatures(basis_group, other_group) ⇒ FileGroupDifference
For signatures that are present in only one or the other group, reports which file instances are modified, deleted, or added.
# File 'lib/moab/file_group_difference.rb', line 206
def compare_non_matching_signatures(basis_group, other_group)
  basis_only_signatures = basis_only_keys(basis_group.signature_hash, other_group.signature_hash)
  other_only_signatures = other_only_keys(basis_group.signature_hash, other_group.signature_hash)
  basis_path_hash = basis_group.path_hash_subset(basis_only_signatures)
  other_path_hash = other_group.path_hash_subset(other_only_signatures)
  tabulate_modified_files(basis_path_hash, other_path_hash)
  tabulate_added_files(basis_path_hash, other_path_hash)
  tabulate_deleted_files(basis_path_hash, other_path_hash)
  self
end
#file_deltas ⇒ Hash<Symbol,Array>
Returns sets of filenames grouped by change type for use in performing file or metadata operations.
# File 'lib/moab/file_group_difference.rb', line 331
def file_deltas
  # The hash to be returned
  deltas = Hash.new { |hash, key| hash[key] = [] }
  # case where other_path is empty or 'same'. (create array of strings)
  %i[identical modified deleted copydeleted].each do |change|
    deltas[change].concat(subset_hash[change].files.collect(&:basis_path))
  end
  # case where basis_path and other_path are both present. (create array of arrays)
  %i[copyadded renamed].each do |change|
    deltas[change].concat(subset_hash[change].files.collect { |file| [file.basis_path, file.other_path] })
  end
  # case where basis_path is empty. (create array of strings)
  [:added].each do |change|
    deltas[change].concat(subset_hash[change].files.collect(&:other_path))
  end
  deltas
end
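The shape of the returned hash is easier to see on sample data. The sketch below mirrors the three cases in the listing, assuming a Struct named `FileDiffStub` in place of FileInstanceDifference and a hypothetical helper `deltas_from` that takes a plain Hash of change type => file list instead of reading `subset_hash`.

```ruby
# Stand-in for FileInstanceDifference with only the two path fields.
FileDiffStub = Struct.new(:basis_path, :other_path)

# Hypothetical helper mirroring the three cases handled by file_deltas.
def deltas_from(files_by_change)
  deltas = Hash.new { |hash, key| hash[key] = [] }
  # Only the basis path is meaningful: array of strings.
  %i[identical modified deleted copydeleted].each do |change|
    deltas[change].concat(files_by_change.fetch(change, []).map(&:basis_path))
  end
  # Both paths are meaningful: array of [basis_path, other_path] pairs.
  %i[copyadded renamed].each do |change|
    deltas[change].concat(files_by_change.fetch(change, []).map { |f| [f.basis_path, f.other_path] })
  end
  # Only the other path is meaningful: array of strings.
  deltas[:added].concat(files_by_change.fetch(:added, []).map(&:other_path))
  deltas
end

sample = {
  renamed: [FileDiffStub.new('page-1.jpg', 'page-001.jpg')],
  added: [FileDiffStub.new('', 'page-002.jpg')],
  deleted: [FileDiffStub.new('notes.txt', '')]
}
deltas_from(sample).each { |change, names| puts "#{change}: #{names.inspect}" }
```

Change types with no files still appear in the result with an empty array, because each key is read once while the hash is being filled.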
#matching_keys(basis_hash, other_hash) ⇒ Array
Compares the keys of two hashes and returns the intersection.
# File 'lib/moab/file_group_difference.rb', line 162
def matching_keys(basis_hash, other_hash)
  basis_hash.keys & other_hash.keys
end
#other_only_keys(basis_hash, other_hash) ⇒ Array
Compares the keys of two hashes and returns the keys unique to the second hash.
# File 'lib/moab/file_group_difference.rb', line 176
def other_only_keys(basis_hash, other_hash)
  other_hash.keys - basis_hash.keys
end
#rename_require_temp_files(filepairs) ⇒ Boolean
Tests whether any of the new names are the same as one of the old names, such as would be true for insertion of a new file into a page sequence, or a circular rename. In such a case, returns true, indicating that use of intermediate temporary files would be required when updating a copy of an object’s files at a given location.
# File 'lib/moab/file_group_difference.rb', line 354
def rename_require_temp_files(filepairs)
  # Split the filepairs into two arrays
  oldnames = []
  newnames = []
  filepairs.each do |old, new|
    oldnames << old
    newnames << new
  end
  # Are any of the filenames the same in set of oldnames and set of newnames?
  intersection = oldnames & newnames
  intersection.count > 0
end
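The two situations named in the description, a page insertion and a circular rename, can be checked with a compact equivalent of the test. This is a sketch only; `names_collide?` is a hypothetical name, not a method of the class.

```ruby
# Return true when any new name is also one of the old names.
# `names_collide?` is a hypothetical helper written for this illustration.
def names_collide?(filepairs)
  oldnames = filepairs.map(&:first)
  newnames = filepairs.map(&:last)
  (oldnames & newnames).any?
end

# Inserting a page shifts later pages up by one, so page-3 is both an old and a new name.
page_insertion = [['page-2.jpg', 'page-3.jpg'], ['page-3.jpg', 'page-4.jpg']]
# Two files swapping names form a circular rename.
circular = [['a.txt', 'b.txt'], ['b.txt', 'a.txt']]
# No new name matches an old one, so direct renames are safe.
independent = [['a.txt', 'b.txt'], ['c.txt', 'd.txt']]

puts names_collide?(page_insertion)
puts names_collide?(circular)
puts names_collide?(independent)
```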
#rename_tempfile_triplets(filepairs) ⇒ Array<Array<String>>
Returns a set of file triplets, each containing oldname, newname, tempname.
# File 'lib/moab/file_group_difference.rb', line 369
def rename_tempfile_triplets(filepairs)
  filepairs.collect { |old, new| [old, new, "#{new}-#{Time.now.strftime('%Y%m%d%H%H%S')}-tmp"] }
end
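One way such triplets might be consumed is a two-pass rename: first move every file to its temporary name, then move each temporary file to its final name. The sketch below assumes the element order shown in the listing, [old, new, tempname], and uses a hypothetical helper `two_phase_rename` with hand-written temp names; it is not taken from the library.

```ruby
require 'tmpdir'

# Rename files in two passes so that no rename overwrites a file that has not moved yet.
# Each triplet is [oldname, newname, tempname]. `two_phase_rename` is a hypothetical helper.
def two_phase_rename(dir, triplets)
  # Pass 1: move every file out of the way to its temporary name.
  triplets.each { |old, _new, temp| File.rename(File.join(dir, old), File.join(dir, temp)) }
  # Pass 2: move every temporary file to its final name.
  triplets.each { |_old, new, temp| File.rename(File.join(dir, temp), File.join(dir, new)) }
end

# Swap the names of two files inside a scratch directory.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.txt'), 'A')
  File.write(File.join(dir, 'b.txt'), 'B')
  two_phase_rename(dir, [['a.txt', 'b.txt', 'b.txt-tmp'], ['b.txt', 'a.txt', 'a.txt-tmp']])
  puts File.read(File.join(dir, 'a.txt')) # prints "B"
end
```

Renaming directly in a single pass would overwrite b.txt before it had been moved, which is the situation the temporary names are meant to avoid.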
#subset(change) ⇒ FileGroupDifferenceSubset
Finds and returns the specified subset of changes.
# File 'lib/moab/file_group_difference.rb', line 48
def subset(change)
  subset_hash[change.to_sym]
end
#summary ⇒ FileGroupDifference
Returns a clone of just this element for inclusion in a versionMetadata structure.
# File 'lib/moab/file_group_difference.rb', line 145
def summary
  FileGroupDifference.new(
    :group_id => group_id,
    :identical => identical,
    :copyadded => copyadded,
    :copydeleted => copydeleted,
    :renamed => renamed,
    :modified => modified,
    :added => added,
    :deleted => deleted
  )
end
#summary_fields ⇒ Array<String>
Returns the data fields to include in summary reports.
# File 'lib/moab/file_group_difference.rb', line 139
def summary_fields
  %w[group_id difference_count identical copyadded copydeleted renamed modified deleted added]
end
#tabulate_added_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference
Returns the container for reporting the set of file-level differences of type ‘added’.
# File 'lib/moab/file_group_difference.rb', line 301
def tabulate_added_files(basis_path_hash, other_path_hash)
  other_only_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(:change => 'added')
    fid.basis_path = ""
    fid.other_path = path
    fid.signatures << other_path_hash[path]
    subset_hash[:added].files << fid
  end
  self
end
#tabulate_deleted_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference
Returns the container for reporting the set of file-level differences of type ‘deleted’.
# File 'lib/moab/file_group_difference.rb', line 319
def tabulate_deleted_files(basis_path_hash, other_path_hash)
  basis_only_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(:change => 'deleted')
    fid.basis_path = path
    fid.other_path = ""
    fid.signatures << basis_path_hash[path]
    subset_hash[:deleted].files << fid
  end
  self
end
#tabulate_modified_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference
Returns the container for reporting the set of file-level differences of type ‘modified’.
# File 'lib/moab/file_group_difference.rb', line 282
def tabulate_modified_files(basis_path_hash, other_path_hash)
  matching_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(:change => 'modified')
    fid.basis_path = path
    fid.other_path = "same"
    fid.signatures << basis_path_hash[path]
    fid.signatures << other_path_hash[path]
    subset_hash[:modified].files << fid
  end
  self
end
#tabulate_renamed_files(matching_signatures, basis_signature_hash, other_signature_hash) ⇒ FileGroupDifference
Returns the container for reporting the set of file-level differences of type ‘renamed’, ‘copyadded’, or ‘copydeleted’.
# File 'lib/moab/file_group_difference.rb', line 249
def tabulate_renamed_files(matching_signatures, basis_signature_hash, other_signature_hash)
  matching_signatures.each do |signature|
    basis_paths = basis_signature_hash[signature].paths
    other_paths = other_signature_hash[signature].paths
    basis_only_paths = basis_paths - other_paths
    other_only_paths = other_paths - basis_paths
    maxsize = [basis_only_paths.size, other_only_paths.size].max
    (0..maxsize - 1).each do |n|
      fid = FileInstanceDifference.new
      fid.basis_path = basis_only_paths[n]
      fid.other_path = other_only_paths[n]
      fid.signatures << signature
      if fid.basis_path.nil?
        fid.change = 'copyadded'
        fid.basis_path = basis_paths[0]
      elsif fid.other_path.nil?
        fid.change = 'copydeleted'
      else
        fid.change = 'renamed'
      end
      subset_hash[fid.change.to_sym].files << fid
    end
  end
  self
end
#tabulate_unchanged_files(matching_signatures, basis_signature_hash, other_signature_hash) ⇒ FileGroupDifference
Returns the container for reporting the set of file-level differences of type ‘identical’.
# File 'lib/moab/file_group_difference.rb', line 225
def tabulate_unchanged_files(matching_signatures, basis_signature_hash, other_signature_hash)
  matching_signatures.each do |signature|
    basis_paths = basis_signature_hash[signature].paths
    other_paths = other_signature_hash[signature].paths
    matching_paths = basis_paths & other_paths
    matching_paths.each do |path|
      fid = FileInstanceDifference.new(:change => 'identical')
      fid.basis_path = path
      fid.other_path = "same"
      fid.signatures << signature
      subset_hash[:identical].files << fid
    end
  end
  self
end