Class: Moab::FileGroupDifference
- Inherits:
- Serializer::Serializable (Moab::FileGroupDifference < Serializer::Serializable < Object)
- Includes:
- HappyMapper
- Defined in:
- lib/moab/file_group_difference.rb
Overview
Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.
Performs analysis and reports the differences between two matching FileGroup objects. The descendant elements of the report hold a detailed breakdown of file-level differences, organized by change type. This stanza is a child element of FileInventoryDifference, the documentation of which contains a full example.
In order to determine the detailed nature of the differences that are present between the two manifests, this algorithm first compares the sets of file signatures present in the groups being compared, then uses the result of that operation for subsequent analysis of filename correspondences.
For the first step, a Ruby Hash is extracted from each of the two groups, with FileSignature objects used as the hash keys and the corresponding FileInstance arrays as the hash values. The array of keys from the basis hash can then be compared against the array of keys from the other hash using Array operators:
-
matching = basis_array & other_array
-
basis_only = basis_array - other_array
-
other_only = other_array - basis_array
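The three Array operations above can be tried in isolation. The following is a minimal sketch, not the library's own code: plain strings stand in for FileSignature keys and arrays of path strings stand in for the FileInstance values, and all sample names are hypothetical.

```ruby
# Hypothetical stand-ins: string digests play the role of FileSignature keys,
# and arrays of path strings play the role of the FileInstance values.
basis_hash = {
  'sig-a' => ['page-1.jpg'],
  'sig-b' => ['page-2.jpg'],
  'sig-c' => ['notes.txt']
}
other_hash = {
  'sig-a' => ['page-1.jpg'],
  'sig-b' => ['page-3.jpg'],
  'sig-d' => ['notes.txt']
}

basis_array = basis_hash.keys
other_array = other_hash.keys

matching   = basis_array & other_array # signatures present in both groups
basis_only = basis_array - other_array # signatures only in the basis group
other_only = other_array - basis_array # signatures only in the other group
```

Both operators rely on `hash` and `eql?` of the elements, so any object used as a key in place of these strings would need value-based equality for the comparison to behave the same way.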
For the second step of the comparison, the matching and non-matching sets of hash entries are further categorized as follows:
-
identical = signature and file path are the same in both basis and other file group
-
renamed = signature is unchanged, but the path has changed
-
copyadded = duplicate copy of file was added
-
copydeleted = duplicate copy of file was deleted
-
modified = path is same in both groups, but the signature has changed
-
added = signature and path are only in the other inventory
-
deleted = signature and path are only in the basis inventory
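The seven categories can be illustrated with a simplified, self-contained sketch of the two-step comparison. It assumes each group is reduced to a plain Hash mapping a signature string to an array of paths; `classify` and `paths_for` are hypothetical helper names, and the real class records FileInstanceDifference objects rather than bare paths.

```ruby
# Collect the paths belonging to the given signatures (hypothetical helper).
def paths_for(group, signatures)
  signatures.flat_map { |signature| group[signature] }
end

# Classify differences between two groups, each a Hash of signature => paths.
def classify(basis, other)
  report = Hash.new { |hash, key| hash[key] = [] }

  # Step 1: signatures present in both groups.
  (basis.keys & other.keys).each do |signature|
    basis_paths = basis[signature]
    other_paths = other[signature]
    report[:identical].concat(basis_paths & other_paths)
    basis_only = basis_paths - other_paths
    other_only = other_paths - basis_paths
    [basis_only.size, other_only.size].max.times do |n|
      if basis_only[n].nil?
        report[:copyadded] << [basis_paths[0], other_only[n]]
      elsif other_only[n].nil?
        report[:copydeleted] << basis_only[n]
      else
        report[:renamed] << [basis_only[n], other_only[n]]
      end
    end
  end

  # Step 2: signatures present in only one group, compared by path.
  basis_paths = paths_for(basis, basis.keys - other.keys)
  other_paths = paths_for(other, other.keys - basis.keys)
  report[:modified].concat(basis_paths & other_paths)
  report[:deleted].concat(basis_paths - other_paths)
  report[:added].concat(other_paths - basis_paths)
  report
end

basis = {
  'sig-a' => ['a.txt'], 'sig-b' => ['b.txt'], 'sig-c' => ['c.txt'],
  'sig-d' => ['d.txt'], 'sig-e' => ['e.txt']
}
other = {
  'sig-a' => ['a.txt'], 'sig-b' => ['b2.txt'], 'sig-c2' => ['c.txt'],
  'sig-e' => ['e.txt', 'e-copy.txt'], 'sig-f' => ['f.txt']
}
report = classify(basis, other)
```

In this sample, `b.txt` keeps its signature under a new path (renamed), `c.txt` keeps its path under a new signature (modified), and `e.txt` gains a second path with the same signature (copyadded).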
Data Model
-
FileInventoryDifference = compares two FileInventory instances based on file signatures and pathnames
-
FileGroupDifference [1..*] = performs analysis and reports differences between two matching FileGroup objects
-
FileGroupDifferenceSubset [1..5] = collects a set of file-level differences of a given change type
-
FileInstanceDifference [1..*] = contains difference information at the file level
-
FileSignature [1..2] = contains the file signature(s) of two file instances being compared
Instance Attribute Summary
-
#added ⇒ Integer
How many files were added.
-
#copyadded ⇒ Integer
How many duplicate copies of files were added.
-
#copydeleted ⇒ Integer
How many duplicate copies of files were deleted.
-
#deleted ⇒ Integer
How many files were deleted.
-
#difference_count ⇒ Integer
The total number of differences found between the two inventories that were compared (dynamically calculated).
-
#group_id ⇒ String
The name of the file group.
-
#identical ⇒ Integer
How many files were unchanged.
-
#modified ⇒ Integer
How many files were modified.
-
#renamed ⇒ Integer
How many files were renamed.
-
#subset_hash ⇒ Hash<Symbol,FileGroupDifferenceSubset>
A set of containers (one for each change type), each of which contains a collection of file-level differences having that change type.
-
#subsets ⇒ Array<FileGroupDifferenceSubset>
A set of Arrays (one for each change type), each of which contains a collection of file-level differences having that change type.
Instance Method Summary
-
#basis_only_keys(basis_hash, other_hash) ⇒ Array
Compare the keys of two hashes and return the keys unique to the first hash.
-
#compare_file_groups(basis_group, other_group) ⇒ FileGroupDifference
Compare two file groups and return a differences report.
-
#compare_matching_signatures(basis_group, other_group) ⇒ FileGroupDifference
For signatures that are present in both groups, report which file instances are identical or renamed.
-
#compare_non_matching_signatures(basis_group, other_group) ⇒ FileGroupDifference
For signatures that are present in only one or the other group, report which file instances are modified, deleted, or added.
-
#file_deltas ⇒ Hash<Symbol,Array>
Sets of filenames grouped by change type for use in performing file or metadata operations.
-
#initialize(opts = {}) ⇒ FileGroupDifference
constructor
A new instance of FileGroupDifference.
-
#matching_keys(basis_hash, other_hash) ⇒ Array
Compare the keys of two hashes and return the intersection.
-
#other_only_keys(basis_hash, other_hash) ⇒ Array
Compare the keys of two hashes and return the keys unique to the second hash.
-
#rename_require_temp_files(filepairs) ⇒ Boolean
Test whether any of the new names are the same as one of the old names, such as would be true for insertion of a new file into a page sequence, or a circular rename.
-
#rename_tempfile_triplets(filepairs) ⇒ Array<Array<String>>
A set of file triples, each containing oldname, newname, and a generated tempname (the order used by the method body).
-
#subset(change) ⇒ FileGroupDifferenceSubset
Find a specified subset of changes.
-
#summary ⇒ FileGroupDifference
Clone just this element for inclusion in a versionMetadata structure.
-
#summary_fields ⇒ Array<String>
The data fields to include in summary reports.
-
#tabulate_added_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference
Container for reporting the set of file-level differences of type ‘added’.
-
#tabulate_deleted_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference
Container for reporting the set of file-level differences of type ‘deleted’.
-
#tabulate_modified_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference
Container for reporting the set of file-level differences of type ‘modified’.
-
#tabulate_renamed_files(matching_signatures, basis_signature_hash, other_signature_hash) ⇒ FileGroupDifference
Container for reporting the set of file-level differences of type ‘renamed’, ‘copyadded’, or ‘copydeleted’.
-
#tabulate_unchanged_files(matching_signatures, basis_signature_hash, other_signature_hash) ⇒ FileGroupDifference
Container for reporting the set of file-level differences of type ‘identical’.
Methods inherited from Serializer::Serializable
#array_to_hash, deep_diff, #diff, #key, #key_name, #to_hash, #to_json, #to_yaml, #variable_names, #variables
Constructor Details
#initialize(opts = {}) ⇒ FileGroupDifference
Returns a new instance of FileGroupDifference.
# File 'lib/moab/file_group_difference.rb', line 55
def initialize(opts = {})
  @subset_hash = Hash.new { |hash, key| hash[key] = FileGroupDifferenceSubset.new(:change => key.to_s) }
  super(opts)
end
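The constructor relies on the block form of `Hash.new`, which creates and stores a value the first time a missing key is read. A minimal sketch of that behavior follows, using a hypothetical `Subset` Struct in place of FileGroupDifferenceSubset:

```ruby
# Hypothetical stand-in for FileGroupDifferenceSubset.
Subset = Struct.new(:change, :files)

# The block runs only when a key is missing; it stores and returns a new Subset.
subset_hash = Hash.new { |hash, key| hash[key] = Subset.new(key.to_s, []) }

had_deleted_before = subset_hash.key?(:deleted) # nothing exists until first read

subset_hash[:added].files << 'new.txt'   # first read creates the :added subset
subset_hash[:added].files << 'newer.txt' # second read reuses the same subset
```

One consequence of this pattern is that merely reading `subset_hash[:some_change]` adds an entry, so code that wants to test for presence without creating a subset would use `key?` instead.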
Instance Attribute Details
#added ⇒ Integer
Returns how many files were added.
# File 'lib/moab/file_group_difference.rb', line 114
attribute :added, Integer, :on_save => proc { |n| n.to_s }
#copyadded ⇒ Integer
Returns how many duplicate copies of files were added.
# File 'lib/moab/file_group_difference.rb', line 86
attribute :copyadded, Integer, :on_save => proc { |n| n.to_s }
#copydeleted ⇒ Integer
Returns how many duplicate copies of files were deleted.
# File 'lib/moab/file_group_difference.rb', line 93
attribute :copydeleted, Integer, :on_save => proc { |n| n.to_s }
#deleted ⇒ Integer
Returns how many files were deleted.
# File 'lib/moab/file_group_difference.rb', line 121
attribute :deleted, Integer, :on_save => proc { |n| n.to_s }
#difference_count ⇒ Integer
Returns the total number of differences found between the two inventories that were compared (dynamically calculated).
# File 'lib/moab/file_group_difference.rb', line 67
attribute :difference_count, Integer, :tag => 'differenceCount', :on_save => proc { |i| i.to_s }
#group_id ⇒ String
Returns the name of the file group.
# File 'lib/moab/file_group_difference.rb', line 62
attribute :group_id, String, :tag => 'groupId', :key => true
#identical ⇒ Integer
Returns how many files were unchanged.
# File 'lib/moab/file_group_difference.rb', line 79
attribute :identical, Integer, :on_save => proc { |n| n.to_s }
#modified ⇒ Integer
Returns how many files were modified.
# File 'lib/moab/file_group_difference.rb', line 107
attribute :modified, Integer, :on_save => proc { |n| n.to_s }
#renamed ⇒ Integer
Returns how many files were renamed.
# File 'lib/moab/file_group_difference.rb', line 100
attribute :renamed, Integer, :on_save => proc { |n| n.to_s }
#subset_hash ⇒ Hash<Symbol,FileGroupDifferenceSubset>
Returns a set of containers (one for each change type), each of which contains a collection of file-level differences having that change type.
# File 'lib/moab/file_group_difference.rb', line 46
def subset_hash
  @subset_hash
end
#subsets ⇒ Array<FileGroupDifferenceSubset>
Returns a set of Arrays (one for each change type), each of which contains a collection of file-level differences having that change type.
# File 'lib/moab/file_group_difference.rb', line 129
has_many :subsets, FileGroupDifferenceSubset, :tag => 'subset'
Instance Method Details
#basis_only_keys(basis_hash, other_hash) ⇒ Array
Compares the keys of two hashes and returns the keys unique to the first hash.
# File 'lib/moab/file_group_difference.rb', line 172
def basis_only_keys(basis_hash, other_hash)
  basis_hash.keys - other_hash.keys
end
#compare_file_groups(basis_group, other_group) ⇒ FileGroupDifference
Compares two file groups and returns a differences report.
# File 'lib/moab/file_group_difference.rb', line 187
def compare_file_groups(basis_group, other_group)
  @group_id = basis_group.group_id
  compare_matching_signatures(basis_group, other_group)
  compare_non_matching_signatures(basis_group, other_group)
  self
end
#compare_matching_signatures(basis_group, other_group) ⇒ FileGroupDifference
For signatures that are present in both groups, reports which file instances are identical or renamed.
# File 'lib/moab/file_group_difference.rb', line 198
def compare_matching_signatures(basis_group, other_group)
  matching_signatures = matching_keys(basis_group.signature_hash, other_group.signature_hash)
  tabulate_unchanged_files(matching_signatures, basis_group.signature_hash, other_group.signature_hash)
  tabulate_renamed_files(matching_signatures, basis_group.signature_hash, other_group.signature_hash)
  self
end
#compare_non_matching_signatures(basis_group, other_group) ⇒ FileGroupDifference
For signatures that are present in only one or the other group, reports which file instances are modified, deleted, or added.
# File 'lib/moab/file_group_difference.rb', line 209
def compare_non_matching_signatures(basis_group, other_group)
  basis_only_signatures = basis_only_keys(basis_group.signature_hash, other_group.signature_hash)
  other_only_signatures = other_only_keys(basis_group.signature_hash, other_group.signature_hash)
  basis_path_hash = basis_group.path_hash_subset(basis_only_signatures)
  other_path_hash = other_group.path_hash_subset(other_only_signatures)
  tabulate_modified_files(basis_path_hash, other_path_hash)
  tabulate_added_files(basis_path_hash, other_path_hash)
  tabulate_deleted_files(basis_path_hash, other_path_hash)
  self
end
#file_deltas ⇒ Hash<Symbol,Array>
Returns sets of filenames grouped by change type for use in performing file or metadata operations.
# File 'lib/moab/file_group_difference.rb', line 334
def file_deltas
  # The hash to be returned
  deltas = Hash.new { |hash, key| hash[key] = [] }
  # case where other_path is empty or 'same'. (create array of strings)
  %i[identical modified deleted copydeleted].each do |change|
    deltas[change].concat(subset_hash[change].files.collect(&:basis_path))
  end
  # case where basis_path and other_path are both present. (create array of arrays)
  %i[copyadded renamed].each do |change|
    deltas[change].concat(subset_hash[change].files.collect { |file| [file.basis_path, file.other_path] })
  end
  # case where basis_path is empty. (create array of strings)
  [:added].each do |change|
    deltas[change].concat(subset_hash[change].files.collect(&:other_path))
  end
  deltas
end
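The returned hash mixes two shapes: arrays of path strings for most change types, and arrays of [basis_path, other_path] pairs for copyadded and renamed. The sketch below shows one way a caller might consume it. The `deltas` literal is a hypothetical return value, and the operation symbols are illustrative only, not part of the library.

```ruby
# Hypothetical value with the same shape as the hash returned by file_deltas.
deltas = {
  identical: ['a.txt'],
  modified: ['c.txt'],
  deleted: ['d.txt'],
  copydeleted: [],
  copyadded: [['e.txt', 'e-copy.txt']],
  renamed: [['b.txt', 'b2.txt']],
  added: ['f.txt']
}

# Turn the deltas into a flat list of illustrative file operations.
operations = []
deltas[:deleted].each     { |path| operations << [:delete, path] }
deltas[:copydeleted].each { |path| operations << [:delete, path] }
deltas[:renamed].each     { |old, new| operations << [:rename, old, new] }
deltas[:copyadded].each   { |source, copy| operations << [:copy, source, copy] }
deltas[:added].each       { |path| operations << [:add, path] }
deltas[:modified].each    { |path| operations << [:replace, path] }
```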
#matching_keys(basis_hash, other_hash) ⇒ Array
Compares the keys of two hashes and returns the intersection.
# File 'lib/moab/file_group_difference.rb', line 165
def matching_keys(basis_hash, other_hash)
  basis_hash.keys & other_hash.keys
end
#other_only_keys(basis_hash, other_hash) ⇒ Array
Compares the keys of two hashes and returns the keys unique to the second hash.
# File 'lib/moab/file_group_difference.rb', line 179
def other_only_keys(basis_hash, other_hash)
  other_hash.keys - basis_hash.keys
end
#rename_require_temp_files(filepairs) ⇒ Boolean
Tests whether any of the new names are the same as one of the old names, as would be true for insertion of a new file into a page sequence or for a circular rename. In such a case, returns true, indicating that intermediate temporary files would be required when updating a copy of an object’s files at a given location.
# File 'lib/moab/file_group_difference.rb', line 357
def rename_require_temp_files(filepairs)
  # Split the filepairs into two arrays
  oldnames = []
  newnames = []
  filepairs.each do |old, new|
    oldnames << old
    newnames << new
  end
  # Are any of the filenames the same in set of oldnames and set of newnames?
  intersection = oldnames & newnames
  intersection.count > 0
end
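The two situations named in the description can be checked with a standalone sketch of the same test. The function name `rename_requires_temp_files?` and the sample file names are hypothetical; the logic mirrors the intersection check in the listing.

```ruby
# Standalone sketch: true when any new name collides with an existing old name.
def rename_requires_temp_files?(filepairs)
  oldnames = filepairs.map(&:first)
  newnames = filepairs.map(&:last)
  (oldnames & newnames).any?
end

# Inserting a page shifts later pages onto names that are still in use.
page_insertion = [['page-2.jpg', 'page-3.jpg'], ['page-3.jpg', 'page-4.jpg']]
# A circular rename swaps two names.
circular = [['a.txt', 'b.txt'], ['b.txt', 'a.txt']]
# An ordinary rename touches no name that is still occupied.
simple = [['draft.txt', 'final.txt']]

needs_temp_for_insertion = rename_requires_temp_files?(page_insertion)
needs_temp_for_circular  = rename_requires_temp_files?(circular)
needs_temp_for_simple    = rename_requires_temp_files?(simple)
```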
#rename_tempfile_triplets(filepairs) ⇒ Array<Array<String>>
Returns a set of file triples, each containing oldname, newname, and a generated tempname (the order used by the method body).
# File 'lib/moab/file_group_difference.rb', line 372
def rename_tempfile_triplets(filepairs)
  # NOTE: the format string repeats %H (hour); %M (minutes) may have been intended.
  filepairs.collect { |old, new| [old, new, "#{new}-#{Time.now.strftime('%Y%m%d%H%H%S')}-tmp"] }
end
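How such triples might be used is sketched below as a two-phase rename: every file is first moved to its temporary name, then every temporary name is moved to its final name, so a circular rename never overwrites a file that has not yet been moved. This is an assumption about usage, not code from the library; `two_phase_rename` and the fixed `-tmp` names are hypothetical, and the triple order [old, new, temp] follows the listing.

```ruby
require 'tmpdir'

# Hypothetical helper: apply [old, new, temp] triples in two passes.
def two_phase_rename(dir, triplets)
  # Phase 1: move each old name out of the way to its temporary name.
  triplets.each do |old, _new, temp|
    File.rename(File.join(dir, old), File.join(dir, temp))
  end
  # Phase 2: move each temporary name to its final name.
  triplets.each do |_old, new, temp|
    File.rename(File.join(dir, temp), File.join(dir, new))
  end
end

# Swap the names of two files inside a scratch directory.
swapped = Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.txt'), 'A')
  File.write(File.join(dir, 'b.txt'), 'B')
  triplets = [
    ['a.txt', 'b.txt', 'b.txt-tmp'],
    ['b.txt', 'a.txt', 'a.txt-tmp']
  ]
  two_phase_rename(dir, triplets)
  [File.read(File.join(dir, 'a.txt')), File.read(File.join(dir, 'b.txt'))]
end
```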
#subset(change) ⇒ FileGroupDifferenceSubset
Finds a specified subset of changes.
# File 'lib/moab/file_group_difference.rb', line 50
def subset(change)
  subset_hash[change.to_sym]
end
#summary ⇒ FileGroupDifference
Clones just this element for inclusion in a versionMetadata structure.
# File 'lib/moab/file_group_difference.rb', line 148
def summary
  FileGroupDifference.new(
    :group_id => group_id,
    :identical => identical,
    :copyadded => copyadded,
    :copydeleted => copydeleted,
    :renamed => renamed,
    :modified => modified,
    :added => added,
    :deleted => deleted
  )
end
#summary_fields ⇒ Array<String>
Returns the data fields to include in summary reports.
# File 'lib/moab/file_group_difference.rb', line 142
def summary_fields
  %w[group_id difference_count identical copyadded copydeleted renamed modified deleted added]
end
#tabulate_added_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference
Returns the container for reporting the set of file-level differences of type ‘added’.
# File 'lib/moab/file_group_difference.rb', line 304
def tabulate_added_files(basis_path_hash, other_path_hash)
  other_only_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(:change => 'added')
    fid.basis_path = ""
    fid.other_path = path
    fid.signatures << other_path_hash[path]
    subset_hash[:added].files << fid
  end
  self
end
#tabulate_deleted_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference
Returns the container for reporting the set of file-level differences of type ‘deleted’.
# File 'lib/moab/file_group_difference.rb', line 322
def tabulate_deleted_files(basis_path_hash, other_path_hash)
  basis_only_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(:change => 'deleted')
    fid.basis_path = path
    fid.other_path = ""
    fid.signatures << basis_path_hash[path]
    subset_hash[:deleted].files << fid
  end
  self
end
#tabulate_modified_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference
Returns the container for reporting the set of file-level differences of type ‘modified’.
# File 'lib/moab/file_group_difference.rb', line 285
def tabulate_modified_files(basis_path_hash, other_path_hash)
  matching_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(:change => 'modified')
    fid.basis_path = path
    fid.other_path = "same"
    fid.signatures << basis_path_hash[path]
    fid.signatures << other_path_hash[path]
    subset_hash[:modified].files << fid
  end
  self
end
#tabulate_renamed_files(matching_signatures, basis_signature_hash, other_signature_hash) ⇒ FileGroupDifference
Returns the container for reporting the set of file-level differences of type ‘renamed’, ‘copyadded’, or ‘copydeleted’.
# File 'lib/moab/file_group_difference.rb', line 252
def tabulate_renamed_files(matching_signatures, basis_signature_hash, other_signature_hash)
  matching_signatures.each do |signature|
    basis_paths = basis_signature_hash[signature].paths
    other_paths = other_signature_hash[signature].paths
    basis_only_paths = basis_paths - other_paths
    other_only_paths = other_paths - basis_paths
    maxsize = [basis_only_paths.size, other_only_paths.size].max
    (0..maxsize - 1).each do |n|
      fid = FileInstanceDifference.new
      fid.basis_path = basis_only_paths[n]
      fid.other_path = other_only_paths[n]
      fid.signatures << signature
      if fid.basis_path.nil?
        fid.change = 'copyadded'
        fid.basis_path = basis_paths[0]
      elsif fid.other_path.nil?
        fid.change = 'copydeleted'
      else
        fid.change = 'renamed'
      end
      subset_hash[fid.change.to_sym].files << fid
    end
  end
  self
end
#tabulate_unchanged_files(matching_signatures, basis_signature_hash, other_signature_hash) ⇒ FileGroupDifference
Returns the container for reporting the set of file-level differences of type ‘identical’.
# File 'lib/moab/file_group_difference.rb', line 228
def tabulate_unchanged_files(matching_signatures, basis_signature_hash, other_signature_hash)
  matching_signatures.each do |signature|
    basis_paths = basis_signature_hash[signature].paths
    other_paths = other_signature_hash[signature].paths
    matching_paths = basis_paths & other_paths
    matching_paths.each do |path|
      fid = FileInstanceDifference.new(:change => 'identical')
      fid.basis_path = path
      fid.other_path = "same"
      fid.signatures << signature
      subset_hash[:identical].files << fid
    end
  end
  self
end