Class: Moab::FileGroupDifference

Inherits:
Serializer::Serializable
Includes:
HappyMapper
Defined in:
lib/moab/file_group_difference.rb

Overview

Note:

Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.

Performs analysis and reports the differences between two matching FileGroup objects. The descendant elements of the report hold a detailed breakdown of file-level differences, organized by change type. This stanza is a child element of FileInventoryDifference, the documentation of which contains a full example.

In order to determine the detailed nature of the differences that are present between the two manifests, this algorithm first compares the sets of file signatures present in the groups being compared, then uses the result of that operation for subsequent analysis of filename correspondences.

For the first step, a Ruby Hash is extracted from each of the two groups, with FileSignature objects used as hash keys and the corresponding FileInstance arrays as the hash values. The set of keys from the basis hash can be compared against the keys from the other hash using Array operators:

  • matching = basis_array & other_array

  • basis_only = basis_array - other_array

  • other_only = other_array - basis_array
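
The three operators can be tried on plain hashes. The sketch below is illustrative only: the strings stand in for FileSignature keys, the path arrays stand in for the hash values, and all of the data is made up.

```ruby
# Stand-in data: keys play the role of FileSignature objects,
# values the role of the file instances sharing that signature.
basis_hash = { 'sig-a' => ['page-1.jpg'], 'sig-b' => ['page-2.jpg'] }
other_hash = { 'sig-b' => ['page-2.jpg'], 'sig-c' => ['page-3.jpg'] }

basis_array = basis_hash.keys
other_array = other_hash.keys

matching   = basis_array & other_array  # signatures present in both groups
basis_only = basis_array - other_array  # signatures present only in the basis group
other_only = other_array - basis_array  # signatures present only in the other group
```

Array#& and Array#- compare elements using hash and eql?, so objects used as keys need consistent definitions of both for these operators to treat equal signatures as the same element.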

For the second step of the comparison, the matching and non-matching sets of hash entries are further categorized as follows:

  • identical = signature and file path are the same in both the basis and other file groups

  • renamed = signature is unchanged, but the path has moved

  • copyadded = duplicate copy of file was added

  • copydeleted = duplicate copy of file was deleted

  • modified = path is same in both groups, but the signature has changed

  • added = signature and path are only in the other inventory

  • deleted = signature and path are only in the basis inventory
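
The two steps can be sketched end to end with plain data. The helpers `paths_by_signature` and `classify` below are hypothetical, not part of the library: they take hashes that map a path to a signature string and return the paths grouped by change type, following the categories listed above.

```ruby
# Hypothetical helper: invert path => signature into signature => [paths].
def paths_by_signature(path_hash)
  path_hash.group_by { |_path, sig| sig }.transform_values { |pairs| pairs.map(&:first) }
end

# Hypothetical helper: classify paths by change type (plain strings throughout).
def classify(basis, other)
  result = Hash.new { |hash, key| hash[key] = [] }
  basis_sigs = paths_by_signature(basis)
  other_sigs = paths_by_signature(other)

  # Step 1: signatures present in both groups.
  (basis_sigs.keys & other_sigs.keys).each do |sig|
    basis_paths = basis_sigs[sig]
    other_paths = other_sigs[sig]
    (basis_paths & other_paths).each { |path| result[:identical] << path }
    basis_only = basis_paths - other_paths
    other_only = other_paths - basis_paths
    [basis_only.size, other_only.size].max.times do |n|
      if basis_only[n].nil?
        result[:copyadded] << [basis_paths[0], other_only[n]]
      elsif other_only[n].nil?
        result[:copydeleted] << basis_only[n]
      else
        result[:renamed] << [basis_only[n], other_only[n]]
      end
    end
  end

  # Step 2: signatures present in only one group, compared by path.
  basis_rest = basis.reject { |_path, sig| other_sigs.key?(sig) }
  other_rest = other.reject { |_path, sig| basis_sigs.key?(sig) }
  (basis_rest.keys & other_rest.keys).each { |path| result[:modified] << path }
  (other_rest.keys - basis_rest.keys).each { |path| result[:added] << path }
  (basis_rest.keys - other_rest.keys).each { |path| result[:deleted] << path }
  result
end

basis = { 'a.txt' => 's1', 'b.txt' => 's2', 'c.txt' => 's3', 'd.txt' => 's4' }
other = { 'a.txt' => 's1', 'b2.txt' => 's2', 'c.txt' => 's5', 'e.txt' => 's6' }
report = classify(basis, other)
```

With this data, `a.txt` is identical, `b.txt` is renamed to `b2.txt`, `c.txt` is modified, `e.txt` is added, and `d.txt` is deleted.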

Data Model

Methods inherited from Serializer::Serializable

#array_to_hash, deep_diff, #diff, #key, #key_name, #to_hash, #to_json, #to_yaml, #variable_names, #variables

Constructor Details

#initialize(opts = {}) ⇒ FileGroupDifference

Returns a new instance of FileGroupDifference.



# File 'lib/moab/file_group_difference.rb', line 55

def initialize(opts = {})
  @subset_hash = Hash.new { |hash, key| hash[key] = FileGroupDifferenceSubset.new(:change => key.to_s) }
  super(opts)
end
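
The constructor relies on the block form of Hash.new, which creates and stores an entry the first time a key is read. A minimal sketch of that behavior follows; `Subset` is a hypothetical stand-in for FileGroupDifferenceSubset that models only a change name and a file list.

```ruby
# Stand-in for FileGroupDifferenceSubset (assumption: only these two fields matter here).
Subset = Struct.new(:change, :files, keyword_init: true)

subset_hash = Hash.new { |hash, key| hash[key] = Subset.new(change: key.to_s, files: []) }

subset_hash[:added].files << 'page-3.jpg'  # first access creates and stores the entry
subset_hash[:added].files << 'page-4.jpg'  # later accesses reuse the stored entry
```

Because the block assigns into the hash, callers can append to any change type without first checking whether its container exists.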

Instance Attribute Details

#added ⇒ Integer

Returns how many files were added.

Returns:

  • (Integer)

    How many files were added



# File 'lib/moab/file_group_difference.rb', line 114

attribute :added, Integer, :on_save => proc { |n| n.to_s }

#copyadded ⇒ Integer

Returns how many duplicate copies of files were added.

Returns:

  • (Integer)

    How many duplicate copies of files were added



# File 'lib/moab/file_group_difference.rb', line 86

attribute :copyadded, Integer, :on_save => proc { |n| n.to_s }

#copydeleted ⇒ Integer

Returns how many duplicate copies of files were deleted.

Returns:

  • (Integer)

    How many duplicate copies of files were deleted



# File 'lib/moab/file_group_difference.rb', line 93

attribute :copydeleted, Integer, :on_save => proc { |n| n.to_s }

#deleted ⇒ Integer

Returns how many files were deleted.

Returns:

  • (Integer)

    How many files were deleted



# File 'lib/moab/file_group_difference.rb', line 121

attribute :deleted, Integer, :on_save => proc { |n| n.to_s }

#difference_count ⇒ Integer

Returns the total number of differences found between the two inventories that were compared (dynamically calculated).

Returns:

  • (Integer)

    the total number of differences found between the two inventories that were compared (dynamically calculated)



# File 'lib/moab/file_group_difference.rb', line 67

attribute :difference_count, Integer, :tag => 'differenceCount', :on_save => proc { |i| i.to_s }

#group_id ⇒ String

Returns the name of the file group.

Returns:

  • (String)

    The name of the file group



# File 'lib/moab/file_group_difference.rb', line 62

attribute :group_id, String, :tag => 'groupId', :key => true

#identical ⇒ Integer

Returns how many files were unchanged.

Returns:

  • (Integer)

    How many files were unchanged



# File 'lib/moab/file_group_difference.rb', line 79

attribute :identical, Integer, :on_save => proc { |n| n.to_s }

#modified ⇒ Integer

Returns how many files were modified.

Returns:

  • (Integer)

    How many files were modified



# File 'lib/moab/file_group_difference.rb', line 107

attribute :modified, Integer, :on_save => proc { |n| n.to_s }

#renamed ⇒ Integer

Returns how many files were renamed.

Returns:

  • (Integer)

    How many files were renamed



# File 'lib/moab/file_group_difference.rb', line 100

attribute :renamed, Integer, :on_save => proc { |n| n.to_s }

#subset_hash ⇒ Hash<Symbol,FileGroupDifferenceSubset>

Returns a set of containers (one for each change type), each of which contains a collection of file-level differences having that change type.

Returns:

  • (Hash<Symbol,FileGroupDifferenceSubset>)

    A set of containers (one for each change type), each of which contains a collection of file-level differences having that change type.



# File 'lib/moab/file_group_difference.rb', line 46

def subset_hash
  @subset_hash
end

#subsets ⇒ Array<FileGroupDifferenceSubset>

Returns an array of subsets (one for each change type), each of which contains a collection of file-level differences having that change type.

Returns:

  • (Array<FileGroupDifferenceSubset>)

    An array of subsets (one for each change type), each of which contains a collection of file-level differences having that change type.



# File 'lib/moab/file_group_difference.rb', line 129

has_many :subsets, FileGroupDifferenceSubset, :tag => 'subset'

Instance Method Details

#basis_only_keys(basis_hash, other_hash) ⇒ Array

Compares the keys of two hashes and returns the keys unique to the first hash.

Parameters:

  • basis_hash (Hash)

    The first hash being compared

  • other_hash (Hash)

    The second hash being compared

Returns:

  • (Array)

    The keys unique to the first hash



# File 'lib/moab/file_group_difference.rb', line 172

def basis_only_keys(basis_hash, other_hash)
  basis_hash.keys - other_hash.keys
end

#compare_file_groups(basis_group, other_group) ⇒ FileGroupDifference

Compares two file groups and returns a differences report.

Parameters:

  • basis_group (FileGroup)

    The file group that is the basis of the comparison

  • other_group (FileGroup)

    The file group that is compared against the basis group

Returns:

  • (FileGroupDifference)

    This object, populated with the differences found between the two groups



# File 'lib/moab/file_group_difference.rb', line 187

def compare_file_groups(basis_group, other_group)
  @group_id = basis_group.group_id
  compare_matching_signatures(basis_group, other_group)
  compare_non_matching_signatures(basis_group, other_group)
  self
end

#compare_matching_signatures(basis_group, other_group) ⇒ FileGroupDifference

For signatures that are present in both groups, reports which file instances are identical or renamed.

Parameters:

  • basis_group (FileGroup)

    The file group that is the basis of the comparison

  • other_group (FileGroup)

    The file group that is compared against the basis group

Returns:

  • (FileGroupDifference)

    This object, updated with the file instances found to be identical or renamed



# File 'lib/moab/file_group_difference.rb', line 198

def compare_matching_signatures(basis_group, other_group)
  matching_signatures = matching_keys(basis_group.signature_hash, other_group.signature_hash)
  tabulate_unchanged_files(matching_signatures, basis_group.signature_hash, other_group.signature_hash)
  tabulate_renamed_files(matching_signatures, basis_group.signature_hash, other_group.signature_hash)
  self
end

#compare_non_matching_signatures(basis_group, other_group) ⇒ FileGroupDifference

For signatures that are present in only one of the two groups, reports which file instances are modified, deleted, or added.

Parameters:

  • basis_group (FileGroup)

    The file group that is the basis of the comparison

  • other_group (FileGroup)

    The file group that is compared against the basis group

Returns:

  • (FileGroupDifference)

    This object, updated with the file instances found to be modified, deleted, or added



# File 'lib/moab/file_group_difference.rb', line 209

def compare_non_matching_signatures(basis_group, other_group)
  basis_only_signatures = basis_only_keys(basis_group.signature_hash, other_group.signature_hash)
  other_only_signatures = other_only_keys(basis_group.signature_hash, other_group.signature_hash)
  basis_path_hash = basis_group.path_hash_subset(basis_only_signatures)
  other_path_hash = other_group.path_hash_subset(other_only_signatures)
  tabulate_modified_files(basis_path_hash, other_path_hash)
  tabulate_added_files(basis_path_hash, other_path_hash)
  tabulate_deleted_files(basis_path_hash, other_path_hash)
  self
end

#file_deltas ⇒ Hash<Symbol,Array>

Returns sets of filenames grouped by change type for use in performing file or metadata operations.

Returns:

  • (Hash<Symbol,Array>)

    Sets of filenames grouped by change type for use in performing file or metadata operations



# File 'lib/moab/file_group_difference.rb', line 334

def file_deltas
  # The hash to be returned
  deltas = Hash.new { |hash, key| hash[key] = [] }
  # case where other_path is empty or 'same'.  (create array of strings)
  %i[identical modified deleted copydeleted].each do |change|
    deltas[change].concat(subset_hash[change].files.collect(&:basis_path))
  end
  # case where basis_path and other_path are both present.  (create array of arrays)
  %i[copyadded renamed].each do |change|
    deltas[change].concat(subset_hash[change].files.collect { |file| [file.basis_path, file.other_path] })
  end
  # case where basis_path is empty.  (create array of strings)
  [:added].each do |change|
    deltas[change].concat(subset_hash[change].files.collect(&:other_path))
  end
  deltas
end
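
The shape of the returned hash differs by change type: plain path strings for most types, and [basis_path, other_path] pairs for copyadded and renamed. The sketch below reproduces that shaping with stand-ins so it runs without the library; `FileDiff` and `files_by_change` are hypothetical substitutes for FileInstanceDifference and for the `files` lists held in `subset_hash`.

```ruby
# Stand-in for FileInstanceDifference (assumption: only the two paths are needed).
FileDiff = Struct.new(:basis_path, :other_path)

files_by_change = Hash.new { |hash, key| hash[key] = [] }
files_by_change[:deleted] << FileDiff.new('old.txt', '')
files_by_change[:renamed] << FileDiff.new('a.txt', 'b.txt')
files_by_change[:added]   << FileDiff.new('', 'new.txt')

deltas = Hash.new { |hash, key| hash[key] = [] }
# Types reported by basis path alone (array of strings).
%i[identical modified deleted copydeleted].each do |change|
  deltas[change].concat(files_by_change[change].map(&:basis_path))
end
# Types that need both paths (array of two-element arrays).
%i[copyadded renamed].each do |change|
  deltas[change].concat(files_by_change[change].map { |file| [file.basis_path, file.other_path] })
end
# Added files exist only under the other path (array of strings).
deltas[:added].concat(files_by_change[:added].map(&:other_path))
```

Every change type ends up as a key in `deltas`, with an empty array where nothing of that type was found, because reading a key through the default block stores it.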

#matching_keys(basis_hash, other_hash) ⇒ Array

Compares the keys of two hashes and returns the intersection.

Parameters:

  • basis_hash (Hash)

    The first hash being compared

  • other_hash (Hash)

    The second hash being compared

Returns:

  • (Array)

    The keys present in both hashes



# File 'lib/moab/file_group_difference.rb', line 165

def matching_keys(basis_hash, other_hash)
  basis_hash.keys & other_hash.keys
end

#other_only_keys(basis_hash, other_hash) ⇒ Array

Compares the keys of two hashes and returns the keys unique to the second hash.

Parameters:

  • basis_hash (Hash)

    The first hash being compared

  • other_hash (Hash)

    The second hash being compared

Returns:

  • (Array)

    The keys unique to the second hash



# File 'lib/moab/file_group_difference.rb', line 179

def other_only_keys(basis_hash, other_hash)
  other_hash.keys - basis_hash.keys
end

#rename_require_temp_files(filepairs) ⇒ Boolean

Tests whether any of the new names is the same as one of the old names, as would be true for insertion of a new file into a page sequence, or for a circular rename. In such a case, returns true, indicating that intermediate temporary files would be required when updating a copy of an object’s files at a given location.

Parameters:

  • filepairs (Array<Array<String>>)

    The set of oldname, newname pairs for all files being renamed

Returns:

  • (Boolean)

    true if any new name is the same as one of the old names, meaning that intermediate temporary files would be required; false otherwise



# File 'lib/moab/file_group_difference.rb', line 357

def rename_require_temp_files(filepairs)
  # Split the filepairs into two arrays
  oldnames = []
  newnames = []
  filepairs.each do |old, new|
    oldnames << old
    newnames << new
  end
  # Are any of the filenames the same in set of oldnames and set of newnames?
  intersection = oldnames & newnames
  intersection.count > 0
end
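
The check can be exercised on its own. The sketch below is a standalone copy of the same logic under a hypothetical name, `rename_requires_temp_files?`, so that it runs without the library; the file names are made up.

```ruby
# Standalone version of the check (hypothetical name, same logic as the listing above).
def rename_requires_temp_files?(filepairs)
  oldnames = filepairs.map(&:first)
  newnames = filepairs.map(&:last)
  !(oldnames & newnames).empty?
end

# Inserting a page shifts later pages: page-2 becomes page-3, page-3 becomes page-4.
shifted = [['page-2.jpg', 'page-3.jpg'], ['page-3.jpg', 'page-4.jpg']]
# Independent renames: no new name collides with an old name.
independent = [['a.txt', 'b.txt'], ['c.txt', 'd.txt']]

rename_requires_temp_files?(shifted)      # => true
rename_requires_temp_files?(independent)  # => false
```

In the shifted case, renaming page-2.jpg directly to page-3.jpg would overwrite a file that has not yet been moved, which is why the overlap is treated as requiring temporary names.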

#rename_tempfile_triplets(filepairs) ⇒ Array<Array<String>>

Returns a set of file triples containing oldname, newname, tempname.

Parameters:

  • filepairs (Array<Array<String>>)

    The set of oldname, newname pairs for all files being renamed

Returns:

  • (Array<Array<String>>)

    A set of file triples, each containing oldname, newname, tempname



# File 'lib/moab/file_group_difference.rb', line 372

def rename_tempfile_triplets(filepairs)
  filepairs.collect { |old, new| [old, new, "#{new}-#{Time.now.strftime('%Y%m%d%H%M%S')}-tmp"] }
end
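
One way such triplets could be consumed is a two-pass rename: every file first moves to its temporary name, then every temporary name moves to its final name. This page does not show how the library's callers actually apply the triplets, so the sketch below is an assumption. It uses a plain Hash as a stand-in file system and builds the triplets in the same [old, new, temp] order as the listing above.

```ruby
# Stand-in file system: name => content (assumption, no real files are touched).
files = { 'a.txt' => 'A', 'b.txt' => 'B' }

# Circular rename: a.txt and b.txt swap names.
filepairs = [['a.txt', 'b.txt'], ['b.txt', 'a.txt']]

stamp = Time.now.strftime('%Y%m%d%H%M%S')
triplets = filepairs.map { |old, new| [old, new, "#{new}-#{stamp}-tmp"] }

# Pass 1: move every old name to its temporary name.
triplets.each { |old, _new, temp| files[temp] = files.delete(old) }
# Pass 2: move every temporary name to its final name.
triplets.each { |_old, new, temp| files[new] = files.delete(temp) }
```

After both passes the two contents have swapped names, which a single direct pass could not achieve without overwriting one of the files.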

#subset(change) ⇒ FileGroupDifferenceSubset

Finds a specified subset of changes.

Parameters:

  • change (String)

    the change type to search for

Returns:

  • (FileGroupDifferenceSubset)

    The subset of changes having the specified change type



# File 'lib/moab/file_group_difference.rb', line 50

def subset(change)
  subset_hash[change.to_sym]
end

#summary ⇒ FileGroupDifference

Returns a clone of just this element, for inclusion in a versionMetadata structure.

Returns:

  • (FileGroupDifference)

    Clone just this element for inclusion in a versionMetadata structure



# File 'lib/moab/file_group_difference.rb', line 148

def summary
  FileGroupDifference.new(
    :group_id => group_id,
    :identical => identical,
    :copyadded => copyadded,
    :copydeleted => copydeleted,
    :renamed => renamed,
    :modified => modified,
    :added => added,
    :deleted => deleted
  )
end

#summary_fields ⇒ Array<String>

Returns the data fields to include in summary reports.

Returns:

  • (Array<String>)

    The data fields to include in summary reports



# File 'lib/moab/file_group_difference.rb', line 142

def summary_fields
  %w[group_id difference_count identical copyadded copydeleted renamed modified deleted added]
end

#tabulate_added_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference

Returns the container for reporting the set of file-level differences of type ‘added’.

Parameters:

  • basis_path_hash (Hash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the basis group

  • other_path_hash (Hash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the other group

Returns:

  • (FileGroupDifference)

    Container for reporting the set of file-level differences of type ‘added’



# File 'lib/moab/file_group_difference.rb', line 304

def tabulate_added_files(basis_path_hash, other_path_hash)
  other_only_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(:change => 'added')
    fid.basis_path = ""
    fid.other_path = path
    fid.signatures << other_path_hash[path]
    subset_hash[:added].files << fid
  end
  self
end

#tabulate_deleted_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference

Returns the container for reporting the set of file-level differences of type ‘deleted’.

Parameters:

  • basis_path_hash (Hash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the basis group

  • other_path_hash (Hash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the other group

Returns:

  • (FileGroupDifference)

    Container for reporting the set of file-level differences of type ‘deleted’



# File 'lib/moab/file_group_difference.rb', line 322

def tabulate_deleted_files(basis_path_hash, other_path_hash)
  basis_only_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(:change => 'deleted')
    fid.basis_path = path
    fid.other_path = ""
    fid.signatures << basis_path_hash[path]
    subset_hash[:deleted].files << fid
  end
  self
end

#tabulate_modified_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference

Returns the container for reporting the set of file-level differences of type ‘modified’.

Parameters:

  • basis_path_hash (Hash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the basis group

  • other_path_hash (Hash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the other group

Returns:

  • (FileGroupDifference)

    Container for reporting the set of file-level differences of type ‘modified’



# File 'lib/moab/file_group_difference.rb', line 285

def tabulate_modified_files(basis_path_hash, other_path_hash)
  matching_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(:change => 'modified')
    fid.basis_path = path
    fid.other_path = "same"
    fid.signatures << basis_path_hash[path]
    fid.signatures << other_path_hash[path]
    subset_hash[:modified].files << fid
  end
  self
end

#tabulate_renamed_files(matching_signatures, basis_signature_hash, other_signature_hash) ⇒ FileGroupDifference

Returns the container for reporting the set of file-level differences of type ‘renamed’, ‘copyadded’, or ‘copydeleted’.

Parameters:

  • matching_signatures (Array<FileSignature>)

    The file signatures of the file manifestations being compared

  • basis_signature_hash (Hash<FileSignature, FileManifestation>)

    Signature to file path mapping from the file group that is the basis of the comparison

  • other_signature_hash (Hash<FileSignature, FileManifestation>)

    Signature to file path mapping from the file group that is being compared to the basis group

Returns:

  • (FileGroupDifference)

    Container for reporting the set of file-level differences of type ‘renamed’, ‘copyadded’, or ‘copydeleted’



# File 'lib/moab/file_group_difference.rb', line 252

def tabulate_renamed_files(matching_signatures, basis_signature_hash, other_signature_hash)
  matching_signatures.each do |signature|
    basis_paths = basis_signature_hash[signature].paths
    other_paths = other_signature_hash[signature].paths
    basis_only_paths = basis_paths - other_paths
    other_only_paths = other_paths - basis_paths
    maxsize = [basis_only_paths.size, other_only_paths.size].max
    (0..maxsize - 1).each do |n|
      fid = FileInstanceDifference.new
      fid.basis_path = basis_only_paths[n]
      fid.other_path = other_only_paths[n]
      fid.signatures << signature
      if fid.basis_path.nil?
        fid.change = 'copyadded'
        fid.basis_path = basis_paths[0]
      elsif fid.other_path.nil?
        fid.change = 'copydeleted'
      else
        fid.change = 'renamed'
      end
      subset_hash[fid.change.to_sym].files << fid
    end
  end
  self
end
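
The pairing by index is easiest to see when the two path lists for a signature have different lengths. The sketch below isolates that logic in a hypothetical helper, `pair_paths`, which returns [change, basis_path, other_path] triples instead of FileInstanceDifference objects; the paths are made up.

```ruby
# Hypothetical helper mirroring the pairing logic in the listing above.
def pair_paths(basis_paths, other_paths)
  basis_only = basis_paths - other_paths
  other_only = other_paths - basis_paths
  count = [basis_only.size, other_only.size].max
  (0...count).map do |n|
    if basis_only[n].nil?
      # No unmatched basis path left: the other path is an extra copy.
      ['copyadded', basis_paths[0], other_only[n]]
    elsif other_only[n].nil?
      # No unmatched other path left: the basis copy was removed.
      ['copydeleted', basis_only[n], nil]
    else
      ['renamed', basis_only[n], other_only[n]]
    end
  end
end

pair_paths(['intro.txt'], ['intro.txt', 'intro-copy.txt'])
# => [["copyadded", "intro.txt", "intro-copy.txt"]]
pair_paths(['a.txt', 'b.txt'], ['c.txt'])
# => [["renamed", "a.txt", "c.txt"], ["copydeleted", "b.txt", nil]]
```

Which unmatched basis path is paired with which unmatched other path depends only on array order, so a reported rename is a positional pairing rather than a statement about which copy moved where.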

#tabulate_unchanged_files(matching_signatures, basis_signature_hash, other_signature_hash) ⇒ FileGroupDifference

Returns the container for reporting the set of file-level differences of type ‘identical’.

Parameters:

  • matching_signatures (Array<FileSignature>)

    The file signatures of the file manifestations being compared

  • basis_signature_hash (Hash<FileSignature, FileManifestation>)

    Signature to file path mapping from the file group that is the basis of the comparison

  • other_signature_hash (Hash<FileSignature, FileManifestation>)

    Signature to file path mapping from the file group that is being compared to the basis group

Returns:

  • (FileGroupDifference)

    Container for reporting the set of file-level differences of type ‘identical’



# File 'lib/moab/file_group_difference.rb', line 228

def tabulate_unchanged_files(matching_signatures, basis_signature_hash, other_signature_hash)
  matching_signatures.each do |signature|
    basis_paths = basis_signature_hash[signature].paths
    other_paths = other_signature_hash[signature].paths
    matching_paths = basis_paths & other_paths
    matching_paths.each do |path|
      fid = FileInstanceDifference.new(:change => 'identical')
      fid.basis_path = path
      fid.other_path = "same"
      fid.signatures << signature
      subset_hash[:identical].files << fid
    end
  end
  self
end