Class: Moab::FileGroupDifference

Inherits:
Serializer::Serializable
Includes:
HappyMapper
Defined in:
lib/moab/file_group_difference.rb

Overview

Note:

Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.

Performs analysis and reports the differences between two matching FileGroup objects. The descendant elements of the report hold a detailed breakdown of file-level differences, organized by change type. This stanza is a child element of FileInventoryDifference, the documentation of which contains a full example.

To determine the detailed nature of the differences between the two manifests, the algorithm first compares the sets of file signatures present in the groups being compared, then uses the result of that comparison to analyze filename correspondences.

For the first step, a Ruby Hash is extracted from each of the two groups, with FileSignature objects used as hash keys and the corresponding FileInstance arrays as hash values. The set of keys from the basis hash can then be compared against the keys from the other hash using Array operators:

  • matching = basis_array & other_array

  • basis_only = basis_array - other_array

  • other_only = other_array - basis_array
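The three set operations above can be tried out in isolation. The following is a minimal sketch, assuming plain strings as stand-ins for FileSignature keys and arrays of paths as stand-ins for the hash values; the `key_sets` helper and the sample data are hypothetical and not part of the Moab API. Note that Array#& and Array#- compare elements using `hash` and `eql?`, so real key objects need those methods defined consistently.

```ruby
# Hypothetical stand-ins: strings play the role of FileSignature keys,
# arrays of paths play the role of the hash values.
BASIS_HASH = { 'sig-a' => ['page-1.jpg'], 'sig-b' => ['page-2.jpg'] }.freeze
OTHER_HASH = { 'sig-a' => ['page-1.jpg'], 'sig-c' => ['page-2.jpg'] }.freeze

# Partition the keys of two hashes with Ruby's Array set operators.
def key_sets(basis_hash, other_hash)
  basis_array = basis_hash.keys
  other_array = other_hash.keys
  {
    matching: basis_array & other_array,   # keys present in both hashes
    basis_only: basis_array - other_array, # keys present only in the basis
    other_only: other_array - basis_array  # keys present only in the other
  }
end

puts key_sets(BASIS_HASH, OTHER_HASH).inspect
```

In this sample, `sig-a` is the only matching key, while `sig-b` and `sig-c` fall into the basis-only and other-only sets respectively.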

For the second step of the comparison, the matching and non-matching sets of hash entries are further categorized as follows:

  • identical = signature and file path are the same in both the basis and other file groups

  • renamed = signature is unchanged, but the path has changed

  • copyadded = duplicate copy of file was added

  • copydeleted = duplicate copy of file was deleted

  • modified = path is same in both groups, but the signature has changed

  • added = signature and path are only in the other inventory

  • deleted = signature and path are only in the basis inventory
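To make the categories concrete, here is a simplified sketch of the two-step classification. It assumes each signature maps to exactly one path (so the copyadded and copydeleted cases do not arise), uses plain strings for signatures, and introduces a hypothetical `classify` helper; it illustrates the idea and is not the library's implementation.

```ruby
# Toy inputs: signature (a string here) => single file path.
BASIS_GROUP = { 's1' => 'a.txt', 's2' => 'b.txt', 's3' => 'c.txt', 's4' => 'd.txt' }.freeze
OTHER_GROUP = { 's1' => 'a.txt', 's2' => 'b2.txt', 's5' => 'c.txt', 's6' => 'e.txt' }.freeze

def classify(basis, other)
  report = Hash.new { |hash, key| hash[key] = [] }

  # Signatures present in both groups: content is unchanged, so compare paths.
  (basis.keys & other.keys).each do |sig|
    if basis[sig] == other[sig]
      report[:identical] << basis[sig]
    else
      report[:renamed] << [basis[sig], other[sig]]
    end
  end

  # Signatures present in only one group: compare the leftover paths instead.
  basis_paths = (basis.keys - other.keys).map { |sig| basis[sig] }
  other_paths = (other.keys - basis.keys).map { |sig| other[sig] }
  report[:modified] = basis_paths & other_paths # same path, new signature
  report[:deleted] = basis_paths - other_paths  # path gone from other
  report[:added] = other_paths - basis_paths    # path new in other
  report
end

classify(BASIS_GROUP, OTHER_GROUP).each { |change, files| puts "#{change}: #{files.inspect}" }
```

With this sample data, `a.txt` is identical, `b.txt` is renamed to `b2.txt`, `c.txt` is modified, `d.txt` is deleted, and `e.txt` is added.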

Data Model

Instance Attribute Summary

Instance Method Summary

Methods inherited from Serializer::Serializable

#array_to_hash, deep_diff, #diff, #key, #key_name, #to_hash, #to_json, #to_yaml, #variable_names, #variables

Constructor Details

#initialize(opts = {}) ⇒ FileGroupDifference

Returns a new instance of FileGroupDifference.



# File 'lib/moab/file_group_difference.rb', line 53

def initialize(opts = {})
  @subset_hash = Hash.new { |hash, key| hash[key] = FileGroupDifferenceSubset.new(:change => key.to_s) }
  super(opts)
end
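The constructor relies on the block form of Hash.new, which runs only when a missing key is read. A minimal sketch of that pattern follows, assuming a hypothetical `Subset` Struct as a stand-in for FileGroupDifferenceSubset and a hypothetical `build_subset_hash` helper:

```ruby
# Hypothetical stand-in for FileGroupDifferenceSubset.
Subset = Struct.new(:change, :files)

def build_subset_hash
  # The block stores a new container under the missing key and returns it,
  # so later reads of the same key get the same object.
  Hash.new { |hash, key| hash[key] = Subset.new(key.to_s, []) }
end

subsets = build_subset_hash
subsets[:added].files << 'new-file.txt' # the :added container is created on first access
puts subsets.keys.inspect
```

One consequence of this design is that merely reading a key creates an entry, so the hash only contains change types that have been touched.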

Instance Attribute Details

#added ⇒ Integer

Returns how many files were added.

Returns:

  • (Integer)

    How many files were added



# File 'lib/moab/file_group_difference.rb', line 112

attribute :added, Integer, :on_save => proc { |n| n.to_s }

#copyadded ⇒ Integer

Returns how many duplicate copies of files were added.

Returns:

  • (Integer)

    How many duplicate copies of files were added



# File 'lib/moab/file_group_difference.rb', line 84

attribute :copyadded, Integer, :on_save => proc { |n| n.to_s }

#copydeleted ⇒ Integer

Returns how many duplicate copies of files were deleted.

Returns:

  • (Integer)

    How many duplicate copies of files were deleted



# File 'lib/moab/file_group_difference.rb', line 91

attribute :copydeleted, Integer, :on_save => proc { |n| n.to_s }

#deleted ⇒ Integer

Returns how many files were deleted.

Returns:

  • (Integer)

    How many files were deleted



# File 'lib/moab/file_group_difference.rb', line 119

attribute :deleted, Integer, :on_save => proc { |n| n.to_s }

#difference_count ⇒ Integer

Returns the total number of differences found between the two inventories that were compared (dynamically calculated).

Returns:

  • (Integer)

    the total number of differences found between the two inventories that were compared (dynamically calculated)



# File 'lib/moab/file_group_difference.rb', line 65

attribute :difference_count, Integer, :tag => 'differenceCount', :on_save => proc { |i| i.to_s }

#group_id ⇒ String

Returns the name of the file group.

Returns:

  • (String)

    The name of the file group



# File 'lib/moab/file_group_difference.rb', line 60

attribute :group_id, String, :tag => 'groupId', :key => true

#identical ⇒ Integer

Returns how many files were unchanged.

Returns:

  • (Integer)

    How many files were unchanged



# File 'lib/moab/file_group_difference.rb', line 77

attribute :identical, Integer, :on_save => proc { |n| n.to_s }

#modified ⇒ Integer

Returns how many files were modified.

Returns:

  • (Integer)

    How many files were modified



# File 'lib/moab/file_group_difference.rb', line 105

attribute :modified, Integer, :on_save => proc { |n| n.to_s }

#renamed ⇒ Integer

Returns how many files were renamed.

Returns:

  • (Integer)

    How many files were renamed



# File 'lib/moab/file_group_difference.rb', line 98

attribute :renamed, Integer, :on_save => proc { |n| n.to_s }

#subset_hash ⇒ Hash<Symbol,FileGroupDifferenceSubset>

Returns a set of containers (one for each change type), each of which contains a collection of file-level differences having that change type.

Returns:

  • (Hash<Symbol,FileGroupDifferenceSubset>)

    A set of containers (one for each change type), each of which contains a collection of file-level differences having that change type.



# File 'lib/moab/file_group_difference.rb', line 44

def subset_hash
  @subset_hash
end

#subsets ⇒ Array<FileGroupDifferenceSubset>

Returns a set of Arrays (one for each change type), each of which contains a collection of file-level differences having that change type.

Returns:

  • (Array<FileGroupDifferenceSubset>)

    A set of Arrays (one for each change type), each of which contains a collection of file-level differences having that change type.



# File 'lib/moab/file_group_difference.rb', line 127

has_many :subsets, FileGroupDifferenceSubset, :tag => 'subset'

Instance Method Details

#basis_only_keys(basis_hash, other_hash) ⇒ Array

Compares the keys of two hashes and returns the keys unique to the first hash.

Parameters:

  • basis_hash (Hash)

    The first hash being compared

  • other_hash (Hash)

    The second hash being compared

Returns:

  • (Array)

    Compare the keys of two hashes and return the keys unique to the first hash



# File 'lib/moab/file_group_difference.rb', line 169

def basis_only_keys(basis_hash, other_hash)
  basis_hash.keys - other_hash.keys
end

#compare_file_groups(basis_group, other_group) ⇒ FileGroupDifference

Compares two file groups and returns a differences report.

Parameters:

  • basis_group (FileGroup)

    The file group that is the basis of the comparison

  • other_group (FileGroup)

    The file group that is compared against the basis group

Returns:

  • (FileGroupDifference)

    This object, populated with the differences report



# File 'lib/moab/file_group_difference.rb', line 184

def compare_file_groups(basis_group, other_group)
  @group_id = basis_group.group_id
  compare_matching_signatures(basis_group, other_group)
  compare_non_matching_signatures(basis_group, other_group)
  self
end

#compare_matching_signatures(basis_group, other_group) ⇒ FileGroupDifference

For signatures that are present in both groups, reports which file instances are identical or renamed.

Parameters:

  • basis_group (FileGroup)

    The file group that is the basis of the comparison

  • other_group (FileGroup)

    The file group that is compared against the basis group

Returns:

  • (FileGroupDifference)

    For signatures that are present in both groups, report which file instances are identical or renamed



# File 'lib/moab/file_group_difference.rb', line 195

def compare_matching_signatures(basis_group, other_group)
  matching_signatures = matching_keys(basis_group.signature_hash, other_group.signature_hash)
  tabulate_unchanged_files(matching_signatures, basis_group.signature_hash, other_group.signature_hash)
  tabulate_renamed_files(matching_signatures, basis_group.signature_hash, other_group.signature_hash)
  self
end

#compare_non_matching_signatures(basis_group, other_group) ⇒ FileGroupDifference

For signatures that are present in only one or the other group, reports which file instances are modified, deleted, or added.

Parameters:

  • basis_group (FileGroup)

    The file group that is the basis of the comparison

  • other_group (FileGroup)

    The file group that is compared against the basis group

Returns:

  • (FileGroupDifference)

    For signatures that are present in only one or the other group, report which file instances are modified, deleted, or added



# File 'lib/moab/file_group_difference.rb', line 206

def compare_non_matching_signatures(basis_group, other_group)
  basis_only_signatures = basis_only_keys(basis_group.signature_hash, other_group.signature_hash)
  other_only_signatures = other_only_keys(basis_group.signature_hash, other_group.signature_hash)
  basis_path_hash = basis_group.path_hash_subset(basis_only_signatures)
  other_path_hash = other_group.path_hash_subset(other_only_signatures)
  tabulate_modified_files(basis_path_hash, other_path_hash)
  tabulate_added_files(basis_path_hash, other_path_hash)
  tabulate_deleted_files(basis_path_hash, other_path_hash)
  self
end

#file_deltas ⇒ Hash<Symbol,Array>

Returns sets of filenames grouped by change type for use in performing file or metadata operations.

Returns:

  • (Hash<Symbol,Array>)

    Sets of filenames grouped by change type for use in performing file or metadata operations



# File 'lib/moab/file_group_difference.rb', line 331

def file_deltas
  # The hash to be returned
  deltas = Hash.new { |hash, key| hash[key] = [] }
  # case where other_path is empty or 'same'.  (create array of strings)
  %i[identical modified deleted copydeleted].each do |change|
    deltas[change].concat(subset_hash[change].files.collect(&:basis_path))
  end
  # case where basis_path and other_path are both present.  (create array of arrays)
  %i[copyadded renamed].each do |change|
    deltas[change].concat(subset_hash[change].files.collect { |file| [file.basis_path, file.other_path] })
  end
  # case where basis_path is empty.  (create array of strings)
  [:added].each do |change|
    deltas[change].concat(subset_hash[change].files.collect(&:other_path))
  end
  deltas
end
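The shape of the returned hash differs by change type: some entries are arrays of strings, others are arrays of path pairs. The sketch below mirrors the three cases with a hypothetical `FileDiff` Struct standing in for FileInstanceDifference and a hypothetical `deltas_from` helper that takes a plain hash of change type to difference records; the sample data is invented for illustration.

```ruby
# Hypothetical stand-in for FileInstanceDifference, with only the two path fields.
FileDiff = Struct.new(:basis_path, :other_path)

# Mirrors the three cases of file_deltas against a plain hash of
# change type => array of FileDiff records.
def deltas_from(subsets)
  deltas = Hash.new { |hash, key| hash[key] = [] }
  # other_path is empty or 'same': keep only the basis path (array of strings)
  %i[identical modified deleted copydeleted].each do |change|
    deltas[change].concat(subsets.fetch(change, []).map(&:basis_path))
  end
  # both paths are meaningful: keep [basis_path, other_path] pairs
  %i[copyadded renamed].each do |change|
    deltas[change].concat(subsets.fetch(change, []).map { |f| [f.basis_path, f.other_path] })
  end
  # basis_path is empty: keep only the other path (array of strings)
  deltas[:added].concat(subsets.fetch(:added, []).map(&:other_path))
  deltas
end

SAMPLE_SUBSETS = {
  modified: [FileDiff.new('title.jpg', 'same')],
  renamed: [FileDiff.new('page-2.jpg', 'page-3.jpg')],
  added: [FileDiff.new('', 'page-2.jpg')]
}.freeze

deltas_from(SAMPLE_SUBSETS).each { |change, names| puts "#{change}: #{names.inspect}" }
```

Because every change type is read at least once, the result contains all seven keys, with empty arrays for change types that had no files.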

#matching_keys(basis_hash, other_hash) ⇒ Array

Compares the keys of two hashes and returns the intersection.

Parameters:

  • basis_hash (Hash)

    The first hash being compared

  • other_hash (Hash)

    The second hash being compared

Returns:

  • (Array)

    Compare the keys of two hashes and return the intersection



# File 'lib/moab/file_group_difference.rb', line 162

def matching_keys(basis_hash, other_hash)
  basis_hash.keys & other_hash.keys
end

#other_only_keys(basis_hash, other_hash) ⇒ Array

Compares the keys of two hashes and returns the keys unique to the second hash.

Parameters:

  • basis_hash (Hash)

    The first hash being compared

  • other_hash (Hash)

    The second hash being compared

Returns:

  • (Array)

    Compare the keys of two hashes and return the keys unique to the second hash



# File 'lib/moab/file_group_difference.rb', line 176

def other_only_keys(basis_hash, other_hash)
  other_hash.keys - basis_hash.keys
end

#rename_require_temp_files(filepairs) ⇒ Boolean

Tests whether any of the new names are the same as one of the old names, as would be true for insertion of a new file into a page sequence or for a circular rename. In such a case, returns true, indicating that intermediate temporary files would be required when updating a copy of an object’s files at a given location.

Parameters:

  • filepairs (Array<Array<String>>)

    The set of oldname, newname pairs for all files being renamed

Returns:

  • (Boolean)

    Test whether any of the new names are the same as one of the old names, such as would be true for insertion of a new file into a page sequence, or a circular rename. In such a case, return true, indicating that use of intermediate temporary files would be required when updating a copy of an object’s files at a given location.



# File 'lib/moab/file_group_difference.rb', line 354

def rename_require_temp_files(filepairs)
  # Split the filepairs into two arrays
  oldnames = []
  newnames = []
  filepairs.each do |old, new|
    oldnames << old
    newnames << new
  end
  # Are any of the filenames the same in set of oldnames and set of newnames?
  intersection = oldnames & newnames
  intersection.count > 0
end
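The page-sequence case is easy to reproduce. Below is a standalone sketch of the same overlap test, written so that it runs without the Moab gem; the `names_overlap?` helper and the sample file names are hypothetical.

```ruby
# Standalone version of the overlap test: true when some name appears
# both as an old name and as a new name.
def names_overlap?(filepairs)
  oldnames, newnames = filepairs.transpose
  # Array(nil) yields [] so an empty list of pairs is handled too.
  !(Array(oldnames) & Array(newnames)).empty?
end

# Inserting a new page 2 shifts the later pages, so 'page-3.jpg'
# is both an old name and a new name.
PAGE_SHIFT = [['page-2.jpg', 'page-3.jpg'], ['page-3.jpg', 'page-4.jpg']].freeze
# A plain rename has no such overlap.
SIMPLE_RENAME = [['draft.txt', 'final.txt']].freeze

puts names_overlap?(PAGE_SHIFT)
puts names_overlap?(SIMPLE_RENAME)
```

When the test is true, renaming the files one at a time in place could overwrite a file that has not yet been moved, which is why temporary names are needed.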

#rename_tempfile_triplets(filepairs) ⇒ Array<Array<String>>

Returns a set of file triples containing oldname, newname, tempname.

Parameters:

  • filepairs (Array<Array<String>>)

    The set of oldname, newname pairs for all files being renamed

Returns:

  • (Array<Array<String>>)

    a set of file triples containing oldname, newname, tempname



# File 'lib/moab/file_group_difference.rb', line 369

def rename_tempfile_triplets(filepairs)
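  # Each triplet is [oldname, newname, tempname]; the temp name appends a
  # year-month-day-hour-minute-second timestamp to the new name.
  filepairs.collect { |old, new| [old, new, "#{new}-#{Time.now.strftime('%Y%m%d%H%M%S')}-tmp"] }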
end
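One way such triplets might be consumed is a two-pass rename: first move every old name to its temporary name, then move every temporary name to its final name. The sketch below assumes the array order built in the listing above (old, new, temp); the `two_pass_rename` and `swap_demo` helpers are hypothetical, and the temporary names are hard-coded rather than timestamped so the example is deterministic.

```ruby
require 'fileutils'
require 'tmpdir'

# Two-pass rename: after the first pass no file sits under an old name,
# so the second pass cannot overwrite a file that is still waiting to move.
def two_pass_rename(dir, triplets)
  triplets.each { |old, _new, temp| FileUtils.mv(File.join(dir, old), File.join(dir, temp)) }
  triplets.each { |_old, new, temp| FileUtils.mv(File.join(dir, temp), File.join(dir, new)) }
end

# Swap two files, a circular rename that a single in-place pass would clobber.
def swap_demo
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, 'a.txt'), 'A')
    File.write(File.join(dir, 'b.txt'), 'B')
    triplets = [['a.txt', 'b.txt', 'b.txt-tmp'], ['b.txt', 'a.txt', 'a.txt-tmp']]
    two_pass_rename(dir, triplets)
    # Return the contents now found under each name.
    [File.read(File.join(dir, 'a.txt')), File.read(File.join(dir, 'b.txt'))]
  end
end

puts swap_demo.inspect
```

After the swap, `a.txt` holds the content that started in `b.txt` and vice versa.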

#subset(change) ⇒ FileGroupDifferenceSubset

Finds a specified subset of changes.

Parameters:

  • change (String)

    the change type to search for

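Returns:

  • (FileGroupDifferenceSubset)

    The subset of file-level differences having the specified change type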



# File 'lib/moab/file_group_difference.rb', line 48

def subset(change)
  subset_hash[change.to_sym]
end

#summary ⇒ FileGroupDifference

Clones just this element for inclusion in a versionMetadata structure.

Returns:

  • (FileGroupDifference)

    Clone just this element for inclusion in a versionMetadata structure



# File 'lib/moab/file_group_difference.rb', line 145

def summary
  FileGroupDifference.new(
    :group_id => group_id,
    :identical => identical,
    :copyadded => copyadded,
    :copydeleted => copydeleted,
    :renamed => renamed,
    :modified => modified,
    :added => added,
    :deleted => deleted
  )
end

#summary_fields ⇒ Array<String>

Returns the data fields to include in summary reports.

Returns:

  • (Array<String>)

    The data fields to include in summary reports



# File 'lib/moab/file_group_difference.rb', line 139

def summary_fields
  %w[group_id difference_count identical copyadded copydeleted renamed modified deleted added]
end

#tabulate_added_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference

Returns the container for reporting the set of file-level differences of type ‘added’.

Parameters:

  • basis_path_hash (Hash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the basis group

  • other_path_hash (Hash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the other group

Returns:

  • (FileGroupDifference)

    Container for reporting the set of file-level differences of type ‘added’



# File 'lib/moab/file_group_difference.rb', line 301

def tabulate_added_files(basis_path_hash, other_path_hash)
  other_only_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(:change => 'added')
    fid.basis_path = ""
    fid.other_path = path
    fid.signatures << other_path_hash[path]
    subset_hash[:added].files << fid
  end
  self
end

#tabulate_deleted_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference

Returns the container for reporting the set of file-level differences of type ‘deleted’.

Parameters:

  • basis_path_hash (Hash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the basis group

  • other_path_hash (Hash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the other group

Returns:

  • (FileGroupDifference)

    Container for reporting the set of file-level differences of type ‘deleted’



# File 'lib/moab/file_group_difference.rb', line 319

def tabulate_deleted_files(basis_path_hash, other_path_hash)
  basis_only_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(:change => 'deleted')
    fid.basis_path = path
    fid.other_path = ""
    fid.signatures << basis_path_hash[path]
    subset_hash[:deleted].files << fid
  end
  self
end

#tabulate_modified_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference

Returns the container for reporting the set of file-level differences of type ‘modified’.

Parameters:

  • basis_path_hash (Hash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the basis group

  • other_path_hash (Hash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the other group

Returns:

  • (FileGroupDifference)

    Container for reporting the set of file-level differences of type ‘modified’



# File 'lib/moab/file_group_difference.rb', line 282

def tabulate_modified_files(basis_path_hash, other_path_hash)
  matching_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(:change => 'modified')
    fid.basis_path = path
    fid.other_path = "same"
    fid.signatures << basis_path_hash[path]
    fid.signatures << other_path_hash[path]
    subset_hash[:modified].files << fid
  end
  self
end

#tabulate_renamed_files(matching_signatures, basis_signature_hash, other_signature_hash) ⇒ FileGroupDifference

Returns the container for reporting the set of file-level differences of type ‘renamed’, ‘copyadded’, or ‘copydeleted’.

Parameters:

  • matching_signatures (Array<FileSignature>)

    The file signature of the file manifestations being compared

  • basis_signature_hash (Hash<FileSignature, FileManifestation>)

    Signature to file path mapping from the file group that is the basis of the comparison

  • other_signature_hash (Hash<FileSignature, FileManifestation>)

    Signature to file path mapping from the file group that is being compared to the basis group

Returns:

  • (FileGroupDifference)

    Container for reporting the set of file-level differences of type ‘renamed’, ‘copyadded’, or ‘copydeleted’



# File 'lib/moab/file_group_difference.rb', line 249

def tabulate_renamed_files(matching_signatures, basis_signature_hash, other_signature_hash)
  matching_signatures.each do |signature|
    basis_paths = basis_signature_hash[signature].paths
    other_paths = other_signature_hash[signature].paths
    basis_only_paths = basis_paths - other_paths
    other_only_paths = other_paths - basis_paths
    maxsize = [basis_only_paths.size, other_only_paths.size].max
    (0..maxsize - 1).each do |n|
      fid = FileInstanceDifference.new
      fid.basis_path = basis_only_paths[n]
      fid.other_path = other_only_paths[n]
      fid.signatures << signature
      if fid.basis_path.nil?
        fid.change = 'copyadded'
        fid.basis_path = basis_paths[0]
      elsif fid.other_path.nil?
        fid.change = 'copydeleted'
      else
        fid.change = 'renamed'
      end
      subset_hash[fid.change.to_sym].files << fid
    end
  end
  self
end
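The pairing logic is the subtle part of this method: unmatched basis paths and unmatched other paths are paired by position, and whichever side runs out first determines whether the remainder counts as copies added or copies deleted. Here is a minimal sketch of that logic for a single signature, assuming plain arrays of path strings and a hypothetical `pair_paths` helper that returns `[change, basis_path, other_path]` triples instead of FileInstanceDifference objects.

```ruby
# Pair leftover paths by position; a missing partner on either side marks a copy.
def pair_paths(basis_paths, other_paths)
  basis_only = basis_paths - other_paths
  other_only = other_paths - basis_paths
  count = [basis_only.size, other_only.size].max
  Array.new(count) do |n|
    if basis_only[n].nil?
      # No unmatched basis path left: report a copy of the first basis path.
      [:copyadded, basis_paths[0], other_only[n]]
    elsif other_only[n].nil?
      # No unmatched other path left: a duplicate was removed.
      [:copydeleted, basis_only[n], nil]
    else
      [:renamed, basis_only[n], other_only[n]]
    end
  end
end

puts pair_paths(['a.txt'], ['a.txt', 'a-copy.txt']).inspect
puts pair_paths(['old.txt'], ['new.txt']).inspect
```

Because pairing is positional, which old path is matched with which new path depends on array order when several paths share one signature.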

#tabulate_unchanged_files(matching_signatures, basis_signature_hash, other_signature_hash) ⇒ FileGroupDifference

Returns the container for reporting the set of file-level differences of type ‘identical’.

Parameters:

  • matching_signatures (Array<FileSignature>)

    The file signature of the file manifestations being compared

  • basis_signature_hash (Hash<FileSignature, FileManifestation>)

    Signature to file path mapping from the file group that is the basis of the comparison

  • other_signature_hash (Hash<FileSignature, FileManifestation>)

    Signature to file path mapping from the file group that is being compared to the basis group

Returns:

  • (FileGroupDifference)

    Container for reporting the set of file-level differences of type ‘identical’



# File 'lib/moab/file_group_difference.rb', line 225

def tabulate_unchanged_files(matching_signatures, basis_signature_hash, other_signature_hash)
  matching_signatures.each do |signature|
    basis_paths = basis_signature_hash[signature].paths
    other_paths = other_signature_hash[signature].paths
    matching_paths = basis_paths & other_paths
    matching_paths.each do |path|
      fid = FileInstanceDifference.new(:change => 'identical')
      fid.basis_path = path
      fid.other_path = "same"
      fid.signatures << signature
      subset_hash[:identical].files << fid
    end
  end
  self
end