Class: Moab::FileSignature

Inherits:
Serializer::Serializable
Includes:
HappyMapper
Defined in:
lib/moab/file_signature.rb

Overview

Note:

Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.

The fixity properties of a file, used to determine file content equivalence regardless of filename. Placing this data in a class by itself facilitates using file size together with the MD5 and SHA1 checksums as a single key when doing comparisons against other file instances. The Moab design assumes that this file signature is sufficiently unique to act as a comparator for determining file equality and eliminating file redundancy.

The use of signatures for a compare-by-hash mechanism introduces a minuscule (but non-zero) risk that two non-identical files will have the same checksum. While this risk is only about 1 in 10^48 when using the SHA1 checksum alone, it can be reduced even further (to about 1 in 10^86) if we use the MD5 and SHA1 checksums together. And we gain a bit more comfort by including a comparison of file sizes.

Finally, the “collision” risk is reduced by isolation of each digital object’s file pool within an object folder, instead of in a common storage area shared by the whole repository.
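The odds quoted above follow from digest length, assuming each digest behaves like a uniformly random value: SHA1 produces 160 bits and MD5 produces 128 bits, so an accidental match on both requires 288 matching bits. This idealized model ignores deliberate collision attacks, which are known to exist for both MD5 and SHA1. A quick Ruby check of the orders of magnitude:

```ruby
# Idealized odds that two different files share digests purely by chance,
# assuming each digest behaves like a uniformly random bit string.
SHA1_BITS = 160
MD5_BITS = 128

sha1_only = 2**SHA1_BITS                  # SHA1 alone: 2^160 possible values
md5_and_sha1 = 2**(SHA1_BITS + MD5_BITS)  # both must match: 2^288 combinations

# Report each denominator as an order of magnitude (a power of ten).
puts format('SHA1 alone:   about 1 in 10^%d', Math.log10(sha1_only).floor)
puts format('MD5 and SHA1: about 1 in 10^%d', Math.log10(md5_and_sha1).floor)
```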

Data Model

Constant Summary

KNOWN_ALGOS =
{
  md5: proc { Digest::MD5.new },
  sha1: proc { Digest::SHA1.new },
  sha256: proc { Digest::SHA2.new(256) }
}.freeze
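Each value in KNOWN_ALGOS is a proc rather than a digest object, so every call yields a fresh, independent digest; this matters because a digest accumulates state as data is fed to it. A self-contained sketch that copies the constant and digests a short string with each algorithm:

```ruby
require 'digest/md5'
require 'digest/sha1'
require 'digest/sha2'

# Same shape as Moab::FileSignature::KNOWN_ALGOS: each proc builds a new digest.
KNOWN_ALGOS = {
  md5: proc { Digest::MD5.new },
  sha1: proc { Digest::SHA1.new },
  sha256: proc { Digest::SHA2.new(256) }
}.freeze

# Compute the hex digest of the string 'abc' with every known algorithm.
digests = KNOWN_ALGOS.transform_values do |factory|
  digest = factory.call # fresh digest object for this algorithm
  digest.update('abc')
  digest.hexdigest
end

digests.each { |algo, hex| puts "#{algo}: #{hex}" }
```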

Methods inherited from Serializer::Serializable

#array_to_hash, deep_diff, #diff, #initialize, #key, #key_name, #summary, #to_hash, #to_json, #to_yaml, #variable_names, #variables

Constructor Details

This class inherits a constructor from Serializer::Serializable

Instance Attribute Details

#md5 ⇒ String

Returns The MD5 checksum value of the file.

Returns:

  • (String)

    The MD5 checksum value of the file



# File 'lib/moab/file_signature.rb', line 53

attribute :md5, String, :on_save => Proc.new { |n| n.nil? ? "" : n.to_s }

#sha1 ⇒ String

Returns The SHA1 checksum value of the file.

Returns:

  • (String)

    The SHA1 checksum value of the file



# File 'lib/moab/file_signature.rb', line 57

attribute :sha1, String, :on_save => Proc.new { |n| n.nil? ? "" : n.to_s }

#sha256 ⇒ String

Returns The SHA256 checksum value of the file.

Returns:

  • (String)

    The SHA256 checksum value of the file



# File 'lib/moab/file_signature.rb', line 61

attribute :sha256, String, :on_save => Proc.new { |n| n.nil? ? "" : n.to_s }

#size ⇒ Integer

Returns The size of the file in bytes.

Returns:

  • (Integer)

    The size of the file in bytes



# File 'lib/moab/file_signature.rb', line 49

attribute :size, Integer, :on_save => Proc.new { |n| n.to_s }
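The :on_save procs in the four attribute declarations control the value written out when the signature is saved by HappyMapper: a nil checksum is converted to an empty string, and the integer size is stringified. The procs are plain Ruby, so their behavior can be checked in isolation; this sketch does not load HappyMapper and the local variable names are illustrative:

```ruby
# The same procs passed as :on_save in the attribute declarations above.
checksum_on_save = Proc.new { |n| n.nil? ? "" : n.to_s }
size_on_save = Proc.new { |n| n.to_s }

puts checksum_on_save.call(nil).inspect # a missing checksum
puts checksum_on_save.call('d41d8cd98f00b204e9800998ecf8427e').inspect
puts size_on_save.call(1024).inspect    # Integer size written as text
puts size_on_save.call(nil).inspect     # nil.to_s is also an empty string
```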

Class Method Details

.active_algos ⇒ Object



# File 'lib/moab/file_signature.rb', line 69

def self.active_algos
  Moab::Config.checksum_algos
end

.checksum_names_for_type ⇒ Hash<Symbol, Array<String>>

Returns Key is type (e.g. :sha1), value is an array of checksum names (e.g. [‘SHA-1’, ‘SHA1’]).

Returns:

  • (Hash<Symbol, Array<String>>)

    Key is type (e.g. :sha1), value is an array of checksum names (e.g. [‘SHA-1’, ‘SHA1’])



# File 'lib/moab/file_signature.rb', line 194

def FileSignature.checksum_names_for_type
  names_for_type = Hash.new
  names_for_type[:md5] = ['MD5']
  names_for_type[:sha1] = ['SHA-1', 'SHA1']
  names_for_type[:sha256] = ['SHA-256', 'SHA256']
  names_for_type
end

.checksum_type_for_name ⇒ Hash<String, Symbol>

Returns Key is checksum name (e.g. MD5), value is checksum type (e.g. :md5).

Returns:

  • (Hash<String, Symbol>)

    Key is checksum name (e.g. MD5), value is checksum type (e.g. :md5)



# File 'lib/moab/file_signature.rb', line 203

def FileSignature.checksum_type_for_name
  type_for_name = Hash.new
  self.checksum_names_for_type.each do |type, names|
    names.each do |name|
      type_for_name[name] = type
    end
  end
  type_for_name
end
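checksum_type_for_name simply inverts checksum_names_for_type, so that a checksum label found in external metadata (for example 'SHA-256') can be mapped back to a type symbol. A standalone equivalent written with each_with_object; the method names here are local stand-ins, not the Moab API:

```ruby
# Stand-in for FileSignature.checksum_names_for_type.
def names_for_type
  {
    md5: ['MD5'],
    sha1: ['SHA-1', 'SHA1'],
    sha256: ['SHA-256', 'SHA256']
  }
end

# Invert the table: every alias name points back to its type symbol.
def type_for_name
  names_for_type.each_with_object({}) do |(type, names), inverted|
    names.each { |name| inverted[name] = type }
  end
end

p type_for_name['SHA-256']
p type_for_name['SHA1']
p type_for_name['sha1'] # lookups are case-sensitive, so this is nil
```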

.from_file(pathname, algos_to_use = active_algos) ⇒ Moab::FileSignature

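Reads the file once for ALL (requested) algorithms, not once per algorithm.

Parameters:

  • pathname (Pathname)

    The location of the file to be digested

  • algos_to_use (Array<Symbol>)

    One or more keys of KNOWN_ALGOS to be computed (defaults to active_algos)

Returns:

  • (Moab::FileSignature)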



# File 'lib/moab/file_signature.rb', line 77

def self.from_file(pathname, algos_to_use = active_algos)
  raise 'Unrecognized algorithm requested' unless algos_to_use.all? { |a| KNOWN_ALGOS.include?(a) }

  signatures = algos_to_use.map { |k| [k, KNOWN_ALGOS[k].call] }.to_h

  pathname.open("r") do |stream|
    while (buffer = stream.read(8192))
      signatures.each_value { |digest| digest.update(buffer) }
    end
  end

  new(signatures.map { |k, digest| [k, digest.hexdigest] }.to_h.merge(size: pathname.size))
end
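The essential technique in from_file is to open the file once and feed each 8 KB chunk to every requested digest, so adding algorithms costs CPU time but no extra I/O. A self-contained sketch of the same loop that returns a plain Hash instead of a FileSignature; the helper name multi_digest is hypothetical, and it opens the file in binary mode where from_file uses "r":

```ruby
require 'digest/md5'
require 'digest/sha1'
require 'digest/sha2'
require 'pathname'
require 'tempfile'

# Same shape as KNOWN_ALGOS: each proc builds a fresh digest object.
ALGOS = {
  md5: proc { Digest::MD5.new },
  sha1: proc { Digest::SHA1.new },
  sha256: proc { Digest::SHA2.new(256) }
}.freeze

# Hypothetical helper: digest +pathname+ with several algorithms in one read pass.
def multi_digest(pathname, algos = ALGOS.keys)
  unknown = algos - ALGOS.keys
  raise ArgumentError, "Unrecognized algorithm(s): #{unknown.inspect}" unless unknown.empty?

  digests = algos.map { |k| [k, ALGOS[k].call] }.to_h
  # 'rb' reads raw bytes, avoiding newline translation on Windows.
  pathname.open('rb') do |stream|
    # read(8192) returns nil at end of file, which ends the loop.
    while (buffer = stream.read(8192))
      digests.each_value { |digest| digest.update(buffer) }
    end
  end
  digests.transform_values(&:hexdigest).merge(size: pathname.size)
end

Tempfile.create('sig') do |file|
  file.write('abc')
  file.flush
  p multi_digest(Pathname.new(file.path), %i[md5 sha1])
end
```

The sketch validates algorithms before opening the file, as from_file does, but raises ArgumentError where from_file raises a plain RuntimeError.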

Instance Method Details

#==(other) ⇒ Object

(see #eql?)



# File 'lib/moab/file_signature.rb', line 149

def ==(other)
  eql?(other)
end

#checksums ⇒ Hash<Symbol,String>

Returns A hash of the checksum data.

Returns:

  • (Hash<Symbol,String>)

    A hash of the checksum data



# File 'lib/moab/file_signature.rb', line 108

def checksums
  checksum_hash = Hash.new
  checksum_hash[:md5] = @md5
  checksum_hash[:sha1] = @sha1
  checksum_hash[:sha256] = @sha256
  checksum_hash.delete_if { |_key, value| value.nil? or value.empty? }
  checksum_hash
end

#complete? ⇒ Boolean

Returns true if the signature contains all three checksums (MD5, SHA1, and SHA256).

Returns:

  • (Boolean)

    True if the signature contains all three checksums (MD5, SHA1, and SHA256)



# File 'lib/moab/file_signature.rb', line 118

def complete?
  checksums.size == 3
end

#eql?(other) ⇒ Boolean

Returns true if self and other have the same size and agree on every checksum type they both contain; at least one checksum type must be present in both.

Parameters:

  • other (FileSignature)

    The other file signature being compared to this signature

Returns:

  • (Boolean)

    True if self and other have the same size and agree on every shared checksum type



# File 'lib/moab/file_signature.rb', line 134

def eql?(other)
  return false unless (other.respond_to?(:size) && other.respond_to?(:checksums))
  return false if self.size.to_i != other.size.to_i
  self_checksums = self.checksums
  other_checksums = other.checksums
  matching_keys = self_checksums.keys & other_checksums.keys
  return false if matching_keys.size == 0
  matching_keys.each do |key|
    return false if self_checksums[key] != other_checksums[key]
  end
  true
end
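The comparison rule in eql? is deliberately lenient: sizes must match, at least one checksum type must be present on both sides, and every shared checksum must agree, while checksum types present on only one side are ignored. This lets a signature recorded with only MD5 match a full signature. A minimal sketch of the rule; Sig and fixity_match? are hypothetical names:

```ruby
# Hypothetical stand-in exposing the two methods eql? relies on.
class Sig
  attr_reader :size, :checksums

  def initialize(size, checksums)
    @size = size
    @checksums = checksums
  end
end

# Same rule as FileSignature#eql?, written as a free function.
def fixity_match?(a, b)
  return false unless a.size.to_i == b.size.to_i
  shared = a.checksums.keys & b.checksums.keys
  return false if shared.empty? # nothing in common to compare
  shared.all? { |type| a.checksums[type] == b.checksums[type] }
end

full = Sig.new(3, { md5: '900150983cd24fb0d6963f7d28e17f72',
                    sha1: 'a9993e364706816aba3e25717850c26c9cd0d89d' })
md5_only = Sig.new('3', { md5: '900150983cd24fb0d6963f7d28e17f72' })
sha_only = Sig.new(3, { sha256: 'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad' })

p fixity_match?(full, md5_only) # shared md5 agrees; '3'.to_i == 3
p fixity_match?(full, sha_only) # no checksum type in common
```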

#fixity ⇒ Hash<Symbol,String>

Returns A hash of fixity data from this signature object: the size (as a String) plus every non-empty checksum.

Returns:

  • (Hash<Symbol,String>)

    A hash of fixity data from this signature object



# File 'lib/moab/file_signature.rb', line 124

def fixity
  fixity_hash = Hash.new
  fixity_hash[:size] = @size.to_s
  fixity_hash.merge!(checksums)
  fixity_hash
end
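checksums drops any checksum that is nil or empty, and fixity places the size (as a String) ahead of whatever checksums remain. A standalone sketch of the same two steps using Hash#reject; the local variables stand in for the instance variables and are illustrative only:

```ruby
# Illustrative instance state: SHA1 is blank and SHA256 was never computed.
size = 3
md5 = '900150983cd24fb0d6963f7d28e17f72'
sha1 = ''
sha256 = nil

# Like #checksums: keep only checksums that actually have a value.
checksums = { md5: md5, sha1: sha1, sha256: sha256 }
            .reject { |_type, value| value.nil? || value.empty? }

# Like #fixity: size (as a String) first, then the remaining checksums.
fixity = { size: size.to_s }.merge(checksums)

p fixity
```

With only one checksum left, #complete? would return false for a signature in this state.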

#hash ⇒ Fixnum

Note:

The hash and eql? methods override the methods inherited from Object. These methods ensure that instances of this class can be used as Hash keys. See

Also overridden is #== so that equality tests in other contexts will also return the expected result.

Returns a hash code computed from the file size alone. Two file instances with the same content will have the same hash code (and will compare equal using eql?). Checksums are deliberately left out, because eql? can treat signatures carrying different sets of checksums as equal, and objects that are eql? must share a hash code.

Returns:

  • (Fixnum)

    A hash code derived from the file size. Two file instances with the same content will have the same hash code (and will compare equal using eql?).



# File 'lib/moab/file_signature.rb', line 161

def hash
  @size.to_i
end
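Because Hash lookup and Array#uniq call hash first and eql? second, hashing on size alone keeps the eql?/hash contract intact (equal signatures always have equal sizes), at the cost of more bucket collisions among same-sized files. A sketch showing the effect; SigKey is a hypothetical class that mirrors the eql?/hash pair documented here:

```ruby
# Hypothetical class mirroring FileSignature's eql?/hash contract.
class SigKey
  attr_reader :size, :checksums

  def initialize(size, checksums)
    @size = size
    @checksums = checksums
  end

  def eql?(other)
    return false unless other.respond_to?(:size) && other.respond_to?(:checksums)
    return false if size.to_i != other.size.to_i
    shared = checksums.keys & other.checksums.keys
    !shared.empty? && shared.all? { |k| checksums[k] == other.checksums[k] }
  end
  alias_method :==, :eql?

  # Size only: signatures that are eql? always share a size.
  def hash
    size.to_i
  end
end

a = SigKey.new(3, { md5: '900150983cd24fb0d6963f7d28e17f72' })
b = SigKey.new(3, { md5: '900150983cd24fb0d6963f7d28e17f72',
                    sha1: 'a9993e364706816aba3e25717850c26c9cd0d89d' })
c = SigKey.new(3, { md5: 'd41d8cd98f00b204e9800998ecf8427e' })

pool = { a => 'file-a.txt' }
p pool[b]             # b is eql? to a, so it finds a's entry
p [a, b, c].uniq.size # c has the same size (and hash) but a different md5
```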

#normalized_signature(pathname) ⇒ FileSignature

Returns The full signature derived from the file; raises an exception if that signature is inconsistent with the current fixity values.

Parameters:

  • pathname (Pathname)

    The location of the file whose full signature will be returned

Returns:

  • (FileSignature)

    The full signature derived from the file (an exception is raised instead if it is inconsistent with the current values)



# File 'lib/moab/file_signature.rb', line 182

def normalized_signature(pathname)
  sig_from_file = FileSignature.new.signature_from_file(pathname)
  if self.eql?(sig_from_file)
    # The full signature from file is consistent with current values
    return sig_from_file
  else
    # One or more of the fixity values is inconsistent, so raise an exception
    raise "Signature inconsistent between inventory and file for #{pathname}: #{self.diff(sig_from_file).inspect}"
  end
end

#set_checksum(type, value) ⇒ void

This method returns an undefined value.

Sets the value of the specified checksum type. Raises ArgumentError if the type is not md5, sha1, or sha256 (compared case-insensitively).

Parameters:

  • type (Symbol, String)

    The type of checksum

  • value (String)

    The checksum value



# File 'lib/moab/file_signature.rb', line 94

def set_checksum(type, value)
  case type.to_s.downcase.to_sym
  when :md5
    @md5 = value
  when :sha1
    @sha1 = value
  when :sha256
    @sha256 = value
  else
    raise ArgumentError, "Unknown checksum type '#{type}'"
  end
end
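set_checksum normalizes type with to_s.downcase.to_sym, so 'MD5', 'SHA1', and :sha256 are accepted, but a hyphenated label such as 'SHA-256' (one of the names listed by checksum_names_for_type) becomes :"sha-256" and raises ArgumentError. A caller reading such labels could translate them through the names table first. A sketch of that translation step; normalize_checksum_type is a hypothetical helper, not part of Moab:

```ruby
# Alias table from FileSignature.checksum_names_for_type, inverted.
TYPE_FOR_NAME = {
  'MD5' => :md5,
  'SHA-1' => :sha1, 'SHA1' => :sha1,
  'SHA-256' => :sha256, 'SHA256' => :sha256
}.freeze

# Hypothetical helper: map a symbol or label to :md5, :sha1, or :sha256.
def normalize_checksum_type(type)
  TYPE_FOR_NAME.fetch(type.to_s.upcase) do
    raise ArgumentError, "Unknown checksum type '#{type}'"
  end
end

p normalize_checksum_type('SHA-256')
p normalize_checksum_type('sha-1')
p normalize_checksum_type(:md5)
```

The resulting symbol could then be passed to set_checksum, which accepts it unchanged.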

#signature_from_file(pathname) ⇒ FileSignature

Deprecated.

This method is a holdover from an earlier version. Use the class method .from_file going forward.

Parameters:

  • pathname (Pathname)

    The location of the file to be digested

Returns:

  • (FileSignature)

    This signature (self), updated with the size and checksums computed from the physical file



# File 'lib/moab/file_signature.rb', line 170

def signature_from_file(pathname)
  file_signature = self.class.from_file(pathname)
  self.size = file_signature.size
  self.md5 = file_signature.md5
  self.sha1 = file_signature.sha1
  self.sha256 = file_signature.sha256
  self
end