Class: Moab::SignatureCatalog
- Inherits:
- Serializer::Manifest
- Inheritance chain: Object, Serializer::Serializable, Serializer::Manifest, Moab::SignatureCatalog
- Includes:
- HappyMapper
- Defined in:
- lib/moab/signature_catalog.rb
Overview
Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.
A digital object’s Signature Catalog is derived from a filtered aggregation of the file inventories of a digital object’s set of versions (see #update). It has an entry for every file (identified by FileSignature) found in any of the versions, along with a record of the SDR storage location that was used to preserve a single file instance. Once this catalog has been populated, it has multiple uses:
-
The signature index is used to determine which files of a newly submitted object version are new additions and which are duplicates of files previously ingested (see #version_additions). When a new version contains a mixture of added files and files carried over from the previous version, only the files from the new version that have unique file signatures need to be stored.
-
Reconstruction of an object version (see Moab::StorageObject#reconstruct_version) requires a combination of a full version’s FileInventory and the SignatureCatalog.
-
The catalog can also be used for performing consistency checks between manifest files and storage.
Data Model
-
SignatureCatalog = lookup table containing a cumulative collection of all files ever ingested
-
SignatureCatalogEntry [1..*] = a row in the lookup table containing storage information about a single file
-
FileSignature [1] = file fixity information
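The data model above can be pictured as a hash keyed by signature. The following is a minimal sketch using hypothetical Struct stand-ins (Signature, Entry) rather than the Moab classes; the field names and sample values are assumptions chosen for illustration.

```ruby
# Hypothetical stand-ins for FileSignature and SignatureCatalogEntry.
# Struct provides value-based eql?/hash, so equal signatures share one key.
Signature = Struct.new(:size, :md5, :sha1)
Entry = Struct.new(:version_id, :group_id, :path, :signature)

# The catalog as a lookup table: signature => entry
lookup = {}
sig = Signature.new(40, 'a1', 'b2')
lookup[sig] = Entry.new(1, 'content', 'page-1.jpg', sig)

# A separately built but identical signature finds the same row.
same = Signature.new(40, 'a1', 'b2')
found = lookup[same]
puts found.path
```

The point of the sketch is that lookup depends on value equality of the key, not on object identity.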
Instance Attribute Summary
-
#block_count ⇒ Integer
The total disk usage (in 1 kB blocks) of all data files (estimating du -k result) (dynamically calculated).
-
#byte_count ⇒ Integer
The total size (in bytes) of all data files (dynamically calculated).
-
#catalog_datetime ⇒ String
The datetime at which the catalog was updated.
-
#digital_object_id ⇒ String
The object ID (druid).
-
#entries ⇒ Array<SignatureCatalogEntry>
The set of catalog entries, one per unique file signature.
-
#file_count ⇒ Integer
The total number of data files (dynamically calculated).
-
#signature_hash ⇒ Hash
An index having FileSignature objects as keys and SignatureCatalogEntry objects as values.
-
#version_id ⇒ Integer
The ordinal version number.
Instance Method Summary
-
#add_entry(entry) ⇒ void
Add a new entry to the catalog and to the #signature_hash index.
-
#catalog_filepath(file_signature) ⇒ String
The object-relative path of the file having the specified signature.
-
#composite_key ⇒ String
The unique identifier concatenating digital object id with version id.
-
#initialize(opts = {}) ⇒ SignatureCatalog
constructor
A new instance of SignatureCatalog.
-
#normalize_group_signatures(group, group_pathname = nil) ⇒ void
Inspect and upgrade the group’s signature data to include all desired checksums.
-
#summary_fields ⇒ Array<String>
The data fields to include in summary reports.
-
#update(version_inventory, data_pathname) ⇒ void
Compares the FileSignature entries in the new version’s FileInventory against the signatures in this catalog and creates new SignatureCatalogEntry additions to the catalog.
-
#version_additions(version_inventory) ⇒ FileInventory
Returns a filtered copy of the input inventory containing only those files that were added in this version.
Methods inherited from Serializer::Manifest
read_xml_file, write_xml_file, #write_xml_file, xml_filename, xml_pathname, xml_pathname_exist?
Methods inherited from Serializer::Serializable
#array_to_hash, deep_diff, #diff, #key, #key_name, #summary, #to_hash, #to_json, #to_yaml, #variable_names, #variables
Constructor Details
#initialize(opts = {}) ⇒ SignatureCatalog
Returns a new instance of SignatureCatalog.
# File 'lib/moab/signature_catalog.rb', line 34
def initialize(opts = {})
  @entries = []
  @signature_hash = {}
  super(opts)
end
Instance Attribute Details
#block_count ⇒ Integer
Returns the total disk usage (in 1 kB blocks) of all data files (estimating du -k result) (dynamically calculated).
# File 'lib/moab/signature_catalog.rb', line 83
attribute :block_count, Integer, :tag => 'blockCount', :on_save => proc { |t| t.to_s }
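This page does not show how the block count is computed. One plausible reading of "estimating du -k", offered as an assumption and not as the library's code, is to round each file up to a whole 1024-byte block and sum the results. The names BLOCK_SIZE and estimated_blocks are hypothetical.

```ruby
BLOCK_SIZE = 1024

# Round each file size up to whole blocks, then sum across files.
def estimated_blocks(file_sizes, block_size = BLOCK_SIZE)
  file_sizes.sum { |size| (size + block_size - 1) / block_size }
end

# Three files of 1, 1024 and 1025 bytes occupy 1 + 1 + 2 blocks.
puts estimated_blocks([1, 1024, 1025])
```

Rounding per file matters: the same three files total 2050 bytes, which would round up to only 3 blocks if the sum were taken first.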
#byte_count ⇒ Integer
Returns the total size (in bytes) of all data files (dynamically calculated).
# File 'lib/moab/signature_catalog.rb', line 75
attribute :byte_count, Integer, :tag => 'byteCount', :on_save => proc { |t| t.to_s }
#catalog_datetime ⇒ String
Returns the datetime at which the catalog was updated.
# File 'lib/moab/signature_catalog.rb', line 55
attribute :catalog_datetime, Time, :tag => 'catalogDatetime'
#digital_object_id ⇒ String
Returns the object ID (druid).
# File 'lib/moab/signature_catalog.rb', line 42
attribute :digital_object_id, String, :tag => 'objectId'
#entries ⇒ Array<SignatureCatalogEntry>
Returns the set of catalog entries, one per unique file signature.
# File 'lib/moab/signature_catalog.rb', line 97
has_many :entries, SignatureCatalogEntry, :tag => 'entry'
#file_count ⇒ Integer
Returns the total number of data files (dynamically calculated).
# File 'lib/moab/signature_catalog.rb', line 67
attribute :file_count, Integer, :tag => 'fileCount', :on_save => proc { |t| t.to_s }
#signature_hash ⇒ Hash
Returns an index having FileSignature objects as keys and Moab::SignatureCatalogEntry objects as values.
# File 'lib/moab/signature_catalog.rb', line 106
def signature_hash
  @signature_hash
end
#version_id ⇒ Integer
Returns the ordinal version number.
# File 'lib/moab/signature_catalog.rb', line 46
attribute :version_id, Integer, :tag => 'versionId', :key => true, :on_save => proc { |n| n.to_s }
Instance Method Details
#add_entry(entry) ⇒ void
This method returns an undefined value.
Adds a new entry to the catalog and to the #signature_hash index.
# File 'lib/moab/signature_catalog.rb', line 111
def add_entry(entry)
  @signature_hash[entry.signature] = entry
  entries << entry
end
#catalog_filepath(file_signature) ⇒ String
Returns the object-relative path of the file having the specified signature.
# File 'lib/moab/signature_catalog.rb', line 118
def catalog_filepath(file_signature)
  catalog_entry = @signature_hash[file_signature]
  if catalog_entry.nil?
    msg = "catalog entry not found for #{file_signature.fixity.inspect} in #{@digital_object_id} - #{@version_id}"
    raise FileNotFoundException, msg
  end
  catalog_entry.storage_path
end
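The lookup-or-raise behavior of the listing above can be tried without the library. This is a minimal sketch using hypothetical stand-ins (EntryNotFound, Sig, Row, lookup_path); the druid and storage path are made-up sample values, and the real method raises Moab's FileNotFoundException.

```ruby
# Hypothetical stand-ins, not the Moab classes.
class EntryNotFound < StandardError; end
Sig = Struct.new(:size, :md5)
Row = Struct.new(:storage_path)

# Return the stored path for a signature, or raise with a descriptive message.
def lookup_path(index, signature, object_id, version_id)
  row = index[signature]
  if row.nil?
    raise EntryNotFound, "catalog entry not found for #{signature.to_a.inspect} in #{object_id} - #{version_id}"
  end
  row.storage_path
end

index = { Sig.new(40, 'a1') => Row.new('v0001/data/content/page-1.jpg') }
puts lookup_path(index, Sig.new(40, 'a1'), 'druid:ab123cd4567', 2)

begin
  lookup_path(index, Sig.new(41, 'zz'), 'druid:ab123cd4567', 2)
rescue EntryNotFound => e
  warn e.message
end
```

Callers that expect missing files, such as consistency checks, would rescue the exception as in the second call.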
#composite_key ⇒ String
Returns the unique identifier concatenating digital object id with version id.
# File 'lib/moab/signature_catalog.rb', line 49
def composite_key
  @digital_object_id + '-' + StorageObject.version_dirname(@version_id)
end
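The shape of the key depends on StorageObject.version_dirname, which is not shown on this page. The sketch below assumes version directories are named with a "v" followed by a zero-padded four-digit number; that format, the standalone helper functions, and the sample druid are assumptions for illustration only.

```ruby
# Assumption: 'v' plus a zero-padded four-digit version number.
# This page does not show the real version_dirname implementation.
def version_dirname(version_id)
  format('v%04d', version_id)
end

# Join the object id and the version directory name with a hyphen.
def composite_key(digital_object_id, version_id)
  digital_object_id + '-' + version_dirname(version_id)
end

puts composite_key('druid:ab123cd4567', 3)
```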
#normalize_group_signatures(group, group_pathname = nil) ⇒ void
This method returns an undefined value.
Inspects and upgrades the group’s signature data to include all desired checksums.
# File 'lib/moab/signature_catalog.rb', line 130
def normalize_group_signatures(group, group_pathname = nil)
  unless group_pathname.nil?
    group_pathname = Pathname(group_pathname)
    raise(MoabRuntimeError, "Could not locate #{group_pathname}") unless group_pathname.exist?
  end
  group.files.each do |file|
    unless file.signature.complete?
      if @signature_hash.key?(file.signature)
        file.signature = @signature_hash.find { |k, _v| k == file.signature }[0]
      elsif group_pathname
        file_pathname = group_pathname.join(file.instances[0].path)
        file.signature = file.signature.normalized_signature(file_pathname)
      end
    end
  end
end
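The find call in the listing above returns the key stored in the index rather than the caller's signature. That only upgrades anything if an incomplete signature can compare equal to a complete one, which this page does not document. The sketch below assumes equality means "same size and agreement on every checksum both sides carry"; PartialSignature and its checksum names are hypothetical.

```ruby
# Hypothetical signature whose equality tolerates missing checksums.
class PartialSignature
  REQUIRED = %i[md5 sha1 sha256].freeze
  attr_reader :size, :checksums

  def initialize(size, checksums)
    @size = size
    @checksums = checksums
  end

  def complete?
    REQUIRED.all? { |name| checksums.key?(name) }
  end

  # Equal when sizes match and all shared checksums agree.
  def eql?(other)
    return false unless other.is_a?(PartialSignature) && size == other.size

    shared = checksums.keys & other.checksums.keys
    !shared.empty? && shared.all? { |name| checksums[name] == other.checksums[name] }
  end
  alias_method :==, :eql?

  # Hash on size only, so partial and complete signatures share a bucket.
  def hash
    size.hash
  end
end

full = PartialSignature.new(40, { md5: 'a1', sha1: 'b2', sha256: 'c3' })
index = { full => 'catalog entry' }

partial = PartialSignature.new(40, { md5: 'a1' })
if index.key?(partial)
  stored_key, = index.find { |key, _value| key == partial }
  puts stored_key.complete?
end
```

Under this assumption the file ends up with the catalog's complete signature without its checksums being recomputed from disk.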
#summary_fields ⇒ Array<String>
Returns the data fields to include in summary reports.
# File 'lib/moab/signature_catalog.rb', line 91
def summary_fields
  %w[digital_object_id version_id catalog_datetime file_count byte_count block_count]
end
#update(version_inventory, data_pathname) ⇒ void
This method returns an undefined value.
Compares the FileSignature entries in the new version’s FileInventory against the signatures in this catalog and creates new Moab::SignatureCatalogEntry additions to the catalog.
# File 'lib/moab/signature_catalog.rb', line 153
def update(version_inventory, data_pathname)
  version_inventory.groups.each do |group|
    group.files.each do |file|
      unless @signature_hash.key?(file.signature)
        entry = SignatureCatalogEntry.new
        entry.version_id = version_inventory.version_id
        entry.group_id = group.group_id
        entry.path = file.instances[0].path
        if file.signature.complete?
          entry.signature = file.signature
        else
          file_pathname = data_pathname.join(group.group_id, entry.path)
          entry.signature = file.signature.normalized_signature(file_pathname)
        end
        add_entry(entry)
      end
    end
  end
  @version_id = version_inventory.version_id
  @catalog_datetime = Time.now
end
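The effect of calling update once per version can be shown with a reduced model. This is a minimal sketch, not the Moab implementation: MiniCatalog, InvFile and CatEntry are hypothetical, signatures are plain strings, and checksum normalization is left out.

```ruby
# Hypothetical stand-ins for inventory files and catalog entries.
InvFile = Struct.new(:signature, :path)
CatEntry = Struct.new(:version_id, :group_id, :path, :signature)

class MiniCatalog
  attr_reader :entries, :signature_hash, :version_id

  def initialize
    @entries = []
    @signature_hash = {}
  end

  # groups: { group_id => [InvFile, ...] }
  def update(version_id, groups)
    groups.each do |group_id, files|
      files.each do |file|
        # Skip content the catalog has already recorded.
        next if @signature_hash.key?(file.signature)

        entry = CatEntry.new(version_id, group_id, file.path, file.signature)
        @signature_hash[file.signature] = entry
        @entries << entry
      end
    end
    @version_id = version_id
  end
end

catalog = MiniCatalog.new
catalog.update(1, { 'content' => [InvFile.new('sig-a', 'a.txt'), InvFile.new('sig-b', 'b.txt')] })
catalog.update(2, { 'content' => [InvFile.new('sig-a', 'renamed.txt'), InvFile.new('sig-c', 'c.txt')] })

catalog.entries.each { |e| puts "v#{e.version_id} #{e.path}" }
```

In this model the renamed copy of sig-a in version 2 adds no entry, so the catalog keeps pointing at the version 1 location for that content.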
#version_additions(version_inventory) ⇒ FileInventory
Returns a filtered copy of the input inventory containing only those files that were added in this version.
# File 'lib/moab/signature_catalog.rb', line 180
def version_additions(version_inventory)
  version_additions = FileInventory.new(:type => 'additions')
  version_additions.copy_ids(version_inventory)
  version_inventory.groups.each do |group|
    group_addtions = FileGroup.new(:group_id => group.group_id)
    group.files.each do |file|
      group_addtions.add_file_instance(file.signature, file.instances[0]) unless @signature_hash.key?(file.signature)
    end
    version_additions.groups << group_addtions unless group_addtions.files.empty?
  end
  version_additions
end
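The filtering step can be sketched on plain hashes. The names AddFile and additions are hypothetical, signatures are plain strings, and, as a simplification, the sketch keeps each whole file record where the listing above keeps only the first instance.

```ruby
# Hypothetical stand-in for an inventory file: a signature plus its paths.
AddFile = Struct.new(:signature, :paths)

# known:  anything responding to key?, e.g. the signature index
# groups: { group_id => [AddFile, ...] }
# Returns a new hash holding only files whose signature is not yet known;
# groups left with no such files are omitted.
def additions(known, groups)
  result = {}
  groups.each do |group_id, files|
    fresh = files.reject { |file| known.key?(file.signature) }
    result[group_id] = fresh unless fresh.empty?
  end
  result
end

known = { 'sig-a' => :entry_a }
incoming = {
  'content' => [AddFile.new('sig-a', ['a.txt']), AddFile.new('sig-c', ['c.txt'])],
  'metadata' => [AddFile.new('sig-a', ['copy-of-a.txt'])]
}

additions(known, incoming).each do |group_id, files|
  puts "#{group_id}: #{files.map(&:signature).join(', ')}"
end
```

Like the listed method, the sketch only reads the index; recording the new signatures is left to a separate update step.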