Class: Stanford::ContentInventory
- Inherits: Object (Object > Stanford::ContentInventory)
- Defined in: lib/stanford/content_inventory.rb
Overview
Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.
Stanford-specific utility methods for transforming contentMetadata to versionInventory and doing comparisons
Data Model
- DorMetadata = utility methods for interfacing with Stanford metadata files (esp. contentMetadata)
- ContentInventory [1..1] = utilities for transforming contentMetadata to versionInventory and doing comparisons
- ActiveFedoraObject [1..*] = utility for extracting content or other information from a Fedora instance
Instance Method Summary
- #generate_content_metadata(file_group, object_id, version_id) ⇒ String
  The contentMetadata instance generated from the FileGroup.
- #generate_instance(node) ⇒ FileInstance
  The FileInstance object generated from the XML data.
- #generate_signature(node) ⇒ FileSignature
  The FileSignature object generated from the XML data.
- #group_from_cm(content_metadata, subset) ⇒ FileGroup
  The FileGroup object generated from a contentMetadata instance.
- #inventory_from_cm(content_metadata, object_id, subset, version_id = nil) ⇒ FileInventory
  The versionInventory equivalent of the contentMetadata. If the supplied content_metadata is blank or empty, a skeletal FileInventory is returned.
- #remediate_checksum_nodes(file_node, signature) ⇒ void
  Updates the file's checksum elements if data is missing; raises an exception if the existing values are inconsistent.
- #remediate_content_metadata(content_metadata, content_group) ⇒ String
  Returns a remediated copy of the contentMetadata with fixity data filled in.
- #remediate_file_size(file_node, signature) ⇒ void
  Updates the file size attribute if it is missing; raises an exception if the existing value is inconsistent.
- #validate_content_metadata(content_metadata) ⇒ Boolean
  True if the contentMetadata has the essential file attributes; otherwise raises an exception.
- #validate_content_metadata_details(content_metadata) ⇒ Array<String>
  List of problems found.
Instance Method Details
#generate_content_metadata(file_group, object_id, version_id) ⇒ String
Returns: (String) the contentMetadata instance generated from the FileGroup.

    # File 'lib/stanford/content_inventory.rb', line 99

    def generate_content_metadata(file_group, object_id, version_id)
      cm = Nokogiri::XML::Builder.new do |xml|
        xml.contentMetadata(:type => "sample", :objectId => object_id) do
          xml.resource(:type => "version", :sequence => "1", :id => "version-#{version_id}") do
            file_group.files.each do |file_manifestation|
              signature = file_manifestation.signature
              file_manifestation.instances.each do |instance|
                xml.file(
                  :id => instance.path,
                  :size => signature.size,
                  :datetime => instance.datetime,
                  :shelve => 'yes',
                  :publish => 'yes',
                  :preserve => 'yes'
                ) do
                  fixity = signature.fixity
                  xml.checksum(:type => "MD5") { xml.text signature.md5 } if fixity[:md5]
                  xml.checksum(:type => "SHA-1") { xml.text signature.sha1 } if fixity[:sha1]
                  xml.checksum(:type => "SHA-256") { xml.text signature.sha256 } if fixity[:sha256]
                end
              end
            end
          end
        end
      end
      cm.to_xml
    end
#generate_instance(node) ⇒ FileInstance
Returns: (FileInstance) the FileInstance object generated from the XML data.

    # File 'lib/stanford/content_inventory.rb', line 84

    def generate_instance(node)
      instance = Moab::FileInstance.new
      instance.path = node.attributes['id'].content
      instance.datetime = begin
        node.attributes['datetime'].content
      rescue
        nil
      end
      instance
    end
#generate_signature(node) ⇒ FileSignature
Returns: (FileSignature) the FileSignature object generated from the XML data.

    # File 'lib/stanford/content_inventory.rb', line 64

    def generate_signature(node)
      signature = Moab::FileSignature.new
      signature.size = node.attributes['size'].content
      checksum_nodes = node.xpath('checksum')
      checksum_nodes.each do |checksum_node|
        case checksum_node.attributes['type'].content.upcase
        when 'MD5'
          signature.md5 = checksum_node.text
        when 'SHA1', 'SHA-1'
          signature.sha1 = checksum_node.text
        when 'SHA256', 'SHA-256'
          signature.sha256 = checksum_node.text
        end
      end
      signature
    end
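The case statement in generate_signature accepts several spellings of each checksum type and silently skips anything else. A minimal sketch of that normalization, without Nokogiri or Moab: SketchSignature, CHECKSUM_FIELD and sketch_signature are hypothetical names introduced here, and the checksum values are placeholders rather than real digests.

```ruby
# Hypothetical stand-in for Moab::FileSignature; only the fields used here.
SketchSignature = Struct.new(:size, :md5, :sha1, :sha256)

# Type spellings accepted by generate_signature, mapped to a signature field.
CHECKSUM_FIELD = {
  'MD5' => :md5,
  'SHA1' => :sha1, 'SHA-1' => :sha1,
  'SHA256' => :sha256, 'SHA-256' => :sha256
}.freeze

# checksums is an Array of [type, value] pairs standing in for <checksum> nodes.
def sketch_signature(size, checksums)
  signature = SketchSignature.new(size)
  checksums.each do |type, value|
    field = CHECKSUM_FIELD[type.upcase]
    # Unrecognized types are skipped, mirroring the case statement with no else branch.
    signature[field] = value if field
  end
  signature
end

sig = sketch_signature('1024', [['md5', 'abc'], ['SHA-1', 'def'], ['crc32', 'ignored']])
puts sig.to_a.inspect
```

Because the type is upcased before matching, 'md5' and 'MD5' are treated alike, while a type such as 'crc32' leaves the signature untouched.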
#group_from_cm(content_metadata, subset) ⇒ FileGroup
Returns: (FileGroup) the FileGroup object generated from a contentMetadata instance.

    # File 'lib/stanford/content_inventory.rb', line 37

    def group_from_cm(content_metadata, subset)
      ng_doc = Nokogiri::XML(content_metadata)
      validate_content_metadata(ng_doc)
      nodeset = case subset.to_s.downcase
                when 'preserve'
                  ng_doc.xpath("//file[@preserve='yes']")
                when 'publish'
                  ng_doc.xpath("//file[@publish='yes']")
                when 'shelve'
                  ng_doc.xpath("//file[@shelve='yes']")
                when 'all'
                  ng_doc.xpath("//file")
                else
                  raise(Moab::MoabRuntimeError, "Unknown disposition subset (#{subset})")
                end
      content_group = Moab::FileGroup.new(:group_id => 'content', :data_source => "contentMetadata-#{subset}")
      nodeset.each do |file_node|
        signature = generate_signature(file_node)
        instance = generate_instance(file_node)
        content_group.add_file_instance(signature, instance)
      end
      content_group
    end
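The subset argument only decides which XPath expression selects the file nodes. The mapping can be sketched on its own as follows; xpath_for_subset is a hypothetical helper, not part of the class, and it raises ArgumentError where the real method raises Moab::MoabRuntimeError so that the sketch has no dependencies.

```ruby
# Returns the XPath that the case statement in group_from_cm would evaluate.
def xpath_for_subset(subset)
  name = subset.to_s.downcase
  case name
  when 'preserve', 'publish', 'shelve'
    "//file[@#{name}='yes']"
  when 'all'
    '//file'
  else
    # Assumption: ArgumentError stands in for Moab::MoabRuntimeError.
    raise ArgumentError, "Unknown disposition subset (#{subset})"
  end
end

puts xpath_for_subset(:Preserve)
puts xpath_for_subset('all')
```

Since the subset is converted with to_s.downcase, symbols and mixed-case strings are accepted; any other value fails before a FileGroup is created.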
#inventory_from_cm(content_metadata, object_id, subset, version_id = nil) ⇒ FileInventory
Returns: (FileInventory) the versionInventory equivalent of the contentMetadata. If the supplied content_metadata is blank or empty, a skeletal FileInventory is returned.

    # File 'lib/stanford/content_inventory.rb', line 20

    def inventory_from_cm(content_metadata, object_id, subset, version_id = nil)
      # The contentMetadata datastream is not required for ingest, since some object types, such as collection
      # or APO do not require one.
      # Many of these objects have contentMetadata with no child elements, such as this:
      #    <contentMetadata objectId="bd608mj3166" type="file"/>
      # but there are also objects that have no datastream of this name at all
      cm_inventory = Moab::FileInventory.new(:type => "version", :digital_object_id => object_id, :version_id => version_id)
      content_group = group_from_cm(content_metadata, subset)
      cm_inventory.groups << content_group
      cm_inventory
    end
#remediate_checksum_nodes(file_node, signature) ⇒ void
This method returns an undefined value.
Updates the file's checksum elements if data is missing; raises an exception if the existing values are inconsistent.

    # File 'lib/stanford/content_inventory.rb', line 212

    def remediate_checksum_nodes(file_node, signature)
      # collect <checksum> elements for checksum types that are already present
      checksum_nodes = {}
      file_node.xpath('checksum').each do |checksum_node|
        type = @type_for_name[checksum_node['type']]
        checksum_nodes[type] = checksum_node
      end
      # add new <checksum> elements for the other checksum types that were missing
      @names_for_type.each do |type, names|
        unless checksum_nodes.key?(type)
          checksum_node = Nokogiri::XML::Element.new('checksum', file_node.document)
          checksum_node['type'] = names[0]
          file_node << checksum_node
          checksum_nodes[type] = checksum_node
        end
      end
      # make sure the <checksum> element has a content value
      checksum_nodes.each do |type, checksum_node|
        cm_checksum = checksum_node.content
        sig_checksum = signature.checksums[type]
        if cm_checksum.nil? || cm_checksum.empty?
          checksum_node.content = sig_checksum
        elsif cm_checksum != sig_checksum
          raise(Moab::MoabRuntimeError, "Inconsistent #{type} for #{file_node['id']}: #{cm_checksum} != #{sig_checksum}")
        end
      end
    end
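The method follows a fill-or-verify pattern: a missing or blank checksum is filled in from the signature, and a checksum that is present must match it. A simplified sketch with plain Hashes in place of XML nodes is shown below; fill_or_verify_checksums is a hypothetical name, the checksum values are placeholders, and ArgumentError stands in for Moab::MoabRuntimeError. Unlike the real method, it iterates only over the expected checksum types.

```ruby
# present:  checksums already recorded for a file, e.g. { md5: 'aaa', sha1: '' }
# expected: checksums taken from the file's signature, e.g. { md5: 'aaa', sha1: 'bbb' }
# Returns a new Hash with the gaps filled; raises if a recorded value disagrees.
def fill_or_verify_checksums(file_id, present, expected)
  expected.each_with_object(present.dup) do |(type, value), result|
    current = result[type]
    if current.nil? || current.empty?
      result[type] = value # missing or blank: fill in from the signature
    elsif current != value
      # Assumption: ArgumentError stands in for Moab::MoabRuntimeError.
      raise ArgumentError, "Inconsistent #{type} for #{file_id}: #{current} != #{value}"
    end
  end
end

filled = fill_or_verify_checksums('a.txt', { md5: 'aaa', sha1: '' }, { md5: 'aaa', sha1: 'bbb' })
puts filled.inspect
```

The sketch returns a copy instead of mutating its input; the documented method instead edits the file node in place and returns no meaningful value.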
#remediate_content_metadata(content_metadata, content_group) ⇒ String
Returns: (String) a remediated copy of the contentMetadata with fixity data filled in.

    # File 'lib/stanford/content_inventory.rb', line 179

    def remediate_content_metadata(content_metadata, content_group)
      return nil if content_metadata.nil?
      return content_metadata if content_group.nil? || content_group.files.empty?

      signature_for_path = content_group.path_hash
      @type_for_name = Moab::FileSignature.checksum_type_for_name
      @names_for_type = Moab::FileSignature.checksum_names_for_type
      ng_doc = Nokogiri::XML(content_metadata, &:noblanks)
      nodeset = ng_doc.xpath("//file")
      nodeset.each do |file_node|
        filepath = file_node['id']
        signature = signature_for_path[filepath]
        remediate_file_size(file_node, signature)
        remediate_checksum_nodes(file_node, signature)
      end
      ng_doc.to_xml(:indent => 2)
    end
#remediate_file_size(file_node, signature) ⇒ void
This method returns an undefined value.
Updates the file size attribute if it is missing; raises an exception if the existing value is inconsistent.

    # File 'lib/stanford/content_inventory.rb', line 200

    def remediate_file_size(file_node, signature)
      file_size = file_node['size']
      if file_size.nil? || file_size.empty?
        file_node['size'] = signature.size.to_s
      elsif file_size != signature.size.to_s
        raise(Moab::MoabRuntimeError, "Inconsistent size for #{file_node['id']}: #{file_size} != #{signature.size}")
      end
    end
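Note that the size check compares strings, because the attribute value comes from XML and the signature size is converted with to_s. The sketch below illustrates that detail with a Hash in place of the file node; fill_or_verify_size is a hypothetical name and ArgumentError stands in for Moab::MoabRuntimeError.

```ruby
# file_attrs: Hash of XML attribute values, e.g. { 'id' => 'a.txt', 'size' => '100' }
def fill_or_verify_size(file_attrs, expected_size)
  size = file_attrs['size']
  if size.nil? || size.empty?
    file_attrs['size'] = expected_size.to_s
  elsif size != expected_size.to_s
    # String comparison: '0100' and '100' are treated as different sizes.
    raise ArgumentError, "Inconsistent size for #{file_attrs['id']}: #{size} != #{expected_size}"
  end
  file_attrs
end

puts fill_or_verify_size({ 'id' => 'a.txt' }, 100).inspect
puts fill_or_verify_size({ 'id' => 'b.txt', 'size' => '100' }, 100).inspect
```

Under this comparison a size recorded with leading zeros or surrounding whitespace would be reported as inconsistent, even though it is numerically equal.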
#validate_content_metadata(content_metadata) ⇒ Boolean
Returns: (Boolean) true if the contentMetadata has the essential file attributes; otherwise raises an exception.

    # File 'lib/stanford/content_inventory.rb', line 129

    def validate_content_metadata(content_metadata)
      result = validate_content_metadata_details(content_metadata)
      raise Moab::InvalidMetadataException, result[0] + " ..." unless result.empty?

      true
    end
#validate_content_metadata_details(content_metadata) ⇒ Array<String>
Returns: (Array<String>) the list of problems found.

    # File 'lib/stanford/content_inventory.rb', line 138

    def validate_content_metadata_details(content_metadata)
      result = []
      content_metadata_doc =
        case content_metadata.class.name
        when "String"
          Nokogiri::XML(content_metadata)
        when "Pathname"
          Nokogiri::XML(content_metadata.read)
        when "Nokogiri::XML::Document"
          content_metadata
        else
          raise Moab::InvalidMetadataException, "Content Metadata is in unrecognized format"
        end
      nodeset = content_metadata_doc.xpath("//file")
      nodeset.each do |file_node|
        missing = %w[id size md5 sha1]
        missing.delete('id') if file_node.has_attribute?('id')
        missing.delete('size') if file_node.has_attribute?('size')
        checksum_nodes = file_node.xpath('checksum')
        checksum_nodes.each do |checksum_node|
          case checksum_node.attributes['type'].content.upcase
          when 'MD5'
            missing.delete('md5')
          when 'SHA1', 'SHA-1'
            missing.delete('sha1')
          end
        end
        if missing.include?('id')
          result << "File node #{nodeset.index(file_node)} is missing #{missing.join(',')}"
        elsif !missing.empty?
          id = file_node['id']
          result << "File node having id='#{id}' is missing #{missing.join(',')}"
        end
      end
      result
    end
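The validation starts from the full list of required items and deletes each one it finds, so whatever remains is what the file node lacks. A minimal sketch of that logic with plain Hashes in place of Nokogiri nodes follows; REQUIRED_FILE_FIELDS and missing_file_attributes are hypothetical names, and the Hash layout is an assumption made for illustration.

```ruby
REQUIRED_FILE_FIELDS = %w[id size md5 sha1].freeze

# files: Array of Hashes such as
#   { 'id' => 'a.txt', 'size' => '10', 'checksums' => ['MD5', 'SHA-1'] }
# Returns one message per file that lacks a required item.
def missing_file_attributes(files)
  problems = []
  files.each_with_index do |file, index|
    missing = REQUIRED_FILE_FIELDS.dup
    missing.delete('id') if file.key?('id')
    missing.delete('size') if file.key?('size')
    Array(file['checksums']).each do |type|
      case type.upcase
      when 'MD5' then missing.delete('md5')
      when 'SHA1', 'SHA-1' then missing.delete('sha1')
      end
    end
    next if missing.empty?

    # Files without an id are identified by position instead.
    label = file.key?('id') ? "having id='#{file['id']}'" : index.to_s
    problems << "File node #{label} is missing #{missing.join(',')}"
  end
  problems
end

problems = missing_file_attributes(
  [
    { 'id' => 'a.txt', 'size' => '10', 'checksums' => ['MD5', 'SHA-1'] },
    { 'id' => 'b.txt', 'checksums' => ['md5'] },
    { 'size' => '5' }
  ]
)
puts problems
```

As in the documented method, SHA-256 is not among the required items, so a file carrying only a SHA-256 checksum is still reported as missing md5 and sha1.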