Class: Moab::FileInventory
- Inherits: Serializer::Manifest
  - Object
    - Serializer::Serializable
      - Serializer::Manifest
        - Moab::FileInventory
- Includes: HappyMapper
- Defined in: lib/moab/file_inventory.rb
Overview
Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.
A structured container for recording information about a collection of related files.
The scope of the file collection depends on inventory type:
- version = full set of data files comprising a digital object’s version
- additions = subset of data files that were newly added in the specified version
- manifests = the fixity data for manifest files in the version’s root folder
- directory = set of files that were harvested from a filesystem directory
The inventory contains one or more FileGroup subsets, which are most commonly used to separate a digital object version’s content files from its metadata files. Each group contains one or more FileManifestation entities, each of which represents a point-in-time snapshot of a given file’s filesystem characteristics. The fixity data for a file is stored in a FileSignature element, while the filename and modification data are stored in one or more FileInstance elements (copies of a given file may be present in multiple locations in a collection).
Data Model
- FileInventory = container for recording information about a collection of related files
  - FileGroup [1..*] = subsets that allow segregation of content and metadata files
    - FileManifestation [1..*] = snapshot of a file’s filesystem characteristics
      - FileSignature [1] = file fixity information
      - FileInstance [1..*] = filepath and timestamp of any physical file having that signature
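As a rough illustration of the data model above, the nesting can be mirrored with plain Ruby Structs. This is a hypothetical sketch: the module DataModelSketch and its Struct fields are stand-ins, not the HappyMapper classes Moab defines, and the counting rules used here (each FileInstance is one physical file, and each copy contributes its signature's size) are assumptions made for the example.

```ruby
module DataModelSketch
  # FileSignature [1]: fixity data shared by identical files (hypothetical fields)
  FileSignature = Struct.new(:size, :md5, :sha1, :sha256, keyword_init: true)
  # FileInstance [1..*]: one physical file (path + timestamp) having that signature
  FileInstance = Struct.new(:path, :datetime, keyword_init: true)
  # FileManifestation [1..*]: one signature plus every instance that carries it
  FileManifestation = Struct.new(:signature, :instances, keyword_init: true)
  # FileGroup [1..*]: e.g. "content" or "metadata"
  FileGroup = Struct.new(:group_id, :files, keyword_init: true)

  FileInventory = Struct.new(:type, :digital_object_id, :version_id, :groups, keyword_init: true) do
    # Assumption: each FileInstance counts as one data file
    def file_count
      groups.sum { |g| g.files.sum { |m| m.instances.size } }
    end

    # Assumption: every copy contributes the full signature size
    def byte_count
      groups.sum { |g| g.files.sum { |m| m.signature.size * m.instances.size } }
    end
  end
end

# One file stored at two paths: a single manifestation with two instances
sig = DataModelSketch::FileSignature.new(size: 100, sha256: "placeholder")
dup = DataModelSketch::FileManifestation.new(
  signature: sig,
  instances: [DataModelSketch::FileInstance.new(path: "a.txt", datetime: "2012-01-01T00:00:00Z"),
              DataModelSketch::FileInstance.new(path: "copy/a.txt", datetime: "2012-01-01T00:00:00Z")]
)
content = DataModelSketch::FileGroup.new(group_id: "content", files: [dup])
inventory = DataModelSketch::FileInventory.new(type: "version", digital_object_id: "druid:bc123df4567",
                                               version_id: 1, groups: [content])
inventory.file_count # => 2
inventory.byte_count # => 200
```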
Instance Attribute Summary
- #block_count ⇒ Integer
  The total disk usage (in 1 kB blocks) of all data files, estimating the du -k result (dynamically calculated).
- #digital_object_id ⇒ String
  The digital object identifier (druid).
- #file_count ⇒ Integer
  The total number of data files in the inventory (dynamically calculated).
- #groups ⇒ Array<FileGroup>
  The set of data groups comprising the version.
- #inventory_datetime ⇒ String
  The datetime at which the inventory was created.
- #type ⇒ String
  The type of inventory (version|additions|manifests|directory).
- #version_id ⇒ Integer
  The ordinal version number.
Class Method Summary
- .xml_filename(type = nil) ⇒ String
  The standard name for the serialized inventory file of the given type.
Instance Method Summary
- #byte_count ⇒ Integer
  The total size (in bytes) of all files in the inventory (dynamically calculated).
- #composite_key ⇒ String
  The unique identifier concatenating the digital object id with the version id.
- #copy_ids(other) ⇒ void
  Copy the objectId, versionId, and inventoryDatetime values from another class instance into this instance.
- #data_source ⇒ String
  Returns either the version ID (if the inventory has one) or the name of the directory that was harvested to create the inventory.
- #file_signature(group_id, file_id) ⇒ FileSignature
  The signature of the specified file.
- #group(group_id) ⇒ FileGroup
  The file group in this inventory for the specified group_id.
- #group_empty?(group_id) ⇒ Boolean
  True if the group is missing or empty.
- #group_ids(non_empty = nil) ⇒ Array<String>
  Group identifiers contained in this file inventory.
- #human_size ⇒ String
  The total size of the inventory expressed in B, KB, MB, GB, or TB, depending on the magnitude of the value.
- #initialize(opts = {}) ⇒ FileInventory (constructor)
  A new instance of FileInventory.
- #inventory_from_bagit_bag(bag_dir) ⇒ FileInventory
  Traverse a BagIt bag’s payload and return an inventory of the files it contains (using fixity from bag manifest files).
- #inventory_from_directory(data_dir, group_id = nil) ⇒ FileInventory
  Traverse a directory and return an inventory of the files it contains.
- #non_empty_groups ⇒ Array<FileGroup>
  The set of data groups that contain files.
- #package_id ⇒ String
  Concatenation of the objectId and versionId values.
- #signatures_from_bagit_manifests(bag_pathname) ⇒ Hash<Pathname,FileSignature>
  The fixity data present in the bag’s manifest files.
- #summary_fields ⇒ Array<String>
  The data fields to include in summary reports.
- #write_xml_file(parent_dir, type = nil) ⇒ void
  Write the FileInventory instance to a file.
Methods inherited from Serializer::Manifest
read_xml_file, write_xml_file, xml_pathname, xml_pathname_exist?
Methods inherited from Serializer::Serializable
#array_to_hash, deep_diff, #diff, #key, #key_name, #summary, #to_hash, #to_json, #to_yaml, #variable_names, #variables
Constructor Details
#initialize(opts = {}) ⇒ FileInventory
Returns a new instance of FileInventory.
    # File 'lib/moab/file_inventory.rb', line 35
    def initialize(opts = {})
      @groups = []
      @inventory_datetime = Time.now
      super(opts)
    end
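The constructor sets its defaults before calling super(opts), so any value supplied in opts replaces the default. The sketch below shows that ordering with a hypothetical SerializableSketch parent that simply assigns each option to the same-named instance variable; that assignment rule is an assumption about Serializer::Serializable, not a copy of it.

```ruby
# Hypothetical stand-in for Serializer::Serializable: assigns each option
# to the instance variable of the same name.
class SerializableSketch
  def initialize(opts = {})
    opts.each { |key, value| instance_variable_set("@#{key}", value) }
  end
end

class InventorySketch < SerializableSketch
  attr_reader :groups, :inventory_datetime, :digital_object_id

  def initialize(opts = {})
    # Defaults are set first...
    @groups = []
    @inventory_datetime = Time.now
    # ...so any value passed in opts overwrites them
    super(opts)
  end
end

fresh = InventorySketch.new(digital_object_id: "druid:bc123df4567")
fixed = InventorySketch.new(inventory_datetime: "2012-01-01T00:00:00Z")
```

Reversing the order (calling super first) would silently discard a caller-supplied inventory_datetime.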
Instance Attribute Details
#block_count ⇒ Integer
Returns the total disk usage (in 1 kB blocks) of all data files, estimating the du -k result (dynamically calculated).
    # File 'lib/moab/file_inventory.rb', line 88
    attribute :block_count, Integer, :tag => 'blockCount', :on_save => proc { |t| t.to_s }
#digital_object_id ⇒ String
Returns the digital object identifier (druid).
    # File 'lib/moab/file_inventory.rb', line 47
    attribute :digital_object_id, String, :tag => 'objectId'
#file_count ⇒ Integer
Returns the total number of data files in the inventory (dynamically calculated).
    # File 'lib/moab/file_inventory.rb', line 72
    attribute :file_count, Integer, :tag => 'fileCount', :on_save => proc { |t| t.to_s }
#groups ⇒ Array<FileGroup>
Returns the set of data groups comprising the version.
    # File 'lib/moab/file_inventory.rb', line 96
    has_many :groups, FileGroup, :tag => 'fileGroup'
#inventory_datetime ⇒ String
Returns the datetime at which the inventory was created.
    # File 'lib/moab/file_inventory.rb', line 60
    attribute :inventory_datetime, String, :tag => 'inventoryDatetime'
#type ⇒ String
Returns the type of inventory (version|additions|manifests|directory).
    # File 'lib/moab/file_inventory.rb', line 43
    attribute :type, String
#version_id ⇒ Integer
Returns the ordinal version number.
    # File 'lib/moab/file_inventory.rb', line 51
    attribute :version_id, Integer, :tag => 'versionId', :key => true, :on_save => proc { |n| n.to_s }
Class Method Details
.xml_filename(type = nil) ⇒ String
Returns the standard name for the serialized inventory file of the given type.
    # File 'lib/moab/file_inventory.rb', line 242
    def self.xml_filename(type = nil)
      case type
      when "version"
        'versionInventory.xml'
      when "additions"
        'versionAdditions.xml'
      when "manifests"
        'manifestInventory.xml'
      when "directory"
        'directoryInventory.xml'
      else
        raise ArgumentError, "unknown inventory type: #{type}"
      end
    end
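The case statement in xml_filename is a fixed lookup table, so it can equally be expressed as a frozen Hash. In this sketch the names INVENTORY_XML_FILENAMES and inventory_xml_filename are hypothetical; it keeps the same ArgumentError for unknown types. Note that the default type = nil falls into the error branch, so callers are expected to pass a type.

```ruby
# Hypothetical hash-based variant of the same type-to-filename mapping
INVENTORY_XML_FILENAMES = {
  "version"   => "versionInventory.xml",
  "additions" => "versionAdditions.xml",
  "manifests" => "manifestInventory.xml",
  "directory" => "directoryInventory.xml"
}.freeze

def inventory_xml_filename(type = nil)
  # Hash#fetch runs the block only when the key is missing, including nil
  INVENTORY_XML_FILENAMES.fetch(type) { raise ArgumentError, "unknown inventory type: #{type}" }
end

inventory_xml_filename("additions") # => "versionAdditions.xml"
```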
Instance Method Details
#byte_count ⇒ Integer
Returns the total size (in bytes) of all files in the inventory (dynamically calculated).
    # File 'lib/moab/file_inventory.rb', line 80
    attribute :byte_count, Integer, :tag => 'byteCount', :on_save => proc { |t| t.to_s }
#composite_key ⇒ String
Returns the unique identifier concatenating the digital object id with the version id.
    # File 'lib/moab/file_inventory.rb', line 54
    def composite_key
      digital_object_id + '-' + StorageObject.version_dirname(version_id)
    end
#copy_ids(other) ⇒ void
This method returns an undefined value.
Copies the objectId, versionId, and inventoryDatetime values from another instance into this instance.
    # File 'lib/moab/file_inventory.rb', line 144
    def copy_ids(other)
      @digital_object_id = other.digital_object_id
      @version_id = other.version_id
      @inventory_datetime = other.inventory_datetime
    end
#data_source ⇒ String
Returns either the version ID (if the inventory has one) or the name of the directory that was harvested to create the inventory.
    # File 'lib/moab/file_inventory.rb', line 159
    def data_source
      data_source = (groups.collect { |g| g.data_source.to_s }).join('|')
      if data_source.start_with?('contentMetadata')
        version_id ? "v#{version_id}-#{data_source}" : "new-#{data_source}"
      else
        version_id ? "v#{version_id}" : data_source
      end
    end
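The four branches of data_source are easiest to see on stub groups. In this hypothetical sketch GroupStub stands in for FileGroup and carries only a data_source string; the directory paths and the contentMetadata-all value are made up for the example.

```ruby
# Hypothetical stand-in for FileGroup: only the data_source attribute matters here
GroupStub = Struct.new(:data_source)

def data_source_sketch(groups, version_id)
  # Join every group's data source, as the original does
  joined = groups.map { |g| g.data_source.to_s }.join("|")
  if joined.start_with?("contentMetadata")
    # Inventories derived from contentMetadata keep the source as a suffix
    version_id ? "v#{version_id}-#{joined}" : "new-#{joined}"
  else
    # Otherwise a version id wins over the harvested directory names
    version_id ? "v#{version_id}" : joined
  end
end

dirs = [GroupStub.new("/tmp/data/content"), GroupStub.new("/tmp/data/metadata")]
cm = [GroupStub.new("contentMetadata-all")]
data_source_sketch(dirs, nil) # => "/tmp/data/content|/tmp/data/metadata"
data_source_sketch(cm, nil)   # => "new-contentMetadata-all"
```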
#file_signature(group_id, file_id) ⇒ FileSignature
Returns the signature of the specified file.
    # File 'lib/moab/file_inventory.rb', line 131
    def file_signature(group_id, file_id)
      file_group = group(group_id)
      errmsg = "group #{group_id} not found for #{digital_object_id} - #{version_id}"
      raise FileNotFoundException, errmsg if file_group.nil?
      file_signature = file_group.path_hash[file_id]
      errmsg = "#{group_id} file #{file_id} not found for #{digital_object_id} - #{version_id}"
      raise FileNotFoundException, errmsg if file_signature.nil?
      file_signature
    end
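file_signature is a two-level lookup (group, then file path) that raises on either miss instead of returning nil. The sketch below reproduces that shape with nested hashes; FileNotFoundSketch, file_signature_sketch, and the nested-hash layout standing in for each FileGroup#path_hash are hypothetical.

```ruby
# Hypothetical stand-in for Moab's FileNotFoundException
class FileNotFoundSketch < StandardError; end

# path_hashes: { group_id => { file_id => signature } }
def file_signature_sketch(path_hashes, group_id, file_id)
  path_hash = path_hashes[group_id]
  raise FileNotFoundSketch, "group #{group_id} not found" if path_hash.nil?

  signature = path_hash[file_id]
  raise FileNotFoundSketch, "#{group_id} file #{file_id} not found" if signature.nil?

  signature
end

hashes = { "content" => { "intro.pdf" => "sig-1" }, "metadata" => {} }
file_signature_sketch(hashes, "content", "intro.pdf") # => "sig-1"
```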
#group(group_id) ⇒ FileGroup
Returns the file group in this inventory for the specified group_id.
    # File 'lib/moab/file_inventory.rb', line 112
    def group(group_id)
      groups.find { |group| group.group_id == group_id }
    end
#group_empty?(group_id) ⇒ Boolean
Returns true if the group is missing or empty.
    # File 'lib/moab/file_inventory.rb', line 118
    def group_empty?(group_id)
      group = self.group(group_id)
      group.nil? || group.files.empty?
    end
#group_ids(non_empty = nil) ⇒ Array<String>
Returns the group identifiers contained in this file inventory.
    # File 'lib/moab/file_inventory.rb', line 105
    def group_ids(non_empty = nil)
      my_groups = non_empty ? non_empty_groups : groups
      my_groups.map(&:group_id)
    end
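The group accessors differ mainly in how they treat empty groups: group_empty? treats a missing group and an empty group the same way, and group_ids drops empty groups only when non_empty is truthy. A minimal sketch, using a hypothetical GroupSketch Struct and free-standing helper functions in place of the instance methods:

```ruby
# Hypothetical stand-in for FileGroup
GroupSketch = Struct.new(:group_id, :files)

def find_group(groups, group_id)
  groups.find { |g| g.group_id == group_id }
end

# Missing and empty groups both count as "empty"
def group_empty_sketch?(groups, group_id)
  group = find_group(groups, group_id)
  group.nil? || group.files.empty?
end

# Any truthy non_empty argument filters out groups with no files
def group_ids_sketch(groups, non_empty = nil)
  selected = non_empty ? groups.reject { |g| g.files.empty? } : groups
  selected.map(&:group_id)
end

groups = [GroupSketch.new("content", ["a.txt"]), GroupSketch.new("metadata", [])]
group_ids_sketch(groups)       # => ["content", "metadata"]
group_ids_sketch(groups, true) # => ["content"]
```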
#human_size ⇒ String
Returns the total size of the inventory expressed in B, KB, MB, GB, or TB, depending on the magnitude of the value.
    # File 'lib/moab/file_inventory.rb', line 225
    def human_size
      count = 0
      size = byte_count
      while (size >= 1024) && (count < 4)
        size /= 1024.0
        count += 1
      end
      if count == 0
        format("%d B", size)
      else
        format("%.2f %s", size, %w[B KB MB GB TB][count])
      end
    end
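The loop in human_size divides by 1024 (binary multiples) until the value drops below 1024 or the TB unit is reached, so sizes of 1024 TB and above stay in TB. The standalone copy below takes the byte count as an argument so the behavior can be checked directly; the name human_size_sketch is hypothetical.

```ruby
def human_size_sketch(byte_count)
  count = 0
  size = byte_count
  # At most four divisions: B -> KB -> MB -> GB -> TB
  while size >= 1024 && count < 4
    size /= 1024.0
    count += 1
  end
  count.zero? ? format("%d B", size) : format("%.2f %s", size, %w[B KB MB GB TB][count])
end

human_size_sketch(512)         # => "512 B"
human_size_sketch(1536)        # => "1.50 KB"
human_size_sketch(5 * 1024**3) # => "5.00 GB"
```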
#inventory_from_bagit_bag(bag_dir) ⇒ FileInventory
Traverses a BagIt bag’s payload and returns an inventory of the files it contains (using fixity from bag manifest files).
    # File 'lib/moab/file_inventory.rb', line 188
    def inventory_from_bagit_bag(bag_dir)
      bag_pathname = Pathname(bag_dir)
      signatures_from_bag = signatures_from_bagit_manifests(bag_pathname)
      bag_data_subdirs = bag_pathname.join('data').children
      bag_data_subdirs.each do |subdir|
        groups << FileGroup.new(:group_id => subdir.basename.to_s).group_from_bagit_subdir(subdir, signatures_from_bag)
      end
      self
    end
#inventory_from_directory(data_dir, group_id = nil) ⇒ FileInventory
Traverses a directory and returns an inventory of the files it contains.
    # File 'lib/moab/file_inventory.rb', line 174
    def inventory_from_directory(data_dir, group_id = nil)
      if group_id
        groups << FileGroup.new(group_id: group_id).group_from_directory(data_dir)
      else
        %w[content metadata].each do |gid|
          groups << FileGroup.new(group_id: gid).group_from_directory(Pathname(data_dir).join(gid))
        end
      end
      self
    end
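inventory_from_directory either harvests data_dir itself as a single named group or, with no group_id, treats data_dir/content and data_dir/metadata as two groups. The sketch below mimics that layout rule using only the standard library. It is a simplified, hypothetical stand-in (directory_groups_sketch is not a Moab method): it records just a SHA-256 per relative path, whereas FileGroup#group_from_directory builds full FileSignature and FileInstance records.

```ruby
require "pathname"
require "digest"

# Returns { group_id => { relative_path => sha256_hex } }
def directory_groups_sketch(data_dir, group_id = nil)
  base = Pathname(data_dir)
  # Same layout rule as inventory_from_directory
  targets =
    if group_id
      { group_id => base }
    else
      %w[content metadata].to_h { |gid| [gid, base.join(gid)] }
    end
  targets.transform_values do |dir|
    # Pathname#find walks the tree recursively; keep regular files only
    dir.find.select(&:file?).to_h { |path| [path.relative_path_from(dir).to_s, Digest::SHA256.file(path.to_s).hexdigest] }
  end
end
```

As written, the sketch raises Errno::ENOENT if either default subdirectory is missing.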
#non_empty_groups ⇒ Array<FileGroup>
Returns the set of data groups that contain files.
    # File 'lib/moab/file_inventory.rb', line 99
    def non_empty_groups
      groups.reject { |group| group.files.empty? }
    end
#package_id ⇒ String
Returns the concatenation of the objectId and versionId values.
    # File 'lib/moab/file_inventory.rb', line 152
    def package_id
      "#{digital_object_id}-v#{version_id}"
    end
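package_id and composite_key both join the druid and the version, but they format the version differently. composite_key delegates to StorageObject.version_dirname. The sketch assumes that helper zero-pads to four digits (v0001), matching Moab's version directory names, and version_dirname_sketch is a hypothetical stand-in for it. Because composite_key uses String#+, it also fails on a nil digital_object_id, while package_id's interpolation does not.

```ruby
# Assumption: StorageObject.version_dirname zero-pads to four digits
def version_dirname_sketch(version_id)
  format("v%04d", version_id)
end

# Mirrors composite_key: String#+ raises if digital_object_id is nil
def composite_key_sketch(digital_object_id, version_id)
  digital_object_id + "-" + version_dirname_sketch(version_id)
end

# Mirrors package_id: interpolation turns nil into ""
def package_id_sketch(digital_object_id, version_id)
  "#{digital_object_id}-v#{version_id}"
end

composite_key_sketch("druid:bc123df4567", 2) # => "druid:bc123df4567-v0002"
package_id_sketch("druid:bc123df4567", 2)    # => "druid:bc123df4567-v2"
```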
#signatures_from_bagit_manifests(bag_pathname) ⇒ Hash<Pathname,FileSignature>
Returns the fixity data present in the bag’s manifest files.
    # File 'lib/moab/file_inventory.rb', line 200
    def signatures_from_bagit_manifests(bag_pathname)
      manifest_pathname = {}
      DEFAULT_CHECKSUM_TYPES.each do |type|
        manifest_pathname[type] = bag_pathname.join("manifest-#{type}.txt")
      end
      signatures = Hash.new { |hash, path| hash[path] = FileSignature.new }
      DEFAULT_CHECKSUM_TYPES.each do |type|
        if manifest_pathname[type].exist?
          manifest_pathname[type].each_line do |line|
            line.chomp!
            checksum, data_path = line.split(/\s+\**/, 2)
            if checksum && data_path
              file_pathname = bag_pathname.join(data_path)
              signature = signatures[file_pathname]
              signature.set_checksum(type, checksum)
            end
          end
        end
      end
      signatures.each { |file_pathname, signature| signature.size = file_pathname.size }
      signatures
    end
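Each manifest line is split once on whitespace followed by any "*" characters, so both the text-mode form "checksum  path" and the binary-mode form "checksum *path" yield the same path. Checksums of different types for the same path accumulate on one entry. The sketch reuses that regex but parses manifest text passed in as a string and stores plain hashes instead of FileSignature objects; parse_bagit_manifest is a hypothetical helper. Unlike the original, it does not read file sizes from disk. The original's final Pathname#size call raises if a listed payload file is missing.

```ruby
require "pathname"

# Returns { Pathname => { checksum_type => checksum } }
def parse_bagit_manifest(bag_pathname, type, text, signatures = Hash.new { |h, k| h[k] = {} })
  text.each_line do |line|
    # Same split as the original: whitespace, then any "*" binary-mode markers
    checksum, data_path = line.chomp.split(/\s+\**/, 2)
    next unless checksum && data_path

    signatures[bag_pathname.join(data_path)][type] = checksum
  end
  signatures
end

bag = Pathname("/tmp/bag")
sigs = parse_bagit_manifest(bag, :md5, "d41d8cd98f00b204e9800998ecf8427e  data/content/empty.txt\n")
parse_bagit_manifest(bag, :sha1, "da39a3ee5e6b4b0d3255bfef95601890afd80709 *data/content/empty.txt\n", sigs)
# sigs now maps /tmp/bag/data/content/empty.txt to both its :md5 and :sha1 values
```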
#summary_fields ⇒ Array<String>
Returns the data fields to include in summary reports.
    # File 'lib/moab/file_inventory.rb', line 124
    def summary_fields
      %w[type digital_object_id version_id inventory_datetime file_count byte_count block_count groups]
    end
#write_xml_file(parent_dir, type = nil) ⇒ void
This method returns an undefined value.
Writes the Moab::FileInventory instance to a file.
    # File 'lib/moab/file_inventory.rb', line 262
    def write_xml_file(parent_dir, type = nil)
      type = @type if type.nil?
      self.class.write_xml_file(self, parent_dir, type)
    end