Class: Moab::FileInventory

Inherits:
Serializer::Manifest
Includes:
HappyMapper
Defined in:
lib/moab/file_inventory.rb

Overview

Note:

Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.

A structured container for recording information about a collection of related files.

The scope of the file collection depends on inventory type:

  • version = full set of data files comprising a digital object’s version

  • additions = subset of data files that were newly added in the specified version

  • manifests = the fixity data for manifest files in the version’s root folder

  • directory = set of files that were harvested from a filesystem directory

The inventory contains one or more FileGroup subsets, which are most commonly used to segregate a digital object version’s content and metadata files. Each group contains one or more FileManifestation entities, each of which represents a point-in-time snapshot of a given file’s filesystem characteristics. The fixity data for a file is stored in a FileSignature element, while the filename and modification data are stored in one or more FileInstance elements, because copies of a given file may be present in multiple locations in a collection.

Data Model

  • FileInventory = container for recording information about a collection of related files

    • FileGroup [1..*] = subset allowing segregation of content and metadata files

      • FileManifestation [1..*] = snapshot of a file’s filesystem characteristics

        • FileSignature [1] = file fixity information

        • FileInstance [1..*] = filepath and timestamp of any physical file having that signature

Examples:
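
A minimal sketch of the nesting described in the data model, using plain Struct stand-ins rather than the real Moab classes. All Sketch* names and the sample values are hypothetical and exist only for illustration:

```ruby
# Hypothetical stand-ins for the Moab classes; they only mirror the nesting.
SketchSignature     = Struct.new(:size, :checksum)
SketchInstance      = Struct.new(:path, :datetime)
SketchManifestation = Struct.new(:signature, :instances)
SketchGroup         = Struct.new(:group_id, :files)
SketchInventory     = Struct.new(:type, :groups)

# One signature (placeholder checksum) shared by two instances:
# the same bytes stored at two paths in the collection.
signature = SketchSignature.new(12, 'placeholder-checksum')
manifestation = SketchManifestation.new(
  signature,
  [SketchInstance.new('page-1.jpg', '2012-01-01T00:00:00Z'),
   SketchInstance.new('copies/page-1.jpg', '2012-01-01T00:00:00Z')]
)
inventory = SketchInventory.new('version', [SketchGroup.new('content', [manifestation])])

# Walk inventory -> groups -> manifestations -> instances.
instance_paths = inventory.groups.flat_map do |group|
  group.files.flat_map { |file| file.instances.map(&:path) }
end
```

Here a single manifestation yields two paths, which is the case the FileInstance [1..*] cardinality is meant to cover.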

Methods inherited from Serializer::Manifest

read_xml_file, write_xml_file, xml_pathname, xml_pathname_exist?

Methods inherited from Serializer::Serializable

#array_to_hash, deep_diff, #diff, #key, #key_name, #summary, #to_hash, #to_json, #to_yaml, #variable_names, #variables

Constructor Details

#initialize(opts = {}) ⇒ FileInventory

Returns a new instance of FileInventory.



# File 'lib/moab/file_inventory.rb', line 35

def initialize(opts = {})
  @groups = []
  @inventory_datetime = Time.now
  super(opts)
end

Instance Attribute Details

#block_count ⇒ Integer

Returns The total disk usage (in 1 kB blocks) of all data files (estimating du -k result) (dynamically calculated).

Returns:

  • (Integer)

    The total disk usage (in 1 kB blocks) of all data files (estimating du -k result) (dynamically calculated)



# File 'lib/moab/file_inventory.rb', line 88

attribute :block_count, Integer, :tag => 'blockCount', :on_save => proc { |t| t.to_s }

#digital_object_id ⇒ String

Returns The digital object identifier (druid).

Returns:

  • (String)

    The digital object identifier (druid)



# File 'lib/moab/file_inventory.rb', line 47

attribute :digital_object_id, String, :tag => 'objectId'

#file_count ⇒ Integer

Returns The total number of data files in the inventory (dynamically calculated).

Returns:

  • (Integer)

    The total number of data files in the inventory (dynamically calculated)



# File 'lib/moab/file_inventory.rb', line 72

attribute :file_count, Integer, :tag => 'fileCount', :on_save => proc { |t| t.to_s }

#groups ⇒ Array<FileGroup>

Returns The set of data groups comprising the version.

Returns:

  • (Array<FileGroup>)

    The set of data groups comprising the version



# File 'lib/moab/file_inventory.rb', line 96

has_many :groups, FileGroup, :tag => 'fileGroup'

#inventory_datetime ⇒ String

Returns The datetime at which the inventory was created.

Returns:

  • (String)

    The datetime at which the inventory was created



# File 'lib/moab/file_inventory.rb', line 60

attribute :inventory_datetime, String, :tag => 'inventoryDatetime'

#type ⇒ String

Returns The type of inventory (version|additions|manifests|directory).

Returns:

  • (String)

    The type of inventory (version|additions|manifests|directory)



# File 'lib/moab/file_inventory.rb', line 43

attribute :type, String

#version_id ⇒ Integer

Returns The ordinal version number.

Returns:

  • (Integer)

    The ordinal version number



# File 'lib/moab/file_inventory.rb', line 51

attribute :version_id, Integer, :tag => 'versionId', :key => true, :on_save => proc { |n| n.to_s }

Class Method Details

.xml_filename(type = nil) ⇒ String

Returns The standard name for the serialized inventory file of the given type.

Parameters:

  • type (String) (defaults to: nil)

    Specifies the type of inventory, and thus the filename used for storage

Returns:

  • (String)

    The standard name for the serialized inventory file of the given type



# File 'lib/moab/file_inventory.rb', line 242

def self.xml_filename(type = nil)
  case type
  when "version"
    'versionInventory.xml'
  when "additions"
    'versionAdditions.xml'
  when "manifests"
    'manifestInventory.xml'
  when "directory"
    'directoryInventory.xml'
  else
    raise ArgumentError, "unknown inventory type: #{type}"
  end
end
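
The type-to-filename mapping above can be sketched as a lookup table. This is an illustrative equivalent, not the library's implementation; SKETCH_INVENTORY_FILENAMES and sketch_xml_filename are hypothetical names:

```ruby
# Hypothetical lookup table mirroring the case statement in xml_filename.
SKETCH_INVENTORY_FILENAMES = {
  'version'   => 'versionInventory.xml',
  'additions' => 'versionAdditions.xml',
  'manifests' => 'manifestInventory.xml',
  'directory' => 'directoryInventory.xml'
}.freeze

def sketch_xml_filename(type = nil)
  # Any unlisted type, including the nil default, raises ArgumentError.
  SKETCH_INVENTORY_FILENAMES.fetch(type) do
    raise ArgumentError, "unknown inventory type: #{type}"
  end
end

version_name = sketch_xml_filename('version')

# Calling without an argument falls through to the error branch.
default_call_error = begin
  sketch_xml_filename
rescue ArgumentError => e
  e.message
end
```

Note that the default of nil is not a usable type: omitting the argument raises ArgumentError, just as the else branch of the case statement does.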

Instance Method Details

#byte_count ⇒ Integer

Returns The total size (in bytes) of all files in the inventory (dynamically calculated).

Returns:

  • (Integer)

    The total size (in bytes) of all files in the inventory (dynamically calculated)



# File 'lib/moab/file_inventory.rb', line 80

attribute :byte_count, Integer, :tag => 'byteCount', :on_save => proc { |t| t.to_s }

#composite_key ⇒ String

Returns The unique identifier concatenating digital object id with version id.

Returns:

  • (String)

    The unique identifier concatenating digital object id with version id



# File 'lib/moab/file_inventory.rb', line 54

def composite_key
  digital_object_id + '-' + StorageObject.version_dirname(version_id)
end

#copy_ids(other) ⇒ void

This method returns an undefined value.

Copies the objectId, versionId, and inventory datetime values from another class instance into this instance.

Parameters:

  • other (FileInventory)

    another instance of this class from which to clone identity values



# File 'lib/moab/file_inventory.rb', line 144

def copy_ids(other)
  @digital_object_id = other.digital_object_id
  @version_id = other.version_id
  @inventory_datetime = other.inventory_datetime
end

#data_source ⇒ String

Returns either the version ID (if inventory is a version manifest) or the name of the directory that was harvested to create the inventory

Returns:

  • (String)

    Returns either the version ID (if inventory is a version manifest) or the name of the directory that was harvested to create the inventory



# File 'lib/moab/file_inventory.rb', line 159

def data_source
  data_source = (groups.collect { |g| g.data_source.to_s }).join('|')
  if data_source.start_with?('contentMetadata')
    version_id ? "v#{version_id}-#{data_source}" : "new-#{data_source}"
  else
    version_id ? "v#{version_id}" : data_source
  end
end
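
The four possible outcomes of the branching above can be sketched with a standalone function. sketch_data_source is a hypothetical name, and the sample group sources are made up for illustration:

```ruby
# Hypothetical standalone copy of the labelling logic in data_source.
# group_sources stands in for the data_source of each group.
def sketch_data_source(group_sources, version_id)
  joined = group_sources.map(&:to_s).join('|')
  if joined.start_with?('contentMetadata')
    version_id ? "v#{version_id}-#{joined}" : "new-#{joined}"
  else
    version_id ? "v#{version_id}" : joined
  end
end

versioned      = sketch_data_source(['/tmp/content', '/tmp/metadata'], 3)
unversioned    = sketch_data_source(['/tmp/content', '/tmp/metadata'], nil)
cm_versioned   = sketch_data_source(['contentMetadata'], 3)
cm_unversioned = sketch_data_source(['contentMetadata'], nil)
# versioned      => "v3"
# unversioned    => "/tmp/content|/tmp/metadata"
# cm_versioned   => "v3-contentMetadata"
# cm_unversioned => "new-contentMetadata"
```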

#file_signature(group_id, file_id) ⇒ FileSignature

Returns The signature of the specified file.

Parameters:

  • group_id (String)

    The identifier of the group to be selected

  • file_id (String)

    The group-relative path of the file (relative to the appropriate home directory)

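Returns:

  • (FileSignature)

    The signature of the specified file

Raises:

  • (FileNotFoundException)

    if the group or the file is not found in the inventory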



# File 'lib/moab/file_inventory.rb', line 131

def file_signature(group_id, file_id)
  file_group = group(group_id)
  errmsg = "group #{group_id} not found for #{digital_object_id} - #{version_id}"
  raise FileNotFoundException, errmsg if file_group.nil?
  file_signature = file_group.path_hash[file_id]
  errmsg = "#{group_id} file #{file_id} not found for #{digital_object_id} - #{version_id}"
  raise FileNotFoundException, errmsg if file_signature.nil?
  file_signature
end
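
The two-stage lookup above (group first, then file within the group) can be sketched with a nested hash. SketchFileNotFound, SKETCH_PATHS and sketch_file_signature are hypothetical stand-ins, not part of the Moab API:

```ruby
# Hypothetical error class standing in for FileNotFoundException.
class SketchFileNotFound < StandardError; end

# Hypothetical data: group_id => { group-relative path => signature }.
SKETCH_PATHS = { 'content' => { 'page-1.jpg' => 'signature-for-page-1' } }.freeze

def sketch_file_signature(group_id, file_id)
  path_hash = SKETCH_PATHS[group_id]
  raise SketchFileNotFound, "group #{group_id} not found" if path_hash.nil?
  signature = path_hash[file_id]
  raise SketchFileNotFound, "#{group_id} file #{file_id} not found" if signature.nil?
  signature
end

found = sketch_file_signature('content', 'page-1.jpg')

# A missing file raises instead of returning nil.
missing_message = begin
  sketch_file_signature('content', 'page-2.jpg')
rescue SketchFileNotFound => e
  e.message
end
```

Raising a distinct message at each stage lets a caller tell a missing group from a missing file without inspecting the inventory again.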

#group(group_id) ⇒ FileGroup

Returns The file group in this inventory for the specified group_id.

Parameters:

  • group_id (String)

    The identifier of the group to be selected

Returns:

  • (FileGroup)

    The file group in this inventory for the specified group_id



# File 'lib/moab/file_inventory.rb', line 112

def group(group_id)
  groups.find { |group| group.group_id == group_id }
end

#group_empty?(group_id) ⇒ Boolean

Returns true if the group is missing or empty.

Parameters:

  • group_id (String)

    File group identifier (e.g. data, metadata, manifests)

Returns:

  • (Boolean)

    true if the group is missing or empty



# File 'lib/moab/file_inventory.rb', line 118

def group_empty?(group_id)
  group = self.group(group_id)
  group.nil? || group.files.empty?
end

#group_ids(non_empty = nil) ⇒ Array<String>

Returns group identifiers contained in this file inventory.

Parameters:

  • non_empty (Boolean) (defaults to: nil)

    if true, return group_id’s only for groups having files

Returns:

  • (Array<String>)

    group identifiers contained in this file inventory



# File 'lib/moab/file_inventory.rb', line 105

def group_ids(non_empty = nil)
  my_groups = non_empty ? non_empty_groups : groups
  my_groups.map(&:group_id)
end
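
A short sketch of how the group lookup methods behave together, using a Struct stand-in for FileGroup. SketchGroupEntry and the sample data are hypothetical:

```ruby
# Hypothetical stand-in for FileGroup with only the fields used here.
SketchGroupEntry = Struct.new(:group_id, :files)

groups = [
  SketchGroupEntry.new('content', ['page-1.jpg']),
  SketchGroupEntry.new('metadata', [])
]

# Mirrors group(group_id): first group with a matching id, or nil.
find_group = ->(id) { groups.find { |g| g.group_id == id } }

# Mirrors group_empty?: a missing group counts as empty.
group_empty = lambda do |id|
  group = find_group.call(id)
  group.nil? || group.files.empty?
end

# Mirrors group_ids with and without the non_empty flag.
all_ids       = groups.map(&:group_id)
non_empty_ids = groups.reject { |g| g.files.empty? }.map(&:group_id)
# all_ids       => ["content", "metadata"]
# non_empty_ids => ["content"]
```

In this sketch both `group_empty.call('metadata')` and `group_empty.call('missing')` are true, so callers cannot distinguish an absent group from an empty one through group_empty? alone.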

#human_size ⇒ String

Returns The total size of the inventory expressed in B, KB, MB, GB or TB, depending on the magnitude of the value.

Returns:

  • (String)

    The total size of the inventory expressed in B, KB, MB, GB or TB, depending on the magnitude of the value



# File 'lib/moab/file_inventory.rb', line 225

def human_size
  count = 0
  size = byte_count
  while (size >= 1024) && (count < 4)
    size /= 1024.0
    count += 1
  end
  if count == 0
    format("%d B", size)
  else
    format("%.2f %s", size, %w[B KB MB GB TB][count])
  end
end
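
To see the unit scaling in action, the same loop can be run as a standalone function that takes the byte count as an argument. sketch_human_size is a hypothetical name used only for this illustration:

```ruby
# Hypothetical standalone version of human_size taking the byte count directly.
def sketch_human_size(byte_count)
  count = 0
  size = byte_count
  # Divide by 1024 until the value fits the unit, stopping at TB (count == 4).
  while (size >= 1024) && (count < 4)
    size /= 1024.0
    count += 1
  end
  if count == 0
    format('%d B', size)
  else
    format('%.2f %s', size, %w[B KB MB GB TB][count])
  end
end

small = sketch_human_size(500)        # => "500 B"
kilo  = sketch_human_size(1536)       # => "1.50 KB"
mega  = sketch_human_size(1_048_576)  # => "1.00 MB"
huge  = sketch_human_size(1024**5)    # => "1024.00 TB"
```

Because the loop stops after four divisions, values of a pebibyte or more are still reported in TB.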

#inventory_from_bagit_bag(bag_dir) ⇒ FileInventory

Returns Traverse a BagIt bag’s payload and return an inventory of the files it contains (using fixity from bag manifest files).

Parameters:

  • bag_dir (Pathname, String)

    The location of the BagIt bag to be inventoried

Returns:

  • (FileInventory)

    Traverse a BagIt bag’s payload and return an inventory of the files it contains (using fixity from bag manifest files)



# File 'lib/moab/file_inventory.rb', line 188

def inventory_from_bagit_bag(bag_dir)
  bag_pathname = Pathname(bag_dir)
  signatures_from_bag = signatures_from_bagit_manifests(bag_pathname)
  bag_data_subdirs = bag_pathname.join('data').children
  bag_data_subdirs.each do |subdir|
    groups << FileGroup.new(:group_id => subdir.basename.to_s).group_from_bagit_subdir(subdir, signatures_from_bag)
  end
  self
end

#inventory_from_directory(data_dir, group_id = nil) ⇒ FileInventory

Returns Traverse a directory and return an inventory of the files it contains.

Parameters:

  • data_dir (Pathname, String)

    The location of files to be inventoried

  • group_id (String) (defaults to: nil)

    if specified, is used to set the group ID of the FileGroup created from the directory; if nil, the directory is assumed to contain both content and metadata subdirectories

Returns:

  • (FileInventory)

    Traverse a directory and return an inventory of the files it contains



# File 'lib/moab/file_inventory.rb', line 174

def inventory_from_directory(data_dir, group_id = nil)
  if group_id
    groups << FileGroup.new(group_id: group_id).group_from_directory(data_dir)
  else
    %w[content metadata].each do |gid|
      groups << FileGroup.new(group_id: gid).group_from_directory(Pathname(data_dir).join(gid))
    end
  end
  self
end
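
The two calling modes can be sketched with the standard library alone. The helpers sketch_group_files and sketch_inventory are hypothetical; they return plain hashes of relative paths and make no claim about how FileGroup#group_from_directory actually harvests files:

```ruby
require 'pathname'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: sorted, directory-relative paths of all files under dir.
def sketch_group_files(dir)
  base = Pathname(dir)
  base.find.select(&:file?).map { |path| path.relative_path_from(base).to_s }.sort
end

# Hypothetical mirror of the branching in inventory_from_directory.
def sketch_inventory(data_dir, group_id = nil)
  if group_id
    # Single-group mode: the whole directory becomes one group.
    { group_id => sketch_group_files(data_dir) }
  else
    # Default mode: expect content/ and metadata/ subdirectories.
    %w[content metadata].map do |gid|
      [gid, sketch_group_files(Pathname(data_dir).join(gid))]
    end.to_h
  end
end

both_groups, single_group = Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'content'))
  FileUtils.mkdir_p(File.join(dir, 'metadata'))
  File.write(File.join(dir, 'content', 'page-1.jpg'), 'image bytes')
  File.write(File.join(dir, 'metadata', 'descMetadata.xml'), '<xml/>')
  [sketch_inventory(dir), sketch_inventory(File.join(dir, 'content'), 'content')]
end
```

With group_id omitted, the sketch (like the method above) assumes both subdirectories exist under data_dir.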

#non_empty_groups ⇒ Array<FileGroup>

Returns The set of data groups that contain files.

Returns:

  • (Array<FileGroup>)

    The set of data groups that contain files



# File 'lib/moab/file_inventory.rb', line 99

def non_empty_groups
  groups.reject { |group| group.files.empty? }
end

#package_id ⇒ String

Returns Concatenation of the objectId and versionId values.

Returns:

  • (String)

    Concatenation of the objectId and versionId values



# File 'lib/moab/file_inventory.rb', line 152

def package_id
  "#{digital_object_id}-v#{version_id}"
end

#signatures_from_bagit_manifests(bag_pathname) ⇒ Hash<Pathname,FileSignature>

Returns The fixity data present in the bag’s manifest files.

Parameters:

  • bag_pathname (Pathname)

    The location of the BagIt bag to be inventoried

Returns:

  • (Hash<Pathname,FileSignature>)

    The fixity data present in the bag’s manifest files



# File 'lib/moab/file_inventory.rb', line 200

def signatures_from_bagit_manifests(bag_pathname)
  manifest_pathname = {}
  DEFAULT_CHECKSUM_TYPES.each do |type|
    manifest_pathname[type] = bag_pathname.join("manifest-#{type}.txt")
  end
  signatures = Hash.new { |hash, path| hash[path] = FileSignature.new }
  DEFAULT_CHECKSUM_TYPES.each do |type|
    if manifest_pathname[type].exist?
      manifest_pathname[type].each_line do |line|
        line.chomp!
        checksum, data_path = line.split(/\s+\**/, 2)
        if checksum && data_path
          file_pathname = bag_pathname.join(data_path)
          signature = signatures[file_pathname]
          signature.set_checksum(type, checksum)
        end
      end
    end
  end
  signatures.each { |file_pathname, signature| signature.size = file_pathname.size }
  signatures
end
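
The line parsing and checksum merging above can be sketched against in-memory manifest lines. The checksums and paths are made-up placeholders, and a plain hash stands in for FileSignature:

```ruby
# Hypothetical manifest contents keyed by checksum type; values are placeholders.
manifests = {
  md5:  ["0a1b2c  data/content/page-1.jpg\n"],
  sha1: ["9f8e7d *data/content/page-1.jpg\n", "\n"]
}

# Auto-create one entry per path the first time it is seen,
# so checksums from different manifests merge into the same record.
signatures = Hash.new { |hash, path| hash[path] = {} }

manifests.each do |type, lines|
  lines.each do |line|
    # Split on whitespace plus an optional '*' (binary-mode marker), at most once,
    # so spaces inside the path are preserved.
    checksum, data_path = line.chomp.split(/\s+\**/, 2)
    next unless checksum && data_path # skips blank or malformed lines
    signatures[data_path][type] = checksum
  end
end
# signatures => {"data/content/page-1.jpg"=>{:md5=>"0a1b2c", :sha1=>"9f8e7d"}}
```

The limit of 2 on the split matters: without it, a path containing spaces would be broken into several fields.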

#summary_fields ⇒ Array<String>

Returns The data fields to include in summary reports.

Returns:

  • (Array<String>)

    The data fields to include in summary reports



# File 'lib/moab/file_inventory.rb', line 124

def summary_fields
  %w[type digital_object_id version_id inventory_datetime file_count byte_count block_count groups]
end

#write_xml_file(parent_dir, type = nil) ⇒ void

This method returns an undefined value.

Writes the Moab::FileInventory instance to a file.

Parameters:

  • parent_dir (Pathname, String)

    The parent directory in which the xml file is to be stored

  • type (String) (defaults to: nil)

    The inventory type, which governs the filename used for serialization



# File 'lib/moab/file_inventory.rb', line 262

def write_xml_file(parent_dir, type = nil)
  type = @type if type.nil?
  self.class.write_xml_file(self, parent_dir, type)
end