Class: Moab::FileInventory

Inherits:
Serializer::Manifest
Includes:
HappyMapper
Defined in:
lib/moab/file_inventory.rb

Overview

Note:

Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.

A structured container for recording information about a collection of related files.

The scope of the file collection depends on inventory type:

  • version = full set of data files comprising a digital object’s version

  • additions = subset of data files that were newly added in the specified version

  • manifests = the fixity data for manifest files in the version’s root folder

  • directory = set of files that were harvested from a filesystem directory

The inventory contains one or more FileGroup subsets, which are most commonly used to segregate a digital object version’s content files from its metadata files. Each group contains one or more FileManifestation entities, each of which represents a point-in-time snapshot of a given file’s filesystem characteristics. The fixity data for a file is stored in a FileSignature element, while the filename and modification data are stored in one or more FileInstance elements (copies of a given file may be present in multiple locations in a collection).

Data Model

  • FileInventory = container for recording information about a collection of related files

    • FileGroup [1..*] = subsets that allow segregation of content and metadata files

      • FileManifestation [1..*] = snapshot of a file’s filesystem characteristics

        • FileSignature [1] = file fixity information

        • FileInstance [1..*] = filepath and timestamp of any physical file having that signature
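To make the containment hierarchy concrete, here is a minimal sketch using plain Ruby Structs. The Sketch* struct names, their fields, and the all_paths helper are hypothetical stand-ins chosen for illustration; they are not the Moab classes or their API. The example shows a single FileManifestation whose signature is shared by two FileInstance copies stored at different paths.

```ruby
# Hypothetical stand-ins for the Moab data model; these are NOT the real Moab classes.
SketchSignature     = Struct.new(:size, :md5)             # fixity information
SketchInstance      = Struct.new(:path, :datetime)        # one physical copy of the file
SketchManifestation = Struct.new(:signature, :instances)  # one signature, 1..* instances
SketchGroup         = Struct.new(:group_id, :files)       # files = 1..* manifestations
SketchInventory     = Struct.new(:digital_object_id, :version_id, :groups)

# Flatten the hierarchy into [group_id, path] pairs, one per physical file instance.
def all_paths(inventory)
  inventory.groups.flat_map do |group|
    group.files.flat_map do |manifestation|
      manifestation.instances.map { |instance| [group.group_id, instance.path] }
    end
  end
end

# Illustrative values only (placeholder identifier, checksum, and timestamps).
shared = SketchManifestation.new(
  SketchSignature.new(1024, "placeholder-md5"),
  [SketchInstance.new("images/page1.jpg", "2012-01-01T00:00:00Z"),
   SketchInstance.new("images/copy/page1.jpg", "2012-01-01T00:00:00Z")]
)
content = SketchGroup.new("content", [shared])
inventory = SketchInventory.new("druid:ab123cd4567", 1, [content])

p all_paths(inventory)
# → [["content", "images/page1.jpg"], ["content", "images/copy/page1.jpg"]]
```

Both paths resolve to the same manifestation, so the fixity data is recorded once no matter how many copies exist.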

Examples:

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods inherited from Serializer::Manifest

read_xml_file, write_xml_file, xml_pathname, xml_pathname_exist?

Methods inherited from Serializer::Serializable

#array_to_hash, deep_diff, #diff, #key, #key_name, #summary, #to_hash, #to_json, #to_yaml, #variable_names, #variables

Constructor Details

#initialize(opts = {}) ⇒ FileInventory

Returns a new instance of FileInventory.



# File 'lib/moab/file_inventory.rb', line 39

def initialize(opts={})
  @groups = Array.new
  @inventory_datetime = Time.now
  super(opts)
end

Instance Attribute Details

#block_count ⇒ Integer

Returns The total disk usage, in 1 kB blocks, of all data files (an estimate of the du -k result; dynamically calculated).

Returns:

  • (Integer)

    The total disk usage, in 1 kB blocks, of all data files (an estimate of the du -k result; dynamically calculated)



# File 'lib/moab/file_inventory.rb', line 92

attribute :block_count, Integer, :tag => 'blockCount', :on_save => Proc.new {|t| t.to_s}
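As a rough illustration of what estimating a du -k result can involve, the sketch below rounds each file’s size up to whole 1 kB blocks and sums the results. The estimated_block_count helper and its rounding rule are assumptions made for illustration, not necessarily the formula Moab uses; real du output also depends on the filesystem’s allocation block size.

```ruby
# Hypothetical estimate of du -k style usage: each file is assumed to occupy
# ceil(size / 1024) one-kilobyte blocks. This rule is an ASSUMPTION,
# not necessarily how Moab computes block_count.
def estimated_block_count(file_sizes)
  file_sizes.sum { |size| (size + 1023) / 1024 }  # integer ceiling division
end

puts estimated_block_count([0, 1, 1024, 1025, 5000])  # prints 9
```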

#digital_object_id ⇒ String

Returns The digital object identifier (druid).

Returns:

  • (String)

    The digital object identifier (druid)



# File 'lib/moab/file_inventory.rb', line 51

attribute :digital_object_id, String, :tag => 'objectId'

#file_count ⇒ Integer

Returns The total number of data files in the inventory (dynamically calculated).

Returns:

  • (Integer)

    The total number of data files in the inventory (dynamically calculated)



# File 'lib/moab/file_inventory.rb', line 76

attribute :file_count, Integer, :tag => 'fileCount', :on_save => Proc.new {|t| t.to_s}

#groups ⇒ Array<FileGroup>

Returns The set of data groups comprising the version.

Returns:

  • (Array<FileGroup>)

    The set of data groups comprising the version



# File 'lib/moab/file_inventory.rb', line 100

has_many :groups, FileGroup, :tag => 'fileGroup'

#inventory_datetime ⇒ String

Returns The datetime at which the inventory was created.

Returns:

  • (String)

    The datetime at which the inventory was created



# File 'lib/moab/file_inventory.rb', line 64

attribute :inventory_datetime, String, :tag => 'inventoryDatetime'

#type ⇒ String

Returns The type of inventory (version|additions|manifests|directory).

Returns:

  • (String)

    The type of inventory (version|additions|manifests|directory)



# File 'lib/moab/file_inventory.rb', line 47

attribute :type, String

#version_id ⇒ Integer

Returns The ordinal version number.

Returns:

  • (Integer)

    The ordinal version number



# File 'lib/moab/file_inventory.rb', line 55

attribute :version_id, Integer, :tag => 'versionId', :key => true, :on_save => Proc.new {|n| n.to_s}

Class Method Details

.xml_filename(type = nil) ⇒ String

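Returns The standard name for the serialized inventory file of the given type. Raises a RuntimeError if the type is not one of version, additions, manifests, or directory (including the default of nil).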

Parameters:

  • type (String) (defaults to: nil)

    Specifies the type of inventory, and thus the filename used for storage

Returns:

  • (String)

    The standard name for the serialized inventory file of the given type



# File 'lib/moab/file_inventory.rb', line 253

def self.xml_filename(type=nil)
  case type
    when "version"
      'versionInventory.xml'
    when "additions"
      'versionAdditions.xml'
    when "manifests"
      'manifestInventory.xml'
    when "directory"
      'directoryInventory.xml'
    else
      raise "unknown inventory type: #{type.to_s}"
  end
end
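The same type-to-filename mapping can be expressed as a lookup table. This is a hypothetical, table-driven equivalent written for illustration (INVENTORY_XML_FILENAMES and inventory_xml_filename are not part of Moab); like the original, it raises a RuntimeError for an unknown or nil type.

```ruby
# Hypothetical table-driven equivalent of FileInventory.xml_filename.
INVENTORY_XML_FILENAMES = {
  "version"   => "versionInventory.xml",
  "additions" => "versionAdditions.xml",
  "manifests" => "manifestInventory.xml",
  "directory" => "directoryInventory.xml"
}.freeze

def inventory_xml_filename(type = nil)
  # An unknown or nil type raises a RuntimeError with the same message as the original.
  INVENTORY_XML_FILENAMES.fetch(type) { raise "unknown inventory type: #{type}" }
end

puts inventory_xml_filename("additions")  # prints versionAdditions.xml
```

Keeping the names in a frozen hash makes the set of valid types easy to enumerate, at the cost of diverging from the case statement used in the source.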

Instance Method Details

#byte_count ⇒ Integer

Returns The total size, in bytes, of all files in the inventory (dynamically calculated).

Returns:

  • (Integer)

    The total size, in bytes, of all files in the inventory (dynamically calculated)



# File 'lib/moab/file_inventory.rb', line 84

attribute :byte_count, Integer, :tag => 'byteCount', :on_save => Proc.new {|t| t.to_s}

#composite_key ⇒ String

Returns The unique identifier concatenating digital object id with version id.

Returns:

  • (String)

    The unique identifier concatenating digital object id with version id



# File 'lib/moab/file_inventory.rb', line 58

def composite_key
  @digital_object_id + '-' + StorageObject.version_dirname(@version_id)
end

#copy_ids(other) ⇒ void

This method returns an undefined value.

Copies the objectId, versionId, and inventoryDatetime values from another instance of this class into this instance.

Parameters:

  • other (FileInventory)

    another instance of this class from which to clone identity values



# File 'lib/moab/file_inventory.rb', line 146

def copy_ids(other)
  @digital_object_id = other.digital_object_id
  @version_id = other.version_id
  @inventory_datetime = other.inventory_datetime
end

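#data_source ⇒ String

Returns A description of where the inventory’s data came from: the version ID (as "v" followed by the number) when one is set, otherwise the name of the directory that was harvested to create the inventory. The groups’ data sources are joined with '|'; when that joined string begins with contentMetadata, it is kept and prefixed with the version ID, or with "new" if no version ID is set.

Returns:

  • (String)

    Either the version ID (if set) or the name of the harvested directory; for contentMetadata-based sources, the version ID (or "new") followed by the joined data sources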



# File 'lib/moab/file_inventory.rb', line 161

def data_source
  data_source = (groups.collect { |g| g.data_source.to_s }).join('|')
  if data_source.start_with?('contentMetadata')
    if version_id
      "v#{version_id.to_s}-#{data_source}"
    else
      "new-#{data_source}"
    end
  else
    if version_id
      "v#{version_id.to_s}"
    else
      data_source
    end
  end
end
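The branching above can be restated as a standalone function that takes the groups’ data_source strings and the version ID directly. The describe_data_source name and the sample source strings are hypothetical, chosen only to exercise each branch.

```ruby
# Hypothetical standalone restatement of FileInventory#data_source.
def describe_data_source(group_sources, version_id)
  joined = group_sources.map(&:to_s).join("|")
  if joined.start_with?("contentMetadata")
    # contentMetadata-derived sources keep the joined string, prefixed by version or "new"
    version_id ? "v#{version_id}-#{joined}" : "new-#{joined}"
  else
    # otherwise the version ID wins; the directory names are used only when it is absent
    version_id ? "v#{version_id}" : joined
  end
end

puts describe_data_source(["contentMetadata-all"], 2)   # prints v2-contentMetadata-all
puts describe_data_source(["/tmp/obj/content"], 3)      # prints v3
puts describe_data_source(["/tmp/obj/content", "/tmp/obj/metadata"], nil)
# prints /tmp/obj/content|/tmp/obj/metadata
```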

#file_signature(group_id, file_id) ⇒ FileSignature

Returns The signature of the specified file.

Parameters:

  • group_id (String)

    The identifier of the group to be selected

  • file_id (String)

    The group-relative path of the file (relative to the appropriate home directory)

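Returns:

  • (FileSignature)

    The signature of the specified file

Raises:

  • (FileNotFoundException)

    if the group or the file is not found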



# File 'lib/moab/file_inventory.rb', line 135

def file_signature(group_id, file_id)
  file_group = group(group_id)
  raise FileNotFoundException, "group #{group_id} not found for #{@digital_object_id} - #{@version_id}" if file_group.nil?
  file_signature = file_group.path_hash[file_id]
  raise FileNotFoundException, "#{group_id} file #{file_id} not found for #{@digital_object_id} - #{@version_id}" if file_signature.nil?
  file_signature
end
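The lookup performs two steps, each with its own failure mode: find the group, then find the file’s signature by its group-relative path. The sketch below mimics that with nested hashes; lookup_signature and FileNotFoundError are hypothetical stand-ins (the real method uses FileGroup#path_hash and Moab’s FileNotFoundException).

```ruby
# Hypothetical stand-in for Moab's FileNotFoundException.
class FileNotFoundError < StandardError; end

# path_hashes maps group_id => { group-relative path => signature }.
def lookup_signature(path_hashes, group_id, file_id)
  path_hash = path_hashes[group_id]
  raise FileNotFoundError, "group #{group_id} not found" if path_hash.nil?
  signature = path_hash[file_id]
  raise FileNotFoundError, "#{group_id} file #{file_id} not found" if signature.nil?
  signature
end

# Illustrative data only (placeholder checksum).
path_hashes = { "content" => { "intro.txt" => { size: 12, md5: "placeholder-md5" } } }
p lookup_signature(path_hashes, "content", "intro.txt")
```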

#group(group_id) ⇒ FileGroup

Returns The file group in this inventory for the specified group_id.

Parameters:

  • group_id (String)

    The identifier of the group to be selected

Returns:

  • (FileGroup)

    The file group in this inventory for the specified group_id



# File 'lib/moab/file_inventory.rb', line 116

def group(group_id)
  @groups.find{ |group| group.group_id == group_id}
end

#group_empty?(group_id) ⇒ Boolean

Returns true if the group is missing or empty.

Parameters:

  • group_id (String)

    File group identifier (e.g. data, metadata, manifests)

Returns:

  • (Boolean)

    true if the group is missing or empty



# File 'lib/moab/file_inventory.rb', line 122

def group_empty?(group_id)
  group = self.group(group_id)
  group.nil? or group.files.empty?
end

#group_ids(non_empty = nil) ⇒ Array<String>

Returns group identifiers contained in this file inventory.

Parameters:

  • non_empty (Boolean) (defaults to: nil)

    If true, return group IDs only for groups that contain files

Returns:

  • (Array<String>)

    group identifiers contained in this file inventory



# File 'lib/moab/file_inventory.rb', line 109

def group_ids(non_empty=nil)
  groups = non_empty ? self.non_empty_groups : @groups
  groups.map{|group| group.group_id}
end
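The group helpers differ in how they treat missing and empty groups. The sketch below restates group_ids, non_empty_groups, and group_empty? over a hypothetical GroupStub struct that carries only the two fields these methods read; it is an illustration, not the Moab implementation.

```ruby
# Hypothetical stand-in for FileGroup with just the fields these methods use.
GroupStub = Struct.new(:group_id, :files)

def non_empty_groups(groups)
  groups.reject { |group| group.files.empty? }
end

def group_ids(groups, non_empty = nil)
  selected = non_empty ? non_empty_groups(groups) : groups
  selected.map(&:group_id)
end

def group_empty?(groups, group_id)
  group = groups.find { |g| g.group_id == group_id }
  group.nil? || group.files.empty?  # missing and empty are treated the same
end

groups = [GroupStub.new("content", ["page1.jpg"]), GroupStub.new("metadata", [])]
p group_ids(groups)                  # → ["content", "metadata"]
p group_ids(groups, true)            # → ["content"]
p group_empty?(groups, "metadata")   # → true (present but empty)
p group_empty?(groups, "manifests")  # → true (missing)
```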

#human_size ⇒ String

Returns The total size of the inventory expressed in B, KB, MB, GB, or TB, depending on the magnitude of the value.

Returns:

  • (String)

    The total size of the inventory expressed in B, KB, MB, GB, or TB, depending on the magnitude of the value



# File 'lib/moab/file_inventory.rb', line 236

def human_size
  count = 0
  size = byte_count
  while size >= 1024 and count < 4
    size /= 1024.0
    count += 1
  end
  if count == 0
    sprintf("%d B", size)
  else
    sprintf("%.2f %s", size, %w[B KB MB GB TB][count])
  end
end
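Because human_size depends only on byte_count, its logic can be checked in isolation. The standalone function below copies the loop above but takes the byte count as an argument; passing it in (rather than reading an attribute) is the only change, made so the sketch runs without Moab.

```ruby
# Standalone copy of the human_size logic, taking the byte count as an argument.
def human_size(byte_count)
  units = %w[B KB MB GB TB]
  size = byte_count
  count = 0
  # Divide by 1024 until the value drops below 1024, stopping at TB.
  while size >= 1024 && count < 4
    size /= 1024.0
    count += 1
  end
  # Plain bytes print as an integer; larger units get two decimal places.
  count.zero? ? format("%d B", size) : format("%.2f %s", size, units[count])
end

puts human_size(500)      # prints 500 B
puts human_size(1536)     # prints 1.50 KB
puts human_size(1024**3)  # prints 1.00 GB
puts human_size(1024**5)  # prints 1024.00 TB (TB is the largest unit)
```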

#inventory_from_bagit_bag(bag_dir) ⇒ FileInventory

Traverses a BagIt bag’s payload and returns this inventory populated with the files it contains, creating one FileGroup per child of the bag’s data folder and taking fixity data from the bag’s manifest files.

Parameters:

  • bag_dir (Pathname, String)

    The location of the BagIt bag to be inventoried

Returns:

  • (FileInventory)

    This inventory, populated with the files in the bag’s payload (using fixity data from the bag’s manifest files)



# File 'lib/moab/file_inventory.rb', line 198

def inventory_from_bagit_bag(bag_dir)
  bag_pathname = Pathname(bag_dir)
  signatures_from_bag = signatures_from_bagit_manifests(bag_pathname)
  bag_data_subdirs = bag_pathname.join('data').children
  bag_data_subdirs.each do |subdir|
    @groups << FileGroup.new(:group_id=>subdir.basename.to_s).group_from_bagit_subdir(subdir, signatures_from_bag)
  end
  self
end
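The group IDs come straight from the names of the entries under the bag’s data folder. The sketch below builds a throwaway bag layout and lists the group IDs that the loop above would create; bag_group_ids is a hypothetical helper, and it sorts the names only to make the output stable (Pathname#children does not guarantee an order).

```ruby
require "pathname"
require "tmpdir"
require "fileutils"

# Hypothetical helper: one group ID per child of the bag's data/ directory,
# mirroring the loop in inventory_from_bagit_bag (sorted for stable output).
def bag_group_ids(bag_dir)
  Pathname(bag_dir).join("data").children.map { |subdir| subdir.basename.to_s }.sort
end

Dir.mktmpdir do |bag|
  FileUtils.mkdir_p(File.join(bag, "data", "content"))
  FileUtils.mkdir_p(File.join(bag, "data", "metadata"))
  File.write(File.join(bag, "data", "content", "page1.txt"), "hello")
  p bag_group_ids(bag)  # → ["content", "metadata"]
end
```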

#inventory_from_directory(data_dir, group_id = nil) ⇒ FileInventory

Traverses a directory and returns this inventory populated with the files it contains.

Examples:

Parameters:

  • data_dir (Pathname, String)

    The location of files to be inventoried

  • group_id (String) (defaults to: nil)

    If specified, used as the group ID of the FileGroup created from the directory. If nil, the directory is assumed to contain both content and metadata subdirectories, and a group is created for each.

Returns:

  • (FileInventory)

    This inventory, populated with the files found in the directory



# File 'lib/moab/file_inventory.rb', line 185

def inventory_from_directory(data_dir,group_id=nil)
  if group_id
    @groups << FileGroup.new(:group_id=>group_id).group_from_directory(data_dir)
  else
    ['content','metadata'].each do |group_id|
      @groups << FileGroup.new(:group_id=>group_id).group_from_directory(Pathname(data_dir).join(group_id))
    end
  end
  self
end
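Which directories get harvested depends on whether a group_id is supplied. The hypothetical directory_groups helper below returns the [group_id, directory] pairs that the method above would turn into FileGroups, without touching the filesystem; the paths are placeholders.

```ruby
require "pathname"

# Hypothetical helper mirroring how inventory_from_directory chooses directories.
def directory_groups(data_dir, group_id = nil)
  if group_id
    [[group_id, Pathname(data_dir)]]  # the whole directory becomes one group
  else
    # Without a group_id, data_dir must contain content/ and metadata/ subdirectories.
    %w[content metadata].map { |id| [id, Pathname(data_dir).join(id)] }
  end
end

p directory_groups("/tmp/v0001/data").map { |id, dir| [id, dir.to_s] }
# → [["content", "/tmp/v0001/data/content"], ["metadata", "/tmp/v0001/data/metadata"]]
```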

#non_empty_groups ⇒ Array<FileGroup>

Returns The set of data groups that contain files.

Returns:

  • (Array<FileGroup>)

    The set of data groups that contain files



# File 'lib/moab/file_inventory.rb', line 103

def non_empty_groups
  @groups.select{|group| !group.files.empty?}
end

#package_id ⇒ String

Returns The concatenation of the objectId and versionId values, in the form <objectId>-v<versionId>.

Returns:

  • (String)

    The concatenation of the objectId and versionId values, in the form <objectId>-v<versionId>



# File 'lib/moab/file_inventory.rb', line 154

def package_id
  "#{@digital_object_id}-v#{@version_id}"
end
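package_id and composite_key both join the object ID and version, but the version part is formatted differently. The sketch below contrasts them; it ASSUMES that StorageObject.version_dirname produces a zero-padded name such as v0002, which this page does not confirm, so assumed_version_dirname is a hypothetical stand-in.

```ruby
# ASSUMPTION: StorageObject.version_dirname zero-pads to four digits ("v%04d").
def assumed_version_dirname(version_id)
  format("v%04d", version_id)
end

# Mirrors FileInventory#package_id.
def package_id(druid, version_id)
  "#{druid}-v#{version_id}"
end

# Mirrors FileInventory#composite_key, using the assumed directory name.
def composite_key(druid, version_id)
  druid + "-" + assumed_version_dirname(version_id)
end

puts package_id("druid:ab123cd4567", 2)     # prints druid:ab123cd4567-v2
puts composite_key("druid:ab123cd4567", 2)  # prints druid:ab123cd4567-v0002
```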

#signatures_from_bagit_manifests(bag_pathname) ⇒ Hash<Pathname,FileSignature>

Returns The fixity data present in the bag’s manifest files.

Parameters:

  • bag_pathname (Pathname)

    The location of the BagIt bag to be inventoried

Returns:

  • (Hash<Pathname,FileSignature>)

    The fixity data present in the bag’s manifest files



# File 'lib/moab/file_inventory.rb', line 210

def signatures_from_bagit_manifests(bag_pathname)
  manifest_pathname = Hash.new
  checksum_types =  [:md5, :sha1, :sha256]
  checksum_types.each do |type|
    manifest_pathname[type] = bag_pathname.join("manifest-#{type.to_s}.txt")
  end
  signatures = Hash.new { |hash,path| hash[path] = FileSignature.new }
  checksum_types.each do |type|
    if manifest_pathname[type].exist?
      manifest_pathname[type].each_line do |line|
        line.chomp!
        checksum,data_path = line.split(/\s+\**/,2)
        if checksum && data_path
          file_pathname = bag_pathname.join(data_path)
          signature = signatures[file_pathname]
          signature.set_checksum(type, checksum)
        end
      end
    end
  end
  signatures.each {|file_pathname,signature| signature.size = file_pathname.size}
  signatures
end
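Each BagIt manifest line holds a checksum, whitespace, an optional * binary-mode marker, and the payload path. The sketch below applies the same split to in-memory manifest text and merges checksums of different types per path. parse_manifests and the sample checksums are hypothetical, and unlike the original it records no file sizes, since it never reads the payload files.

```ruby
# Hypothetical in-memory version of the manifest parsing in
# signatures_from_bagit_manifests: maps each payload path to its checksums.
def parse_manifests(manifests)
  signatures = Hash.new { |hash, path| hash[path] = {} }
  manifests.each do |type, text|
    text.each_line do |line|
      # Same split as the original: whitespace, then any '*' binary-mode markers.
      # The limit of 2 keeps spaces inside the path intact.
      checksum, data_path = line.chomp.split(/\s+\**/, 2)
      signatures[data_path][type] = checksum if checksum && data_path
    end
  end
  signatures
end

# Placeholder checksums, not real digests.
manifests = {
  md5:  "aaa111  data/content/page1.txt\nbbb222 *data/content/page 2.txt\n",
  sha1: "ccc333  data/content/page1.txt\n"
}
result = parse_manifests(manifests)
p result["data/content/page1.txt"]
```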

#summary_fields ⇒ Array<String>

Returns The data fields to include in summary reports.

Returns:

  • (Array<String>)

    The data fields to include in summary reports



# File 'lib/moab/file_inventory.rb', line 128

def summary_fields
  %w{type digital_object_id version_id inventory_datetime file_count byte_count block_count groups}
end

#write_xml_file(parent_dir, type = nil) ⇒ void

This method returns an undefined value.

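Writes this Moab::FileInventory instance to an XML file in the given parent directory. If no type is given, the instance’s own type is used to choose the filename.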

Examples:

Parameters:

  • parent_dir (Pathname, String)

    The parent directory in which the xml file is to be stored

  • type (String) (defaults to: nil)

    The inventory type, which governs the filename used for serialization



# File 'lib/moab/file_inventory.rb', line 273

def write_xml_file(parent_dir, type=nil)
  type = @type if type.nil?
  self.class.write_xml_file(self, parent_dir, type)
end