Class: Moab::FileInventory

Inherits:
Manifest
  • Object
Includes:
HappyMapper
Defined in:
lib/moab/file_inventory.rb

Overview

Note:

Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.

A structured container for recording information about a collection of related files.

The scope of the file collection depends on inventory type:

  • version = full set of data files comprising a digital object's version

  • additions = subset of data files that were newly added in the specified version

  • manifests = the fixity data for manifest files in the version's root folder

  • directory = set of files that were harvested from a filesystem directory

The inventory contains one or more FileGroup subsets, which are most commonly used to segregate a digital object version's content and metadata files. Each group contains one or more FileManifestation entities, each of which represents a point-in-time snapshot of a given file's filesystem characteristics. The fixity data for a file is stored in a FileSignature element, while the filename and modification data are stored in one or more FileInstance elements. (Copies of a given file may be present in multiple locations in a collection.)

Data Model

  • FileInventory = container for recording information about a collection of related files

    • FileGroup [1..*] = subset that allows segregation of content and metadata files
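
The nesting described in the overview can be sketched with plain Structs. This is only an illustration of the containment hierarchy: the Model* names, the field lists, and the sample identifier and paths are assumptions made for the example, not the Moab classes or their attributes.

```ruby
# Illustrative stand-ins only; these Structs are NOT the Moab classes.
ModelSignature     = Struct.new(:size, :md5)
ModelInstance      = Struct.new(:path, :datetime)
ModelManifestation = Struct.new(:signature, :instances)
ModelGroup         = Struct.new(:group_id, :files)
ModelInventory     = Struct.new(:type, :digital_object_id, :version_id, :groups)

# Fixity data for an empty file (size 0 and the MD5 of the empty string).
signature = ModelSignature.new(0, 'd41d8cd98f00b204e9800998ecf8427e')

# One manifestation with two instances: the same bytes stored at two paths.
manifestation = ModelManifestation.new(
  signature,
  [ModelInstance.new('page-1.txt', '2012-03-26T14:15:11Z'),
   ModelInstance.new('copy/page-1.txt', '2012-03-26T14:15:11Z')]
)

# Hypothetical object identifier; content and metadata kept in separate groups.
inventory = ModelInventory.new(
  'version', 'druid:ab123cd4567', 1,
  [ModelGroup.new('content', [manifestation]), ModelGroup.new('metadata', [])]
)

# Walk inventory -> groups -> manifestations -> instances to list every path.
paths = inventory.groups.flat_map do |group|
  group.files.flat_map { |file| file.instances.map(&:path) }
end
```

Keeping the signature on the manifestation and the paths on the instances is what lets a single set of checksums describe several copies of the same file.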

Constructor Details

#initialize(opts = {}) ⇒ FileInventory

Returns a new instance of FileInventory


# File 'lib/moab/file_inventory.rb', line 39

def initialize(opts={})
  @groups = Array.new
  @inventory_datetime = Time.now
  super(opts)
end

Instance Attribute Details

#block_count ⇒ Integer

Returns The total disk usage (in 1 kB blocks) of all data files (estimating du -k result) (dynamically calculated)

Returns:

  • (Integer)

    The total disk usage (in 1 kB blocks) of all data files (estimating du -k result) (dynamically calculated)


# File 'lib/moab/file_inventory.rb', line 92

attribute :block_count, Integer, :tag => 'blockCount', :on_save => Proc.new {|t| t.to_s}

#digital_object_id ⇒ String

Returns The digital object identifier (druid)

Returns:

  • (String)

    The digital object identifier (druid)


# File 'lib/moab/file_inventory.rb', line 51

attribute :digital_object_id, String, :tag => 'objectId'

#file_count ⇒ Integer

Returns The total number of data files in the inventory (dynamically calculated)

Returns:

  • (Integer)

    The total number of data files in the inventory (dynamically calculated)


# File 'lib/moab/file_inventory.rb', line 76

attribute :file_count, Integer, :tag => 'fileCount', :on_save => Proc.new {|t| t.to_s}

#groups ⇒ Array<FileGroup>

Returns The set of data groups comprising the version

Returns:

  • (Array<FileGroup>)

    The set of data groups comprising the version


# File 'lib/moab/file_inventory.rb', line 100

has_many :groups, FileGroup, :tag => 'fileGroup'

#inventory_datetime ⇒ String

Returns The datetime at which the inventory was created

Returns:

  • (String)

    The datetime at which the inventory was created


# File 'lib/moab/file_inventory.rb', line 64

attribute :inventory_datetime, String, :tag => 'inventoryDatetime'

#type ⇒ String

Returns The type of inventory (version|additions|manifests|directory)

Returns:

  • (String)

    The type of inventory (version|additions|manifests|directory)


# File 'lib/moab/file_inventory.rb', line 47

attribute :type, String

#version_id ⇒ Integer

Returns The ordinal version number

Returns:

  • (Integer)

    The ordinal version number


# File 'lib/moab/file_inventory.rb', line 55

attribute :version_id, Integer, :tag => 'versionId', :key => true, :on_save => Proc.new {|n| n.to_s}

Class Method Details

.xml_filename(type = nil) ⇒ String

Returns The standard name for the serialized inventory file of the given type

Parameters:

  • type (String) (defaults to: nil)

    Specifies the type of inventory, and thus the filename used for storage

Returns:

  • (String)

    The standard name for the serialized inventory file of the given type


# File 'lib/moab/file_inventory.rb', line 252

def self.xml_filename(type=nil)
  case type
    when "version"
      'versionInventory.xml'
    when "additions"
      'versionAdditions.xml'
    when "manifests"
      'manifestInventory.xml'
    when "directory"
      'directoryInventory.xml'
    else
      raise "unknown inventory type: #{type.to_s}"
  end
end
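
The type-to-filename mapping in the listing above can also be read as a lookup table. The following is a minimal standalone sketch of that mapping; `INVENTORY_FILENAMES` and `inventory_xml_filename` are hypothetical names introduced for illustration and are not part of the Moab API.

```ruby
# Hypothetical stand-in for Moab::FileInventory.xml_filename, built from the
# four type-to-filename pairs shown in the listing above.
INVENTORY_FILENAMES = {
  'version'   => 'versionInventory.xml',
  'additions' => 'versionAdditions.xml',
  'manifests' => 'manifestInventory.xml',
  'directory' => 'directoryInventory.xml'
}.freeze

# Returns the filename for a known type; raises for anything else,
# including nil (the default) and Symbols such as :version.
def inventory_xml_filename(type = nil)
  INVENTORY_FILENAMES.fetch(type) { raise "unknown inventory type: #{type}" }
end
```

As in the original method, only String type names match, so calling it with the default nil argument raises rather than returning a fallback filename.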

Instance Method Details

#byte_count ⇒ Integer

Returns The total size (in bytes) of all files in the inventory (dynamically calculated)

Returns:

  • (Integer)

    The total size (in bytes) of all files in the inventory (dynamically calculated)


# File 'lib/moab/file_inventory.rb', line 84

attribute :byte_count, Integer, :tag => 'byteCount', :on_save => Proc.new {|t| t.to_s}

#composite_key ⇒ String

Returns The unique identifier concatenating digital object id with version id

Returns:

  • (String)

    The unique identifier concatenating digital object id with version id


# File 'lib/moab/file_inventory.rb', line 58

def composite_key
  @digital_object_id + '-' + StorageObject.version_dirname(@version_id)
end

#copy_ids(other)

This method returns an undefined value.

Copies the objectId, versionId, and inventoryDatetime values from another class instance into this instance

Parameters:

  • other (FileInventory)

    another instance of this class from which to clone identity values


# File 'lib/moab/file_inventory.rb', line 146

def copy_ids(other)
  @digital_object_id = other.digital_object_id
  @version_id = other.version_id
  @inventory_datetime = other.inventory_datetime
end

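#data_source ⇒ String

Returns The version label (v followed by the version ID) if the inventory has a version ID; otherwise the data sources of its file groups joined by '|' (for example, the name of the directory that was harvested). If the joined data source starts with contentMetadata, it is appended to the label, and new replaces a missing version label.

Returns:

  • (String)

    Either the version label (if the inventory is a version manifest) or the name of the directory that was harvested to create the inventory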


# File 'lib/moab/file_inventory.rb', line 160

def data_source
  data_source = (groups.collect { |g| g.data_source.to_s }).join('|')
  if data_source.start_with?('contentMetadata')
    if version_id
      "v#{version_id.to_s}-#{data_source}"
    else
      "new-#{data_source}"
    end
  else
    if version_id
      "v#{version_id.to_s}"
    else
      data_source
    end

  end
end
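
The four branches of the listing above are easier to compare side by side. The sketch below reproduces the same decision logic as a standalone function; `sketch_data_source` is a hypothetical helper that takes the groups' data sources and the version ID as plain arguments instead of reading them from an inventory, and the sample paths are invented.

```ruby
# Hypothetical standalone version of the branching in #data_source.
# group_sources: the data_source value of each group; version_id: Integer or nil.
def sketch_data_source(group_sources, version_id = nil)
  joined = group_sources.map(&:to_s).join('|')
  label = version_id ? "v#{version_id}" : nil
  if joined.start_with?('contentMetadata')
    # contentMetadata-derived inventories always keep the source in the result.
    "#{label || 'new'}-#{joined}"
  else
    # Otherwise the version label wins, and the source is only a fallback.
    label || joined
  end
end

sketch_data_source(['contentMetadata'], 2)                        # "v2-contentMetadata"
sketch_data_source(['contentMetadata'])                           # "new-contentMetadata"
sketch_data_source(['/tmp/obj/content', '/tmp/obj/metadata'], 2)  # "v2"
sketch_data_source(['/tmp/obj/content', '/tmp/obj/metadata'])     # "/tmp/obj/content|/tmp/obj/metadata"
```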

#file_signature(group_id, file_id) ⇒ FileSignature

Returns The signature of the specified file

Parameters:

  • group_id (String)

    The identifier of the group to be selected

  • file_id (String)

    The group-relative path of the file (relative to the appropriate home directory)

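Returns:

  • (FileSignature)

    The signature of the specified file

Raises:

  • (FileNotFoundException)

    if the group, or the file within the group, is not found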


# File 'lib/moab/file_inventory.rb', line 135

def file_signature(group_id, file_id)
  file_group = group(group_id)
  raise FileNotFoundException, "group #{group_id} not found for #{@digital_object_id} - #{@version_id}" if file_group.nil?
  file_signature = file_group.path_hash[file_id]
  raise FileNotFoundException, "#{group_id} file #{file_id} not found for #{@digital_object_id} - #{@version_id}" if file_signature.nil?
  file_signature
end
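
The lookup above is a two-step search (group first, then path within the group) that raises at whichever step fails. A minimal sketch of that pattern, using nested Hashes in place of FileGroup#path_hash; `SketchFileNotFound` and `sketch_file_signature` are hypothetical names, and the sample data is invented.

```ruby
# Hypothetical error class standing in for FileNotFoundException.
class SketchFileNotFound < StandardError; end

# path_hashes: { group_id => { file_id => signature } }
def sketch_file_signature(path_hashes, group_id, file_id)
  path_hash = path_hashes[group_id]
  raise SketchFileNotFound, "group #{group_id} not found" if path_hash.nil?
  signature = path_hash[file_id]
  raise SketchFileNotFound, "#{group_id} file #{file_id} not found" if signature.nil?
  signature
end

path_hashes = { 'content' => { 'page-1.txt' => 'signature-for-page-1' } }
found = sketch_file_signature(path_hashes, 'content', 'page-1.txt')
```

Raising separate messages for the two failures tells the caller whether the group ID or the file path was the problem.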

#group(group_id) ⇒ FileGroup

Returns The file group in this inventory for the specified group_id

Parameters:

  • group_id (String)

    The identifier of the group to be selected

Returns:

  • (FileGroup)

    The file group in this inventory for the specified group_id


# File 'lib/moab/file_inventory.rb', line 116

def group(group_id)
  @groups.find{ |group| group.group_id == group_id}
end

#group_empty?(group_id) ⇒ Boolean

Returns true if the group is missing or empty

Parameters:

  • group_id (String)

    File group identifier (e.g. data, metadata, manifests)

Returns:

  • (Boolean)

    true if the group is missing or empty


# File 'lib/moab/file_inventory.rb', line 122

def group_empty?(group_id)
  group = self.group(group_id)
  group.nil? or group.files.empty?
end

#group_ids(non_empty = nil) ⇒ Array<String>

Returns group identifiers contained in this file inventory

Parameters:

  • non_empty (Boolean) (defaults to: nil)

    if true, return group_id's only for groups having files

Returns:

  • (Array<String>)

    group identifiers contained in this file inventory


# File 'lib/moab/file_inventory.rb', line 109

def group_ids(non_empty=nil)
  groups = non_empty ? self.non_empty_groups : @groups
  groups.map{|group| group.group_id}
end
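
The group selectors (#group, #group_empty?, #group_ids and #non_empty_groups) all filter the same array. The sketch below shows how they relate, using a hypothetical `SketchGroup` Struct in place of FileGroup and passing the array explicitly instead of reading @groups.

```ruby
# Hypothetical stand-in for FileGroup: an ID plus a list of files.
SketchGroup = Struct.new(:group_id, :files)

def sketch_non_empty_groups(groups)
  groups.reject { |group| group.files.empty? }
end

# With a truthy non_empty, IDs of groups without files are left out.
def sketch_group_ids(groups, non_empty = nil)
  selected = non_empty ? sketch_non_empty_groups(groups) : groups
  selected.map(&:group_id)
end

# A group counts as empty if it is missing OR present without files.
def sketch_group_empty?(groups, group_id)
  group = groups.find { |g| g.group_id == group_id }
  group.nil? || group.files.empty?
end

groups = [SketchGroup.new('content', ['page-1.txt']), SketchGroup.new('metadata', [])]

sketch_group_ids(groups)                  # ["content", "metadata"]
sketch_group_ids(groups, true)            # ["content"]
sketch_group_empty?(groups, 'metadata')   # true (present but empty)
sketch_group_empty?(groups, 'manifests')  # true (missing)
```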

#human_size ⇒ String

Returns The total size of the inventory expressed in KB, MB, GB or TB, depending on the magnitude of the value

Returns:

  • (String)

    The total size of the inventory expressed in KB, MB, GB or TB, depending on the magnitude of the value


# File 'lib/moab/file_inventory.rb', line 235

def human_size
  count = 0
  size = byte_count
  while size >= 1024 and count < 4
    size /= 1024.0
    count += 1
  end
  if count == 0
    sprintf("%d B", size)
  else
    sprintf("%.2f %s", size, %w[B KB MB GB TB][count])
  end
end
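
To see what the loop above produces for different magnitudes, the same arithmetic can be run as a standalone function that takes the byte count as an argument. `sketch_human_size` is a hypothetical name; the logic mirrors the listing.

```ruby
# Hypothetical standalone version of #human_size.
def sketch_human_size(byte_count)
  count = 0
  size = byte_count
  # Divide by 1024 until the value fits the unit, stopping at TB (index 4).
  while size >= 1024 && count < 4
    size /= 1024.0
    count += 1
  end
  return format('%d B', size) if count.zero?
  format('%.2f %s', size, %w[B KB MB GB TB][count])
end

sketch_human_size(512)          # "512 B"
sketch_human_size(1536)         # "1.50 KB"
sketch_human_size(5 * 1024**3)  # "5.00 GB"
sketch_human_size(1024**5)      # "1024.00 TB"
```

Because the loop stops after four divisions, sizes of a pebibyte or more are still reported in TB instead of moving to a larger unit.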

#inventory_from_bagit_bag(bag_dir) ⇒ FileInventory

Traverses a BagIt bag's payload and returns an inventory of the files it contains (using fixity from bag manifest files)

Parameters:

  • bag_dir (Pathname, String)

    The location of the BagIt bag to be inventoried

Returns:

  • (FileInventory)

    This inventory, populated with the files in the bag's payload (using fixity from bag manifest files)


# File 'lib/moab/file_inventory.rb', line 197

def inventory_from_bagit_bag(bag_dir)
  bag_pathname = Pathname(bag_dir)
  signatures_from_bag = signatures_from_bagit_manifests(bag_pathname)
  bag_data_subdirs = bag_pathname.join('data').children
  bag_data_subdirs.each do |subdir|
    @groups << FileGroup.new(:group_id=>subdir.basename.to_s).group_from_bagit_subdir(subdir, signatures_from_bag)
  end
  self
end

#inventory_from_directory(data_dir, group_id = nil) ⇒ FileInventory

Traverses a directory and returns an inventory of the files it contains

Parameters:

  • data_dir (Pathname, String)

    The location of files to be inventoried

  • group_id (String) (defaults to: nil)

    If specified, sets the group ID of the FileGroup created from the directory. If nil, the directory is assumed to contain both content and metadata subdirectories.

Returns:

  • (FileInventory)

    This inventory, populated with the files found in the directory


# File 'lib/moab/file_inventory.rb', line 184

def inventory_from_directory(data_dir,group_id=nil)
  if group_id
    @groups << FileGroup.new(:group_id=>group_id).group_from_directory(data_dir)
  else
    ['content','metadata'].each do |group_id|
      @groups << FileGroup.new(:group_id=>group_id).group_from_directory(Pathname(data_dir).join(group_id))
    end
  end
  self
end
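
The effect of the optional group_id argument is easiest to see by listing which directory ends up feeding which group. The sketch below computes only that mapping and does not touch the filesystem; `sketch_group_dirs` is a hypothetical helper and `/tmp/obj` an invented path. Unlike the original, it wraps data_dir in a Pathname in both branches.

```ruby
require 'pathname'

# Hypothetical helper: which directory is harvested for which group ID.
def sketch_group_dirs(data_dir, group_id = nil)
  # Explicit group: the whole directory becomes that one group.
  return { group_id => Pathname(data_dir) } if group_id
  # No group: expect content/ and metadata/ subdirectories.
  %w[content metadata].map { |id| [id, Pathname(data_dir).join(id)] }.to_h
end

sketch_group_dirs('/tmp/obj', 'manifests')  # one group read from /tmp/obj
sketch_group_dirs('/tmp/obj')               # content and metadata subdirectories
```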

#non_empty_groups ⇒ Array<FileGroup>

Returns The set of data groups that contain files

Returns:

  • (Array<FileGroup>)

    The set of data groups that contain files


# File 'lib/moab/file_inventory.rb', line 103

def non_empty_groups
  @groups.select{|group| !group.files.empty?}
end

#package_id ⇒ String

Returns Concatenation of the objectId and versionId values

Returns:

  • (String)

    Concatenation of the objectId and versionId values


# File 'lib/moab/file_inventory.rb', line 154

def package_id
  "#{@digital_object_id}-v#{@version_id}"
end
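
#package_id and #composite_key both join the object ID and the version, but they format the version differently. The sketch below contrasts the two. It rests on an assumption: `sketch_version_dirname` stands in for StorageObject.version_dirname, which is taken here to zero-pad the version number to four digits; this page does not show that method, so treat the padding as unverified. The object identifier is invented.

```ruby
# ASSUMPTION: stand-in for StorageObject.version_dirname, taken here to
# zero-pad the version number to four digits (not shown on this page).
def sketch_version_dirname(version_id)
  format('v%04d', version_id)
end

# Mirrors #composite_key: object ID + '-' + version directory name.
def sketch_composite_key(object_id, version_id)
  object_id + '-' + sketch_version_dirname(version_id)
end

# Mirrors #package_id: object ID + '-v' + unpadded version number.
def sketch_package_id(object_id, version_id)
  "#{object_id}-v#{version_id}"
end

sketch_composite_key('druid:ab123cd4567', 3)  # "druid:ab123cd4567-v0003" under the padding assumption
sketch_package_id('druid:ab123cd4567', 3)     # "druid:ab123cd4567-v3"
```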

#signatures_from_bagit_manifests(bag_pathname) ⇒ Hash<Pathname,FileSignature>

Returns The fixity data present in the bag's manifest files

Parameters:

  • bag_pathname (Pathname)

    The location of the BagIt bag to be inventoried

Returns:

  • (Hash<Pathname,FileSignature>)

    The fixity data present in the bag's manifest files


# File 'lib/moab/file_inventory.rb', line 209

def signatures_from_bagit_manifests(bag_pathname)
  manifest_pathname = Hash.new
  checksum_types =  [:md5, :sha1, :sha256]
  checksum_types.each do |type|
    manifest_pathname[type] = bag_pathname.join("manifest-#{type.to_s}.txt")
  end
  signatures = Hash.new { |hash,path| hash[path] = FileSignature.new }
  checksum_types.each do |type|
    if manifest_pathname[type].exist?
      manifest_pathname[type].each_line do |line|
        line.chomp!
        checksum,data_path = line.split(/\s+\**/,2)
        if checksum && data_path
          file_pathname = bag_pathname.join(data_path)
          signature = signatures[file_pathname]
          signature.set_checksum(type, checksum)
        end
      end
    end
  end
  signatures.each {|file_pathname,signature| signature.size = file_pathname.size}
  signatures
end
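
The core of the listing above is the per-line split, which separates the checksum from the path and discards the optional `*` that some checksum tools put before the filename. A minimal sketch of that step on its own; `parse_manifest_line` is a hypothetical helper and the sample lines are invented.

```ruby
# Hypothetical helper isolating the line parsing used above.
# Returns [checksum, data_path], or nil for lines without both parts.
def parse_manifest_line(line)
  # Split on the first run of whitespace (plus any '*' marker);
  # the limit of 2 keeps spaces inside the path intact.
  checksum, data_path = line.chomp.split(/\s+\**/, 2)
  checksum && data_path ? [checksum, data_path] : nil
end

parse_manifest_line("d41d8cd98f00b204e9800998ecf8427e  data/content/page-1.txt\n")
# => ["d41d8cd98f00b204e9800998ecf8427e", "data/content/page-1.txt"]
parse_manifest_line('abc123 *data/content/my file.txt')
# => ["abc123", "data/content/my file.txt"]
parse_manifest_line('checksum-only')
# => nil
```

In the original method, each parsed pair is then merged into a Hash whose default block creates a FileSignature the first time a path is seen, so checksums from the md5, sha1 and sha256 manifests accumulate on one signature per file.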

#summary_fields ⇒ Array<String>

Returns The data fields to include in summary reports

Returns:

  • (Array<String>)

    The data fields to include in summary reports


# File 'lib/moab/file_inventory.rb', line 128

def summary_fields
  %w{type digital_object_id version_id inventory_datetime file_count byte_count block_count groups}
end

#write_xml_file(parent_dir, type = nil)

This method returns an undefined value.

Writes the Moab::FileInventory instance to a file

Parameters:

  • parent_dir (Pathname, String)

    The parent directory in which the xml file is to be stored

  • type (String) (defaults to: nil)

    The inventory type, which governs the filename used for serialization


# File 'lib/moab/file_inventory.rb', line 272

def write_xml_file(parent_dir, type=nil)
  type = @type if type.nil?
  self.class.write_xml_file(self, parent_dir, type)
end