Class: Moab::Bagger

Inherits:
Object
Defined in:
lib/moab/bagger.rb

Overview

Note:

Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.

A class used to create a BagIt package from a version inventory and a set of source files. The #fill_bag method is called with a package_mode parameter that specifies whether the bag is being created for deposit into the repository or is to contain the output of a version reconstruction.

  • In :depositor mode, the version inventory is filtered using the digital object’s signature catalog so that only new files are included.
  • In :reconstructor mode, the version inventory and signature catalog are used together to regenerate the complete set of files for the version.
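
The difference between the two modes can be sketched with plain Ruby collections. This is a minimal illustration, not the Moab implementation: the hash-based inventory, the set of known checksums, and the method name files_to_bag are all hypothetical stand-ins for FileInventory and SignatureCatalog.

```ruby
require 'set'

# Hypothetical stand-ins: a version inventory as { path => checksum } and a
# signature catalog as a Set of checksums already held in the repository.
def files_to_bag(version_inventory, known_checksums, package_mode)
  case package_mode
  when :depositor
    # Keep only files whose content is not yet in the catalog.
    version_inventory.reject { |_path, checksum| known_checksums.include?(checksum) }
  when :reconstructor
    # Every file of the version is needed to rebuild it.
    version_inventory
  else
    raise ArgumentError, "unknown package_mode: #{package_mode.inspect}"
  end
end

inventory = { 'page-1.jpg' => 'aaa', 'page-2.jpg' => 'bbb', 'title.jpg' => 'ccc' }
catalog   = Set.new(%w[aaa bbb])

p files_to_bag(inventory, catalog, :depositor).keys     # only the new file
p files_to_bag(inventory, catalog, :reconstructor).keys # all three files
```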

Data Model

  • StorageRepository = represents a digital object repository storage node

    • StorageServices = supports application layer access to the repository’s objects, data, and metadata

    • StorageObject = represents a digital object’s repository storage location and ingest/dissemination methods

      • StorageObjectVersion [1..*] = represents a version subdirectory within an object’s home directory

        • Bagger [1] = utility for creating bagit packages for ingest or dissemination

Constructor Details

#initialize(version_inventory, signature_catalog, bag_pathname) ⇒ Bagger

Returns a new instance of Bagger.

Parameters:

  • version_inventory (FileInventory)

    The complete inventory of the files comprising a digital object version

  • signature_catalog (SignatureCatalog)

    The signature catalog, used to specify source paths (in :reconstructor mode), or to filter the version inventory (in :depositor mode)

  • bag_pathname (Pathname, String)

    The location of the Bagit bag to be created



# File 'lib/moab/bagger.rb', line 26

def initialize(version_inventory, signature_catalog, bag_pathname)
  @version_inventory = version_inventory
  @signature_catalog = signature_catalog
  @bag_pathname = Pathname.new(bag_pathname)
  create_bagit_txt
end

Instance Attribute Details

#bag_inventory ⇒ FileInventory

Returns the actual inventory of the files to be packaged (derived from @version_inventory in #fill_bag).

Returns:

  • (FileInventory)

    The actual inventory of the files to be packaged (derived from @version_inventory in #fill_bag)



# File 'lib/moab/bagger.rb', line 44

def bag_inventory
  @bag_inventory
end

#bag_pathname ⇒ Pathname

Returns the location of the BagIt bag to be created.

Returns:

  • (Pathname)

    The location of the Bagit bag to be created



# File 'lib/moab/bagger.rb', line 41

def bag_pathname
  @bag_pathname
end

#package_mode ⇒ Symbol

Returns the operational mode controlling what gets bagged (#fill_bag) and the full path of source files (#fill_payload).

Returns:

  • (Symbol)

    The operational mode controlling what gets bagged (#fill_bag) and the full path of source files (#fill_payload)



# File 'lib/moab/bagger.rb', line 48

def package_mode
  @package_mode
end

#signature_catalog ⇒ SignatureCatalog

Returns the signature catalog, used to specify source paths (in :reconstructor mode), or to filter the version inventory (in :depositor mode).

Returns:

  • (SignatureCatalog)

    The signature catalog, used to specify source paths (in :reconstructor mode), or to filter the version inventory (in :depositor mode)



# File 'lib/moab/bagger.rb', line 38

def signature_catalog
  @signature_catalog
end

#version_inventory ⇒ FileInventory

Returns the complete inventory of the files comprising a digital object version.

Returns:

  • (FileInventory)

    The complete inventory of the files comprising a digital object version



# File 'lib/moab/bagger.rb', line 34

def version_inventory
  @version_inventory
end

Instance Method Details

#create_bag_info_txt ⇒ void

This method returns an undefined value.

Generates the bag-info.txt tag file.



# File 'lib/moab/bagger.rb', line 212

def create_bag_info_txt
  bag_pathname.join('bag-info.txt').open('w') do |f|
    f.puts "External-Identifier: #{bag_inventory.package_id}"
    f.puts "Payload-Oxum: #{bag_inventory.byte_count}.#{bag_inventory.file_count}"
    f.puts "Bag-Size: #{bag_inventory.human_size}"
  end
end
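
The Payload-Oxum value written above has the form "byte count.file count". As a rough sketch of how such a value could be derived directly from a payload directory (the method name payload_oxum is hypothetical; Moab itself takes both numbers from the bag inventory rather than from the filesystem):

```ruby
require 'pathname'
require 'tmpdir'

# Returns "<total bytes>.<file count>" for the regular files under data_dir.
def payload_oxum(data_dir)
  files = Pathname.new(data_dir).find.select(&:file?)
  "#{files.sum(&:size)}.#{files.size}"
end

Dir.mktmpdir do |dir|
  data = Pathname.new(dir).join('data', 'content')
  data.mkpath
  data.join('a.txt').write('hello')  # 5 bytes
  data.join('b.txt').write('bagit!') # 6 bytes
  puts "Payload-Oxum: #{payload_oxum(dir)}" # 11 bytes in 2 files
end
```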

#create_bag_inventory(package_mode) ⇒ FileInventory

Creates, writes, and returns the inventory of the files that will become the payload.

Parameters:

  • package_mode (Symbol)

    The operational mode controlling what gets bagged and the full path of source files (Bagger#fill_payload)

Returns:

  • (FileInventory)

    The inventory of the files that will become the payload



# File 'lib/moab/bagger.rb', line 100

def create_bag_inventory(package_mode)
  @package_mode = package_mode
  bag_pathname.mkpath
  case package_mode
  when :depositor
    version_inventory.write_xml_file(bag_pathname, 'version')
    @bag_inventory = signature_catalog.version_additions(version_inventory)
    bag_inventory.write_xml_file(bag_pathname, 'additions')
  when :reconstructor
    @bag_inventory = version_inventory
    bag_inventory.write_xml_file(bag_pathname, 'version')
  end
  bag_inventory
end

#create_bagit_txt ⇒ void

This method returns an undefined value.

Generates the bagit.txt tag file.



# File 'lib/moab/bagger.rb', line 59

def create_bagit_txt
  bag_pathname.mkpath
  bag_pathname.join('bagit.txt').open('w') do |f|
    f.puts 'Tag-File-Character-Encoding: UTF-8'
    f.puts 'BagIt-Version: 0.97'
  end
end

#create_payload_manifests ⇒ void

This method returns an undefined value.

Using the checksum information from the inventory, creates BagIt manifest files for the payload.



# File 'lib/moab/bagger.rb', line 182

def create_payload_manifests
  manifest_pathname = {}
  manifest_file = {}
  DEFAULT_CHECKSUM_TYPES.each do |type|
    manifest_pathname[type] = bag_pathname.join("manifest-#{type}.txt")
    manifest_file[type] = manifest_pathname[type].open('w')
  end
  bag_inventory.groups.each do |group|
    group.files.each do |file|
      fixity = file.signature.fixity
      file.instances.each do |instance|
        data_path = File.join('data', group.group_id, instance.path)
        DEFAULT_CHECKSUM_TYPES.each do |type|
          manifest_file[type].puts("#{fixity[type]} #{data_path}") if fixity[type]
        end
      end
    end
  end
ensure
  DEFAULT_CHECKSUM_TYPES.each do |type|
    if manifest_file[type]
      manifest_file[type].close
      manifest_pathname[type].delete if
          manifest_pathname[type].exist? && manifest_pathname[type].empty?
    end
  end
end
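
Each manifest line pairs a checksum with a payload path under data/. The following sketch shows the line format using Ruby's Digest library on in-memory content. It rests on assumptions: CHECKSUM_TYPES is a hypothetical stand-in for Moab's DEFAULT_CHECKSUM_TYPES, and the checksums are computed here, whereas the method above reads them from the inventory's stored fixity data.

```ruby
require 'digest'

# Assumption: these algorithms stand in for Moab's DEFAULT_CHECKSUM_TYPES.
CHECKSUM_TYPES = { md5: Digest::MD5, sha1: Digest::SHA1, sha256: Digest::SHA256 }.freeze

# Builds { type => ["<digest> data/<group>/<path>", ...] } from { path => content }.
def manifest_lines(group_id, files)
  CHECKSUM_TYPES.map do |type, digest_class|
    lines = files.map do |path, content|
      "#{digest_class.hexdigest(content)} #{File.join('data', group_id, path)}"
    end
    [type, lines]
  end.to_h
end

manifest_lines('content', 'page-1.txt' => 'first page').each do |type, lines|
  puts "manifest-#{type}.txt:"
  lines.each { |line| puts "  #{line}" }
end
```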

#create_tagfile_manifests ⇒ void

This method returns an undefined value.

Creates BagIt tag manifest files containing checksums for all files in the bag’s root directory.



# File 'lib/moab/bagger.rb', line 222

def create_tagfile_manifests
  manifest_pathname = {}
  manifest_file = {}
  DEFAULT_CHECKSUM_TYPES.each do |type|
    manifest_pathname[type] = bag_pathname.join("tagmanifest-#{type}.txt")
    manifest_file[type] = manifest_pathname[type].open('w')
  end
  bag_pathname.children.each do |file|
    next unless include_in_tagfile_manifests?(file)

    signature = FileSignature.new.signature_from_file(file)
    fixity = signature.fixity
    DEFAULT_CHECKSUM_TYPES.each do |type|
      manifest_file[type].puts("#{fixity[type]} #{file.basename}") if fixity[type]
    end
  end
ensure
  DEFAULT_CHECKSUM_TYPES.each do |type|
    if manifest_file[type]
      manifest_file[type].close
      manifest_pathname[type].delete if
          manifest_pathname[type].exist? && manifest_pathname[type].empty?
    end
  end
end

#create_tagfiles ⇒ Boolean

Creates the BagIt manifests and tag files. Returns true if successful.

Returns:

  • (Boolean)

    true if the manifests and tag files were created



# File 'lib/moab/bagger.rb', line 172

def create_tagfiles
  create_payload_manifests
  create_bag_info_txt
  create_bagit_txt
  create_tagfile_manifests
  true
end

#create_tarfile(tar_pathname = nil) ⇒ Boolean

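Creates a tar file containing the bag. Returns true if successful.

Parameters:

  • tar_pathname (Pathname) (defaults to: nil)

    The location of the tar file (default is based on bag location)

Returns:

  • (Boolean)

    true if the tar file was created

Raises:

  • (MoabRuntimeError)

    if the tar file does not exist after the tar command has run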


# File 'lib/moab/bagger.rb', line 256

def create_tarfile(tar_pathname = nil)
  bag_name = bag_pathname.basename
  bag_parent = bag_pathname.parent
  tar_pathname ||= bag_parent.join("#{bag_name}.tar")
  tar_cmd = "cd '#{bag_parent}'; tar --dereference --force-local -cf  '#{tar_pathname}' '#{bag_name}'"
  begin
    shell_execute(tar_cmd)
  rescue
    shell_execute(tar_cmd.sub('--force-local', ''))
  end
  raise(MoabRuntimeError, "Unable to create tarfile #{tar_pathname}") unless tar_pathname.exist?

  true
end
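
The method above first runs tar with --force-local and, if that fails, retries without it, presumably for tar implementations that do not recognize the option. The sketch below only builds the two command strings so the difference is visible without running tar; the helper name tar_commands and the example path are hypothetical.

```ruby
require 'pathname'

# Builds the primary tar command and the fallback without --force-local.
def tar_commands(bag_pathname, tar_pathname = nil)
  bag_pathname = Pathname.new(bag_pathname)
  bag_name = bag_pathname.basename
  bag_parent = bag_pathname.parent
  # Default: "<bag name>.tar" next to the bag directory.
  tar_pathname ||= bag_parent.join("#{bag_name}.tar")
  primary = "cd '#{bag_parent}'; tar --dereference --force-local -cf '#{tar_pathname}' '#{bag_name}'"
  [primary, primary.sub('--force-local ', '')]
end

primary, fallback = tar_commands('/tmp/bags/druid_v0001')
puts primary
puts fallback
```

Because the payload consists of links, --dereference matters in both variants: it makes tar archive the linked file contents rather than the links themselves.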

#delete_bag ⇒ NilClass

Deletes the bag directory and its contents, but only if the directory contains a bagit.txt file.

Returns:

  • (NilClass)

    Always nil



# File 'lib/moab/bagger.rb', line 68

def delete_bag
  # make sure this looks like a bag before deleting
  bag_pathname.rmtree if bag_pathname.join('bagit.txt').exist?
  nil
end
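
The bagit.txt check acts as a safety guard: a directory that does not look like a bag is left alone. A small self-contained sketch of the same guard (delete_if_bag is a hypothetical helper that, unlike #delete_bag, reports whether anything was removed):

```ruby
require 'pathname'
require 'tmpdir'

# Removes dir only when it contains a bagit.txt marker; returns whether it did.
def delete_if_bag(dir)
  dir = Pathname.new(dir)
  return false unless dir.join('bagit.txt').exist?

  dir.rmtree
  true
end

Dir.mktmpdir do |tmp|
  plain = Pathname.new(tmp).join('not-a-bag')
  plain.mkpath
  bag = Pathname.new(tmp).join('bag')
  bag.mkpath
  bag.join('bagit.txt').write("BagIt-Version: 0.97\n")

  puts delete_if_bag(plain) # false, directory is kept
  puts delete_if_bag(bag)   # true, directory is removed
end
```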

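#delete_tarfile ⇒ Object

Deletes the tar file for this bag, if it exists. The method takes no parameters; the tar file location is derived from the bag location, as a file named after the bag directory with a .tar extension in the bag’s parent directory.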



# File 'lib/moab/bagger.rb', line 75

def delete_tarfile
  bag_name = bag_pathname.basename
  bag_parent = bag_pathname.parent
  tar_pathname = bag_parent.join("#{bag_name}.tar")
  tar_pathname.delete if tar_pathname.exist?
end

#deposit_group(group_id, source_dir) ⇒ Boolean

Adds all the files listed in the group inventory to the bag. The code shown creates symbolic links to the source files rather than copies. Returns true if successful, or nil if the group was not found in the inventory or contains no files.

Parameters:

  • group_id (String)

    The name of the data group being added to the bag

  • source_dir (Pathname)

    The location of the source files to be linked

Returns:

  • (Boolean)

    true if successful; nil if the group was not found in the inventory or contains no files



# File 'lib/moab/bagger.rb', line 136

def deposit_group(group_id, source_dir)
  group = bag_inventory.group(group_id)
  # Return nil (not the result of self.nil?, which is always false here)
  return nil if group.nil? || group.files.empty?

  target_dir = bag_pathname.join('data', group_id)
  group.path_list.each do |relative_path|
    source = source_dir.join(relative_path)
    target = target_dir.join(relative_path)
    target.parent.mkpath
    FileUtils.symlink source, target
  end
  true
end
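
The linking step can be tried in isolation with a temporary directory. This is a sketch under assumptions: link_group is a hypothetical helper that takes a plain list of relative paths where the method above uses group.path_list, and it needs a platform that supports symbolic links.

```ruby
require 'fileutils'
require 'pathname'
require 'tmpdir'

# Links each relative path from source_dir into <bag_dir>/data/<group_id>.
def link_group(bag_dir, group_id, source_dir, relative_paths)
  target_dir = Pathname.new(bag_dir).join('data', group_id)
  relative_paths.each do |relative_path|
    source = Pathname.new(source_dir).join(relative_path)
    target = target_dir.join(relative_path)
    target.parent.mkpath
    FileUtils.symlink(source, target)
  end
  target_dir
end

Dir.mktmpdir do |tmp|
  tmp = Pathname.new(tmp)
  source_dir = tmp.join('source', 'content')
  source_dir.mkpath
  source_dir.join('page-1.txt').write('first page')

  linked = link_group(tmp.join('bag'), 'content', source_dir, ['page-1.txt'])
  target = linked.join('page-1.txt')
  puts target.symlink? # the payload entry is a link, not a copy
  puts target.read     # reading through the link yields the source content
end
```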

#fill_bag(package_mode, source_base_pathname) ⇒ Bagger

Performs all the operations required to fill the bag payload, write the manifests and tag files, and checksum the tag files.

Parameters:

  • package_mode (Symbol)

    The operational mode controlling what gets bagged and the full path of source files (Bagger#fill_payload)

  • source_base_pathname (Pathname)

    The home location of the source files

Returns:

  • (Bagger)

    The Bagger itself, so that further calls can be chained



# File 'lib/moab/bagger.rb', line 89

def fill_bag(package_mode, source_base_pathname)
  create_bag_inventory(package_mode)
  fill_payload(source_base_pathname)
  create_tagfiles
  self
end
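
Because the method ends with self, a caller can chain a follow-up call such as #create_tarfile. The sketch below illustrates only that call order and the chaining; MiniBagger is a hypothetical stand-in that records steps instead of touching the filesystem, and it is not part of Moab.

```ruby
# Hypothetical stand-in that records the order of steps instead of touching disk.
class MiniBagger
  attr_reader :steps

  def initialize
    @steps = []
  end

  def fill_bag(package_mode, source_base_pathname)
    @steps << [:create_bag_inventory, package_mode]
    @steps << [:fill_payload, source_base_pathname]
    @steps << [:create_tagfiles]
    self # returning self lets callers chain further calls
  end

  def create_tarfile
    @steps << [:create_tarfile]
    true
  end
end

bagger = MiniBagger.new
bagger.fill_bag(:depositor, '/source').create_tarfile
p bagger.steps.map(&:first)
```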

#fill_payload(source_base_pathname) ⇒ void

This method returns an undefined value.

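Fills the bag payload by linking to the source files instead of copying them, which greatly speeds up the process. The code shown creates symbolic links (FileUtils.symlink in #deposit_group and #reconstuct_group) rather than hard links. Unlike hard links, symbolic links do not require the bag to be on the same filesystem as the source files, but the source files must stay in place for as long as the bag is in use. #create_tarfile passes --dereference to tar so that the archive contains the file contents rather than the links.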

Parameters:

  • source_base_pathname (Pathname)

    The home location of the source files



# File 'lib/moab/bagger.rb', line 120

def fill_payload(source_base_pathname)
  bag_inventory.groups.each do |group|
    group_id = group.group_id
    case package_mode
    when :depositor
      deposit_group(group_id, source_base_pathname.join(group_id))
    when :reconstructor
      reconstuct_group(group_id, source_base_pathname)
    end
  end
end

#include_in_tagfile_manifests?(file) ⇒ Boolean

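Decides whether a file in the bag’s root directory is listed in the tag manifests. Directories, existing tagmanifest files, and NFS placeholder files (names consisting of .nfs followed by word characters) are excluded.

Returns:

  • (Boolean)

    true if the file should be included in the tag manifests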


# File 'lib/moab/bagger.rb', line 248

def include_in_tagfile_manifests?(file)
  basename = file.basename.to_s
  return false if file.directory? || basename.start_with?('tagmanifest') || basename.match?(/\A\.nfs\w+\z/)

  true
end
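
To see which names pass the filter, the basename rules can be applied to a list of sample file names. In this sketch, tag_manifest_candidate? is a hypothetical helper that applies the same two basename checks but omits the directory check, so it needs no filesystem access; the sample names are made up.

```ruby
# Same basename rules as above, minus the directory check.
def tag_manifest_candidate?(basename)
  !(basename.start_with?('tagmanifest') || basename.match?(/\A\.nfs\w+\z/))
end

names = %w[bagit.txt bag-info.txt manifest-md5.txt tagmanifest-md5.txt .nfs0000000001a2b3c4]
p names.select { |name| tag_manifest_candidate?(name) }
# The tagmanifest file and the .nfs placeholder are filtered out.
```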

#reconstuct_group(group_id, storage_object_dir) ⇒ Boolean

Adds all the files listed in the group inventory to the bag, using the signature catalog to locate each file in the object store. The code shown creates symbolic links to the stored files rather than copies. Returns true if successful, or nil if the group was not found in the inventory or contains no files.

Parameters:

  • group_id (String)

    The name of the data group being added to the bag

  • storage_object_dir (Pathname)

    The home location of the object store containing the files to be linked

Returns:

  • (Boolean)

    true if successful; nil if the group was not found in the inventory or contains no files



# File 'lib/moab/bagger.rb', line 154

def reconstuct_group(group_id, storage_object_dir)
  group = bag_inventory.group(group_id)
  # Return nil (not the result of self.nil?, which is always false here)
  return nil if group.nil? || group.files.empty?

  target_dir = bag_pathname.join('data', group_id)
  group.files.each do |file|
    catalog_entry = signature_catalog.signature_hash[file.signature]
    source = storage_object_dir.join(catalog_entry.storage_path)
    file.instances.each do |instance|
      target = target_dir.join(instance.path)
      target.parent.mkpath
      FileUtils.symlink source, target unless target.exist?
    end
  end
  true
end
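
The lookup through the signature catalog means that one stored file can serve several instance paths in the reconstructed version. A minimal sketch of that idea follows; the hash shapes for catalog and files, the helper name relink_group, and the sample paths are hypothetical simplifications of SignatureCatalog and FileInventory.

```ruby
require 'fileutils'
require 'pathname'
require 'tmpdir'

# catalog: { signature => storage path relative to storage_dir } (hypothetical shape)
# files:   { signature => [instance paths within the group] }    (hypothetical shape)
def relink_group(bag_dir, group_id, storage_dir, catalog, files)
  target_dir = Pathname.new(bag_dir).join('data', group_id)
  files.each do |signature, instance_paths|
    source = Pathname.new(storage_dir).join(catalog.fetch(signature))
    instance_paths.each do |instance_path|
      target = target_dir.join(instance_path)
      target.parent.mkpath
      FileUtils.symlink(source, target) unless target.exist?
    end
  end
  target_dir
end

Dir.mktmpdir do |tmp|
  tmp = Pathname.new(tmp)
  stored = tmp.join('store', 'v0001', 'data', 'content', 'page-1.txt')
  stored.parent.mkpath
  stored.write('first page')

  catalog = { 'sig-1' => 'v0001/data/content/page-1.txt' }
  files   = { 'sig-1' => ['page-1.txt', 'copies/page-1.txt'] }
  dir = relink_group(tmp.join('bag'), 'content', tmp.join('store'), catalog, files)
  puts dir.join('copies', 'page-1.txt').read # both instances resolve to the one stored file
end
```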

#reset_bag ⇒ void

This method returns an undefined value.

Deletes any existing bag data and re-initializes the bag directory.



# File 'lib/moab/bagger.rb', line 51

def reset_bag
  delete_bag
  delete_tarfile
  create_bagit_txt
end

#shell_execute(command) ⇒ Object

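Executes a system command in a subprocess. If the command is not successful, captures stdout and stderr and puts them in the message of the raised exception.

Returns:

  • (String)

    stdout if execution was successful

Raises:

  • (MoabStandardError)

    if the command exits unsuccessfully or cannot be started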



# File 'lib/moab/bagger.rb', line 274

def shell_execute(command)
  require 'open3'
  stdout, stderr, status = Open3.capture3(command.chomp)
  if status.success? && status.exitstatus.zero?
    stdout
  else
    msg = "Shell command failed: [#{command}] caused by <STDERR = #{stderr}>"
    msg << " STDOUT = #{stdout}" if stdout&.length&.positive?
    raise(MoabStandardError, msg)
  end
rescue SystemCallError => e
  msg = "Shell command failed: [#{command}] caused by #{e.inspect}"
  raise(MoabStandardError, msg)
end
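
The same capture-and-raise pattern can be exercised on its own with Open3 from the standard library. In this sketch, CommandError is a hypothetical error class standing in for MoabStandardError, run_command is a hypothetical name, and the sample commands assume a Unix-like shell with echo available.

```ruby
require 'open3'

# Hypothetical error class standing in for MoabStandardError.
class CommandError < StandardError; end

# Returns stdout on success; raises CommandError with stderr on failure.
def run_command(command)
  stdout, stderr, status = Open3.capture3(command.chomp)
  return stdout if status.success?

  raise CommandError, "Shell command failed: [#{command}] caused by <STDERR = #{stderr}>"
rescue SystemCallError => e
  # Raised when the process cannot be started at all, e.g. a missing executable.
  raise CommandError, "Shell command failed: [#{command}] caused by #{e.inspect}"
end

puts run_command('echo hello')

begin
  run_command('echo oops >&2; exit 3')
rescue CommandError => e
  puts e.message # the message carries the captured stderr
end
```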