Class: Assembly::Utils

Inherits:
Object
Defined in:
lib/assembly-utils/utils.rb,
lib/assembly-utils/version.rb

Overview

Main Utils class

Constant Summary

REPO = 'dor'

VERSION = '1.5.0'
    Project version number

Class Method Details

.apo_workflow_defined?(druid, workflow) ⇒ boolean

Determines whether the specified APO object has the specified workflow defined in it. DEPRECATED now that reified workflows are used.

Example:

Assembly::Utils.apo_workflow_defined?('druid:oo000oo0001','assembly')
> true

Parameters:

  • druid (string)
    • the druid of the APO to check

  • workflow (string)
    • the name of the workflow to check

Returns:

  • (boolean)

    if workflow is defined in APO



# File 'lib/assembly-utils/utils.rb', line 359

def self.apo_workflow_defined?(druid, workflow)
  puts '************WARNING - THIS METHOD MAY NOT BE USEFUL ANYMORE SINCE WORKFLOWS ARE NO LONGER DEFINED IN THE APO**************'
  obj = Dor::Item.find(druid)
  raise 'object not an APO' if obj..objectType.first != 'adminPolicy'
  xml_doc = Nokogiri::XML(obj..content)
  xml_doc.xpath("//#{workflow}").size == 1 || xml_doc.xpath("//*[@id='#{workflow}']").size == 1
end

.claim_druid(pid) ⇒ boolean

Claim a specific druid as already used, to be sure it won’t get used again. Not needed for normal purposes; only needed if you manually register something in Fedora Admin outside of the DOR services gem.

Example:

puts Assembly::Utils.claim_druid('aa000aa0001')
> true

Parameters:

  • pid (String)

    druid pid (e.g. ‘aa000aa0001’)

Returns:

  • (boolean)

    indicates success of web service call



# File 'lib/assembly-utils/utils.rb', line 57

def self.claim_druid(pid)
  sc   = Dor::Config.suri
  url  = "#{sc.url}/suri2/namespaces/#{sc.id_namespace}"
  rcr  = RestClient::Resource.new(url, :user => sc.user, :password => sc.pass)
  resp = rcr["identifiers/#{pid}"].put('')
  resp.code == '204'
end

.cleanup(params = {}) ⇒ Object

Clean up a list of objects and their associated files, given a list of druids. WARNING: VERY DESTRUCTIVE. This method only works when this gem is used in a project that is configured to connect to DOR.

Example:

Assembly::Utils.cleanup(:druids=>['druid:aa000aa0001','druid:aa000aa0002'],:steps=>[:stacks,:dor,:stage,:symlinks,:workflows])

Parameters:

  • params (Hash) (defaults to: {})

    parameters specified as a hash, using symbols for options:

    • :druids => array of druids to cleanup

    • :steps => an array of steps, specified as symbols, indicating steps to be run, options are:

      :stacks=This will remove all files from the stacks that were shelved for the objects
      :dor=This will delete objects from Fedora
      :stage=This will delete the staged content in the assembly workspace
      :symlinks=This will remove the symlink from the dor workspace
      :workflows=This will remove the assemblyWF and accessionWF workflows for this object
      
    • :dry_run => do not actually clean up (defaults to false)



# File 'lib/assembly-utils/utils.rb', line 182

def self.cleanup(params = {})
  druids  = params[:druids]  || []
  steps   = params[:steps]   || []
  dry_run = params[:dry_run] || false

  allowed_steps = {:stacks => 'This will remove all files from the stacks that were shelved for the objects',
                  :dor => 'This will delete objects from Fedora',
                  :stage => "This will delete the staged content in #{Assembly::ASSEMBLY_WORKSPACE}",
                  :symlinks => "This will remove the symlink from #{Assembly::DOR_WORKSPACE}",
                  :workflows => 'This will remove the accessionWF and assemblyWF workflows'}

  num_steps = 0

  puts 'THIS IS A DRY RUN' if dry_run

  Assembly::Utils.confirm "Run on '#{ENV['ROBOT_ENVIRONMENT']}'? Any response other than 'y' or 'yes' will stop the cleanup now."
  Assembly::Utils.confirm 'Are you really sure you want to run on production?  CLEANUP IS NOT REVERSIBLE' if ENV['ROBOT_ENVIRONMENT'] == 'production'

  steps.each do |step|
    if allowed_steps.keys.include?(step)
      Assembly::Utils.confirm "Run step '#{step}'?  #{allowed_steps[step]}.  Any response other than 'y' or 'yes' will stop the cleanup now."
      num_steps += 1 # count the valid steps found and agreed to
    end
  end
  raise 'no valid steps specified for cleanup' if num_steps == 0
  raise 'no druids provided' if druids.size == 0

  druids.each {|pid| Assembly::Utils.cleanup_object(pid, steps, dry_run)}
end
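
The step validation in the source above can be tried out without a DOR connection. The following is a minimal sketch under stated assumptions: `ALLOWED_STEPS` and `valid_cleanup_steps` are hypothetical names that are not part of the gem, and the block passed in stands in for `Assembly::Utils.confirm`.

```ruby
# Hypothetical stand-in for the allowed_steps hash built inside cleanup.
ALLOWED_STEPS = {
  stacks:    'remove shelved files from the stacks',
  dor:       'delete objects from Fedora',
  stage:     'delete staged content in the assembly workspace',
  symlinks:  'remove the symlink from the dor workspace',
  workflows: 'remove the accessionWF and assemblyWF workflows'
}.freeze

# Returns the requested steps that are both recognised and confirmed.
# The block plays the role of the confirmation prompt: a falsy answer stops
# the run, much as confirm raises on any answer other than 'y' or 'yes'.
def valid_cleanup_steps(steps)
  confirmed = steps.select do |step|
    next false unless ALLOWED_STEPS.key?(step)
    raise 'Exiting' unless yield(step, ALLOWED_STEPS[step])
    true
  end
  raise 'no valid steps specified for cleanup' if confirmed.empty?
  confirmed
end

steps = valid_cleanup_steps([:stage, :bogus, :symlinks]) { |_step, _description| true }
puts steps.inspect
```

Unrecognised steps such as `:bogus` are simply ignored here; only an empty set of valid steps is treated as an error.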

.cleanup_object(pid, steps, dry_run = false) ⇒ Object

Clean up a single object and its associated files, given a druid. WARNING: VERY DESTRUCTIVE. This method only works when this gem is used in a project that is configured to connect to DOR.

Example:

Assembly::Utils.cleanup_object('druid:aa000aa0001',[:stacks,:dor,:stage,:symlinks,:workflows])

Parameters:

  • pid (string)

    a druid

  • steps (array)

    an array of steps; options are:

      :stacks=This will remove all files from the stacks that were shelved for the objects
      :dor=This will delete objects from Fedora
      :stage=This will delete the staged content in the assembly workspace
      :symlinks=This will remove the symlink from the dor workspace
      :workflows=This will remove the assemblyWF and accessionWF workflows for this object

  • dry_run (boolean) (defaults to: false)

    do not actually clean up (defaults to false)



# File 'lib/assembly-utils/utils.rb', line 226

def self.cleanup_object(pid, steps, dry_run = false)
  # start up an SSH session if we are going to try and remove content from the stacks
  ssh_session = Net::SSH.start(Dor::Config.stacks.host, Dor::Config.stacks.user, :auth_methods => %w(gssapi-with-mic publickey hostbased password keyboard-interactive)) if steps.include?(:stacks) && defined?(stacks_server)

  druid_tree = DruidTools::Druid.new(pid).tree
  puts "Cleaning up #{pid}"
  if steps.include?(:dor)
    puts "-- deleting #{pid} from Fedora #{ENV['ROBOT_ENVIRONMENT']}"
    Assembly::Utils.unregister(pid) unless dry_run
  end
  if steps.include?(:symlinks)
    path_to_symlinks = []
    path_to_symlinks << File.join(Assembly::DOR_WORKSPACE, druid_tree)
    path_to_symlinks << Assembly::Utils.get_staging_path(pid, Assembly::DOR_WORKSPACE)
    path_to_symlinks.each do |path|
      if File.directory?(path)
        puts "-- deleting folder #{path} (WARNING: should have been a symlink)"
        FileUtils.rm_rf path unless dry_run
      elsif File.symlink?(path)
        puts "-- deleting symlink #{path}"
        File.delete(path) unless dry_run
      else
        puts "-- Skipping #{path}: not a folder or symlink"
      end
    end
  end
  if steps.include?(:stage)
    path_to_content = Assembly::Utils.get_staging_path(pid, Assembly::ASSEMBLY_WORKSPACE)
    puts "-- deleting folder #{path_to_content}"
    FileUtils.rm_rf path_to_content if !dry_run && File.exist?(path_to_content)
  end
  if steps.include?(:stacks)
    path_to_content = Dor::DigitalStacksService.stacks_storage_dir(pid)
    puts "-- removing files from the stacks on #{stacks_server} at #{path_to_content}"
    ssh_session.exec!("rm -fr #{path_to_content}") unless dry_run
  end
  if steps.include?(:workflows)
    puts "-- deleting #{pid} accessionWF and assemblyWF workflows from Fedora #{ENV['ROBOT_ENVIRONMENT']}"
    unless dry_run
      Dor::Config.workflow.client.delete_workflow('dor', pid, 'accessionWF')
      Dor::Config.workflow.client.delete_workflow('dor', pid, 'assemblyWF')
    end
  end
rescue Exception => e
  puts "** cleaning up failed for #{pid} with #{e.message}"
ensure
  ssh_session.close if ssh_session
end
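
In the `:symlinks` step above, `File.directory?` is checked before `File.symlink?`. Because `File.directory?` follows symbolic links, a symlink that points at an existing directory is handled by the first branch. The sketch below illustrates that behaviour on a throwaway directory; `classify_path` and `classify_demo` are hypothetical helpers that are not part of the gem, and the sketch assumes a platform that supports symlinks.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not part of the gem): classify a path the way a
# dry run of the :symlinks step might report it, checking for a symlink first.
def classify_path(path)
  if File.symlink?(path)
    :symlink
  elsif File.directory?(path)
    :directory
  else
    :skip
  end
end

# Builds a temporary directory plus a symlink to it, then classifies both.
def classify_demo
  Dir.mktmpdir do |dir|
    target = File.join(dir, 'real')
    link   = File.join(dir, 'link')
    FileUtils.mkdir_p(target)
    File.symlink(target, link)
    {
      link:        classify_path(link),
      target:      classify_path(target),
      missing:     classify_path(File.join(dir, 'missing')),
      link_is_dir: File.directory?(link) # true, because the link is followed
    }
  end
end

puts classify_demo.inspect
```

Checking `File.symlink?` first is one way to tell the two cases apart; whether that ordering is preferable for the gem is a judgment call left to its maintainers.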

.clear_stray_workflows ⇒ Object

Clear stray workflows: remove any workflow steps for orphaned objects. This method only works when this gem is used in a project that is configured to connect to DOR.



# File 'lib/assembly-utils/utils.rb', line 457

def self.clear_stray_workflows
  repo      = 'dor'
  wf        = 'assemblyWF'
  msg       = 'Integration testing'
  wfs       = Dor::Config.workflow.client
  steps     = Assembly::ASSEMBLY_WF_STEPS.map { |s| s[0] }
  completed = steps[0]

  steps.each do |waiting|
    druids = wfs.get_objects_for_workstep completed, waiting, repo, wf
    druids.each do |dru|
      params = [repo, dru, wf, waiting, msg]
      resp = wfs.update_workflow_error_status *params
      puts "updated: resp=#{resp} params=#{params.inspect}"
    end
  end
end

.confirm(message) ⇒ Object

Used by cleanup to ask the user for confirmation of each step. Any response other than ‘y’ or ‘yes’ raises an error.

Parameters:

  • message (string)

    the message to show to a user



# File 'lib/assembly-utils/utils.rb', line 783

def self.confirm(message)
  puts message
  response = gets.chomp.downcase
  raise 'Exiting' if response != 'y' && response != 'yes'
end
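
Because `confirm` reads from standard input, it is awkward to exercise in a script or test. A minimal sketch of the same check with injectable input and output follows; `confirm_from` is a hypothetical name, not a method of the gem.

```ruby
require 'stringio'

# Hypothetical variant of confirm that reads from any IO-like object,
# so the prompt can be driven without a terminal.
def confirm_from(message, input: $stdin, output: $stdout)
  output.puts message
  # to_s guards against gets returning nil at end of input
  response = input.gets.to_s.chomp.downcase
  raise 'Exiting' unless %w[y yes].include?(response)
  true
end

confirm_from('Run step?', input: StringIO.new("YES\n"), output: StringIO.new)

begin
  confirm_from('Run step?', input: StringIO.new("no\n"), output: StringIO.new)
rescue RuntimeError => e
  puts "stopped: #{e.message}"
end
```

As in the original, the answer is compared case-insensitively, and anything other than `y` or `yes` raises.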

.delete_all_workflows(pid, repo = 'dor') ⇒ Object

Delete all workflows for the given PID. Destructive and should only be used when deleting an object from DOR. This method only works when this gem is used in a project that is configured to connect to DOR.

Example:

Assembly::Utils.delete_all_workflows('druid:oo000oo0001')

Parameters:

  • pid (string)

    the druid

  • repo (String) (defaults to: 'dor')

    repository dealing with the workflow. Default is ‘dor’. Another option is ‘sdr’



# File 'lib/assembly-utils/utils.rb', line 440

def self.delete_all_workflows(pid, repo = 'dor')
  Dor::Config.workflow.client.get_workflows(pid).each {|workflow| Dor::Config.workflow.client.delete_workflow(repo, pid, workflow)}
end

.delete_from_dor(pid) ⇒ Object

Delete an object from DOR. This method only works when this gem is used in a project that is configured to connect to DOR.

Example:

Assembly::Utils.delete_from_dor('druid:aa000aa0001')

Parameters:

  • pid (string)

    the druid



# File 'lib/assembly-utils/utils.rb', line 282

def self.delete_from_dor(pid)
  Dor::Config.fedora.client["objects/#{pid}"].delete
  Dor::SearchService.solr.delete_by_id(pid)
  Dor::SearchService.solr.commit
end

.export_objects(pids, output_dir) ⇒ Object

Export one or more objects, given a single pid or an array of pids, as FOXML files written to the specified directory.

Example:

Assembly::Utils.export_objects(['druid:aa000aa0001','druid:bb000bb0001'],'/tmp')

Parameters:

  • pids (Array)
    • an array of pids to export (can also pass a single pid as a string)

  • output_dir (String)
    • the full path to output the foxml files



# File 'lib/assembly-utils/utils.rb', line 72

def self.export_objects(pids, output_dir)
  pids = [pids] if pids.class == String
  pids.each {|pid| ActiveFedora::FixtureExporter.export_to_path(pid, output_dir)}
end

.get_druids_by_sourceid(source_ids) ⇒ array

Get a list of druids that match the given array of source IDs. This method only works when this gem is used in a project that is configured to connect to DOR.

Example:

puts Assembly::Utils.get_druids_by_sourceid(['revs-01','revs-02'])
> ['druid:aa000aa0001','druid:aa000aa0002']

Parameters:

  • source_ids (Array)

    array of source IDs to look up

Returns:

  • (array)

    druids



# File 'lib/assembly-utils/utils.rb', line 100

def self.get_druids_by_sourceid(source_ids)
  druids = []
  source_ids.each {|sid| druids << Dor::SearchService.query_by_id(sid)}
  druids.flatten
end

.get_druids_from_log(progress_log_file, completed = true) ⇒ array

Read a list of druids from a pre-assembly progress log file into an array.

Example:

druids = Assembly::Utils.get_druids_from_log '/dor/preassembly/sohp_accession_log.yaml'
puts druids
> ['aa000aa0001', 'aa000aa0002']

Parameters:

  • progress_log_file (string)

    filename

  • completed (boolean) (defaults to: true)

    if true, returns druids that have completed, if false, returns druids that failed (defaults to true)

Returns:

  • (array)

    list of druids



# File 'lib/assembly-utils/utils.rb', line 651

def self.get_druids_from_log(progress_log_file, completed = true)
  druids = []
  docs = YAML.load_stream(Assembly::Utils.read_file(progress_log_file))
  docs = docs.documents if docs.respond_to? :documents
  docs.each { |obj| druids << obj[:pid] if obj[:pre_assem_finished] == completed }
  druids
end
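
The source above expects a progress log made of several YAML documents, each with `:pid` and `:pre_assem_finished` symbol keys. The sketch below shows that shape with made-up sample data; `SAMPLE_LOG` and `druids_from_log_text` are hypothetical names, and the real log files may carry additional keys.

```ruby
require 'yaml'

# Hypothetical log content: one YAML document per object, with symbol keys
# matching the obj[:pid] and obj[:pre_assem_finished] lookups in the source.
SAMPLE_LOG = <<~LOG
  ---
  :pid: aa000aa0001
  :pre_assem_finished: true
  ---
  :pid: aa000aa0002
  :pre_assem_finished: false
LOG

# Returns the pids whose pre_assem_finished flag equals `completed`.
def druids_from_log_text(text, completed = true)
  YAML.load_stream(text)
      .select { |doc| doc[:pre_assem_finished] == completed }
      .map { |doc| doc[:pid] }
end

puts druids_from_log_text(SAMPLE_LOG).inspect
puts druids_from_log_text(SAMPLE_LOG, false).inspect
```

Passing `false` as the second argument selects the objects that did not finish, which mirrors the `completed` parameter described above.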

.get_errored_objects_for_workstep(workflow, step, tag = '') ⇒ hash

Get a list of druids that have errored out in a particular workflow and step.

Example:

result = Assembly::Utils.get_errored_objects_for_workstep('accessionWF','content-metadata','Project : Revs')
> - Item error; caused by #<Rubydora::FedoraInvalidRequest: Error modifying datastream contentMetadata for druid:qd556jq0580. See logger for details>

Parameters:

  • workflow (string)

    name

  • step (string)

    name

  • tag (string) (defaults to: '')

    – optional, if supplied, results will be filtered by the exact tag supplied; note this will dramatically slow down the response if there are many results

Returns:

  • (hash)

    hash of results, with the druid as the key and the error message as the value



# File 'lib/assembly-utils/utils.rb', line 608

def self.get_errored_objects_for_workstep(workflow, step, tag = '')
  result = Dor::Config.workflow.client.get_errored_objects_for_workstep workflow, step, 'dor'
  return result if tag == ''
  filtered_result = {}
  result.each do |druid, error|
    begin
      item = Dor::Item.find(druid)
      filtered_result.merge!(druid => error) if item.tags.include? tag
    rescue
    end
  end
  filtered_result
end
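
The tag filter above loads every errored object individually, which is why supplying a tag slows the call down. A minimal sketch of the same filtering with in-memory data follows; `ERRORS`, `TAGS` and `filter_errors_by_tag` are hypothetical, and the block stands in for `Dor::Item.find(druid).tags`.

```ruby
# Hypothetical sample data standing in for the workflow service response.
ERRORS = {
  'druid:aa000aa0001' => 'Item error A',
  'druid:bb000bb0002' => 'Item error B',
  'druid:cc000cc0003' => 'Item error C'
}.freeze

# Hypothetical tag lookup; the third druid is deliberately missing.
TAGS = {
  'druid:aa000aa0001' => ['Project : Revs'],
  'druid:bb000bb0002' => ['Project : Other']
}.freeze

# Keeps only the errors whose object carries the exact tag.
# The block is called once per druid and must return that object's tags.
def filter_errors_by_tag(errors, tag)
  return errors if tag == ''
  errors.select do |druid, _error|
    begin
      yield(druid).include?(tag)
    rescue StandardError
      false # like the source, skip objects that cannot be loaded
    end
  end
end

filtered = filter_errors_by_tag(ERRORS, 'Project : Revs') { |druid| TAGS.fetch(druid) }
puts filtered.inspect
```

An empty tag returns the full result without any per-object lookup, matching the early return in the source.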

.get_staging_path(pid, base_path = nil) ⇒ string

Get the staging directory tree given a druid, and optionally prepend a basepath. Deprecated and should not be needed anymore.

Example:

puts Assembly::Utils.get_staging_path('aa000aa0001','tmp')
> "tmp/aa/000/aa/0001"

Parameters:

  • pid (String)

    druid pid (e.g. ‘aa000aa0001’)

  • base_path (String) (defaults to: nil)

    optional base path to prepend to druid path

Returns:

  • (string)

    path to material that is being staged, with optional prepended base path



# File 'lib/assembly-utils/utils.rb', line 26

def self.get_staging_path(pid, base_path = nil)
  d = DruidTools::Druid.new(pid, base_path)
  File.dirname(d.path)
end
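
The example output above (`tmp/aa/000/aa/0001`) reflects the druid tree layout, in which the identifier is split into four path segments. The sketch below reproduces only that splitting for illustration; the gem delegates the real work to `DruidTools::Druid`, and `DRUID_PATTERN` and `druid_tree_dir` are hypothetical names using a simplified pattern.

```ruby
# Simplified, hypothetical druid pattern: two letters, three digits,
# two letters, four digits, with an optional "druid:" prefix.
DRUID_PATTERN = /\A(?:druid:)?([a-z]{2})(\d{3})([a-z]{2})(\d{4})\z/

# Returns the tree directory for a druid, optionally under base_path.
def druid_tree_dir(pid, base_path = nil)
  match = DRUID_PATTERN.match(pid)
  raise ArgumentError, "invalid druid: #{pid}" unless match
  File.join(*[base_path, *match.captures].compact)
end

puts druid_tree_dir('aa000aa0001', 'tmp')
puts druid_tree_dir('druid:aa000aa0001')
```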

.get_workflow_status(druid, workflow, step) ⇒ string

Show the workflow status of a specific step in a specific workflow for the provided druid. This method only works when this gem is used in a project that is configured to connect to DOR.

Example:

puts Assembly::Utils.get_workflow_status('druid:aa000aa0001','assemblyWF','jp2-create')
> "completed"

Parameters:

  • druid (string)

    a druid string

  • workflow (string)

    name of workflow

  • step (string)

    name of step

Returns:

  • (string)

    workflow step status, returns nil if no workflow found



# File 'lib/assembly-utils/utils.rb', line 163

def self.get_workflow_status(druid, workflow, step)
  Dor::Config.workflow.client.get_workflow_status('dor', druid, workflow, step)
end

.import_objects(source_dir) ⇒ Object

Import all of the FOXML files in the specified directory into Fedora

Example:

Assembly::Utils.import_objects('/tmp')

Parameters:

  • source_dir (String)
    • the full path of the directory containing the FOXML files to import



# File 'lib/assembly-utils/utils.rb', line 83

def self.import_objects(source_dir)
  Dir.chdir(source_dir)
  files = Dir.glob('*.foxml.xml')
  files.each do |file|
    pid = ActiveFedora::FixtureLoader.import_to_fedora(File.join(source_dir, file))
    ActiveFedora::FixtureLoader.index(pid)
  end
end

.in_accessioning?(pid) ⇒ boolean

Check if the object is currently in accessioning. This method only works when this gem is used in a project that is configured to connect to the workflow service.

Example:

Assembly::Utils.in_accessioning?('druid:oo000oo0001')
> false

Parameters:

  • pid (string)

    the druid to operate on

Returns:

  • (boolean)

    if object is currently in accessioning



# File 'lib/assembly-utils/utils.rb', line 495

def self.in_accessioning?(pid)
  Dor::Config.workflow.client.get_active_lifecycle(REPO, pid, 'submitted') ? true : false
end

.ingest_hold?(pid) ⇒ boolean

Check if the object is on ingest hold. This method only works when this gem is used in a project that is configured to connect to the workflow service.

Example:

Assembly::Utils.ingest_hold?('druid:oo000oo0001')
> false

Parameters:

  • pid (string)

    the druid to operate on

Returns:

  • (boolean)

    if object is on ingest hold



# File 'lib/assembly-utils/utils.rb', line 507

def self.ingest_hold?(pid)
  Dor::Config.workflow.client.get_workflow_status(REPO, pid, 'accessionWF', 'sdr-ingest-transfer') == 'hold'
end

.insert_workflow(pid, workflow, repo = 'dor') ⇒ boolean

Insert the specified workflow into the specified object.

Example:

puts Assembly::Utils.insert_workflow('druid:aa000aa0001','accessionWF')
> true

Parameters:

  • pid (String)

    druid pid (e.g. ‘aa000aa0001’)

  • workflow (String)

    name (e.g. ‘accessionWF’)

  • repo (String) (defaults to: 'dor')

    name (e.g. ‘dor’) – optional, defaults to ‘dor’

Returns:

  • (boolean)

    indicates success of web service call



# File 'lib/assembly-utils/utils.rb', line 42

def self.insert_workflow(pid, workflow, repo = 'dor')
  url = "#{Dor::Config.dor.service_root}/objects/#{pid}/apo_workflows/#{workflow}"
  result = RestClient.post url, {}
  [200, 201, 202, 204].include?(result.code) && result
end

.is_apo?(druid) ⇒ boolean

Determines whether the specified object is an APO.

Example:

Assembly::Utils.is_apo?('druid:oo000oo0001')
> true

Parameters:

  • druid (string)
    • the druid of the APO to check

Returns:

  • (boolean)

    true if the object exists and is an APO



# File 'lib/assembly-utils/utils.rb', line 375

def self.is_apo?(druid)
  obj = Dor::Item.find(druid)
  obj..objectType.first == 'adminPolicy'
rescue
  return false
end

.is_ingested?(pid) ⇒ boolean

Check if the object is fully accessioned and ingested. This method only works when this gem is used in a project that is configured to connect to the workflow service.

Example:

Assembly::Utils.is_ingested?('druid:oo000oo0001')
> false

Parameters:

  • pid (string)

    the druid to operate on

Returns:

  • (boolean)

    if object is fully ingested



# File 'lib/assembly-utils/utils.rb', line 483

def self.is_ingested?(pid)
  Dor::Config.workflow.client.get_lifecycle(REPO, pid, 'accessioned') ? true : false
end

.is_submitted?(pid) ⇒ boolean

Check if the object is submitted. This method only works when this gem is used in a project that is configured to connect to the workflow service.

Example:

Assembly::Utils.is_submitted?('druid:oo000oo0001')
> false

Parameters:

  • pid (string)

    the druid to operate on

Returns:

  • (boolean)

    if object is submitted



# File 'lib/assembly-utils/utils.rb', line 519

def self.is_submitted?(pid)
  Dor::Config.workflow.client.get_lifecycle(REPO, pid, 'submitted').nil?
end

.load_config(filename) ⇒ hash

Read in a YAML configuration file from disk and return a hash

Example:

config_filename='/thumpers/dpgthumper2-smpl/SC1017_SOHP/sohp_prod_accession.yaml'
config=Assembly::Utils.load_config(config_filename)
puts config['progress_log_file']
> "/dor/preassembly/sohp_accession_log.yaml"

Parameters:

  • filename (string)

    of YAML config file to read

Returns:

  • (hash)

    configuration contents as a hash



# File 'lib/assembly-utils/utils.rb', line 669

def self.load_config(filename)
  YAML.load(Assembly::Utils.read_file(filename))
end
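
Since `read_file` returns an empty string for an unreadable file, `load_config` does not raise when the file is missing. The sketch below mirrors the two methods with the standard library only; `load_config_sketch` and `demo_config` are hypothetical names, and the sample key is taken from the example above.

```ruby
require 'yaml'
require 'tempfile'

# Mirrors read_file followed by load_config: unreadable files become ''.
def load_config_sketch(filename)
  text = File.readable?(filename) ? File.read(filename) : ''
  YAML.load(text)
end

# Writes a one-line config to a temporary file and loads it back.
def demo_config
  Tempfile.create(['config', '.yaml']) do |f|
    f.write("progress_log_file: /dor/preassembly/sohp_accession_log.yaml\n")
    f.flush
    load_config_sketch(f.path)
  end
end

config = demo_config
puts config['progress_log_file']

# A missing file yields a falsy value rather than a hash.
missing = load_config_sketch('/no/such/config.yaml')
puts missing.inspect
```

Callers that index into the result may therefore want to check that a hash came back before using it.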

.read_druids_from_file(csv_filename) ⇒ array

Get a list of druids from a CSV file that has a column headed “druid” and put them into a Ruby array. Useful if you want to import a report from Argo.

Example:

Assembly::Utils.read_druids_from_file('download.csv') # ['druid:xxxxx', 'druid:yyyyy']

Parameters:

  • csv_filename (string)

    name of a CSV file that has a column called “druid”

Returns:

  • (array)

    array of druids



# File 'lib/assembly-utils/utils.rb', line 587

def self.read_druids_from_file(csv_filename)
  return to_enum(:read_druids_from_file, csv_filename) unless block_given?

  CSV.foreach(csv_filename, :headers => true) do |row|
    druid = row['druid']
    druid = "druid:#{druid}" unless druid.include?('druid:')

    yield druid
  end
end
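
As the source shows, the method yields each druid to a block and returns an Enumerator when no block is given, so `to_a` is needed to obtain an actual array. The sketch below repeats the same logic in a hypothetical `DruidCsv` module so that it can run against a temporary file without the gem loaded.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical module mirroring read_druids_from_file for illustration.
module DruidCsv
  # Yields each druid from the "druid" column, adding the prefix if absent.
  def self.each_druid(csv_filename)
    return to_enum(:each_druid, csv_filename) unless block_given?

    CSV.foreach(csv_filename, headers: true) do |row|
      druid = row['druid']
      druid = "druid:#{druid}" unless druid.include?('druid:')
      yield druid
    end
  end

  # Writes CSV text to a temporary file and collects the druids from it.
  def self.druids_from_text(text)
    Tempfile.create(['report', '.csv']) do |f|
      f.write(text)
      f.flush
      each_druid(f.path).to_a
    end
  end
end

sample = "druid,label\naa000aa0001,First\ndruid:bb000bb0002,Second\n"
puts DruidCsv.druids_from_text(sample).inspect
```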

.read_file(filename) ⇒ string

Read in a file from disk

Parameters:

  • filename (string)

    to read

Returns:

  • (string)

    file contents as a string



# File 'lib/assembly-utils/utils.rb', line 677

def self.read_file(filename)
  File.readable?(filename) ? IO.read(filename) : ''
end

.reindex(pid) ⇒ Object

Reindex the supplied PID in solr.

Example:

Assembly::Utils.reindex('druid:oo000oo0001')

Parameters:

  • pid (string)

    the druid



# File 'lib/assembly-utils/utils.rb', line 449

def self.reindex(pid)
  obj = Dor.load_instance pid
  solr_doc = obj.to_solr
  Dor::SearchService.solr.add(solr_doc, :add_attributes => {:commitWithin => 1000}) unless obj.nil?
end

.remove_duplicate_tags(druids) ⇒ Object

Removes any duplicate tags within each druid

Parameters:

  • druids (array)
    • an array of druids



# File 'lib/assembly-utils/utils.rb', line 766

def self.remove_duplicate_tags(druids)
  druids.each do |druid|
    i = Dor::Item.find(druid)
    next unless i && i.tags.size > 1 # multiple tags
    i.tags.each do |tag|
      next unless (i.tags.select {|t| t == tag}).size > 1 # tag is duplicate
      i.remove_tag(tag)
      i.add_tag(tag)
      puts "Saving #{druid} to remove duplicate tag='#{tag}'"
      i.save
    end
  end
end
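
The duplicate check in the source compares each tag against the full tag list. The same test can be sketched on a plain array; `duplicate_tags` is a hypothetical helper, not part of the gem, and it only detects duplicates without modifying any object.

```ruby
# Returns the tags that occur more than once, each listed once,
# in the order in which they first appear.
def duplicate_tags(tags)
  tags.select { |tag| tags.count(tag) > 1 }.uniq
end

tags = ['Project : Revs', 'Remediated', 'Project : Revs']
puts duplicate_tags(tags).inspect
```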

.replace_datastreams(druids, datastream_name, new_content, publish = false) ⇒ Object

Replace a specific datastream for a series of objects in DOR with new content

Example:

druids=%w{druid:aa111aa1111 druid:bb222bb2222}
new_content='<xml><more nodes>this should be the whole datastream</more nodes></xml>'
datastream='rightsMetadata'
Assembly::Utils.replace_datastreams(druids,datastream,new_content)

Parameters:

  • druids (array)
    • an array of druids

  • datastream_name (string)
    • the name of the datastream to replace

  • new_content (string)
    • the new content to replace the entire datastream with

  • publish (boolean) (defaults to: false)
    • defaults to false, if true, will publish each object after replacing datastreams (must be run on server with rights to do this)



# File 'lib/assembly-utils/utils.rb', line 316

def self.replace_datastreams(druids, datastream_name, new_content, publish = false)
  druids.each do |druid|
    obj = Dor::Item.find(druid)
    ds = obj.datastreams[datastream_name]
    if ds
      ds.content = new_content
      ds.save
      puts "replaced #{datastream_name} for #{druid}"
      if publish
        obj.
        puts '--object re-published'
      end
    else
      puts "#{datastream_name} does not exist for #{druid}"
    end
  end
end

.republish(druids) ⇒ Object

Republish a list of druids. Only works when run from a server with access rights to the stacks (e.g. lyberservices-prod)

Example:

druids=%w{druid:aa111aa1111 druid:bb222bb2222}
Assembly::Utils.republish(druids)

Parameters:

  • druids (array)
    • an array of druids



# File 'lib/assembly-utils/utils.rb', line 341

def self.republish(druids)
  druids.each do |druid|
    obj = Dor::Item.find(druid)
    obj.
    puts "republished #{druid}"
  end
end

.reset_errored_objects_for_workstep(workflow, step, tag = '') ⇒ hash

Reset any objects that have errored out in a specific workflow step back to waiting.

Example:

result = Assembly::Utils.reset_errored_objects_for_workstep('accessionWF', 'content-metadata')
> - Item error; caused by #<Rubydora::FedoraInvalidRequest: Error modifying datastream contentMetadata for druid:qd556jq0580. See logger for details>

Parameters:

  • workflow (string)

    name

  • step (string)

    name

  • tag (string) (defaults to: '')

    – optional, if supplied, results will be filtered by the exact tag supplied; note this will dramatically slow down the response if there are many results

Returns:

  • (hash)

    hash of results that have been reset, with the druid as the key and the error message as the value



# File 'lib/assembly-utils/utils.rb', line 632

def self.reset_errored_objects_for_workstep(workflow, step, tag = '')
  result = get_errored_objects_for_workstep workflow, step, tag
  druids = []
  result.each {|k, v| druids << k}
  reset_workflow_states(:druids => druids, :steps => {workflow => [step]}) if druids.size > 0
  result
end

.reset_workflow_states(params = {}) ⇒ Object

Reset the workflow states for a list of druids given a list of workflow names and steps. Provide a list of druids in an array, and a hash containing workflow names (e.g. ‘assemblyWF’ or ‘accessionWF’) as the keys, and arrays of steps as the corresponding values (e.g. [‘checksum-compute’,‘jp2-create’]), and they will all be reset to “waiting”. This method only works when this gem is used in a project that is configured to connect to DOR.

Example:

druids = ['druid:aa111aa1111', 'druid:bb222bb2222']
steps = {'assemblyWF' => ['checksum-compute'], 'accessionWF' => ['content-metadata', 'descriptive-metadata']}
Assembly::Utils.reset_workflow_states(:druids => druids, :steps => steps)

Parameters:

  • params (Hash) (defaults to: {})

    parameters specified as a hash, using symbols for options:

    • :druids => array of druids

    • :steps => a hash, containing workflow names as keys, and an array of steps

    • :state => a string for the name of the state to reset to, defaults to ‘waiting’ (could be ‘completed’ for example)



# File 'lib/assembly-utils/utils.rb', line 561

def self.reset_workflow_states(params = {})
  druids    = params[:druids] || []
  workflows = params[:steps]  || {}
  state     = params[:state]  || 'waiting'
  druids.each do |druid|
    puts "** #{druid}"
    begin
      workflows.each do |workflow, steps|
        steps.each do |step|
          puts "Updating #{workflow}:#{step} to #{state}"
          Dor::Config.workflow.client.update_workflow_status 'dor', druid, workflow, step, state
        end
      end
    rescue Exception => e
      puts "an error occurred trying to update workflows for #{druid} with message #{e.message}"
    end
  end
end
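
The nested iteration above makes one workflow service call per druid, workflow and step. A minimal sketch of how the `:steps` hash expands into calls follows; `RecordingClient` and `reset_states` are hypothetical stand-ins for `Dor::Config.workflow.client` and the method itself, and nothing here contacts a real service.

```ruby
# Hypothetical stand-in for the workflow client that only records calls.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def update_workflow_status(repo, druid, workflow, step, state)
    @calls << [repo, druid, workflow, step, state]
  end
end

# Expands druids x workflows x steps into individual status updates.
def reset_states(client, druids:, steps:, state: 'waiting')
  druids.each do |druid|
    steps.each do |workflow, step_names|
      step_names.each do |step|
        client.update_workflow_status('dor', druid, workflow, step, state)
      end
    end
  end
  client.calls.size
end

client = RecordingClient.new
druids = ['druid:aa111aa1111', 'druid:bb222bb2222']
steps  = { 'assemblyWF' => ['checksum-compute'],
           'accessionWF' => ['content-metadata', 'descriptive-metadata'] }
puts reset_states(client, druids: druids, steps: steps)
puts client.calls.first.inspect
```

With two druids and three steps in total, six updates are recorded.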

.set_workflow_step_to_error(pid, step) ⇒ Object

Set the workflow step for the given PID to an error state

Parameters:

  • pid (string)

    the druid

  • step (string)

    the step to set to error



# File 'lib/assembly-utils/utils.rb', line 425

def self.set_workflow_step_to_error(pid, step)
  wf_name = Assembly::ASSEMBLY_WF
  msg     = 'Integration testing'
  params  = ['dor', pid, wf_name, step, msg]
  resp    = Dor::Config.workflow.client.update_workflow_error_status *params
  raise 'update_workflow_error_status() returned false.' unless resp == true
end

.solr_doc_parser(doc, check_status_in_dor = false) ⇒ string

Parses a Solr document into the fields for one row of a report. Used by the completion_report and project_tag_report in the pre-assembly project.

Parameters:

  • doc (solr_document)

    a solr document result

  • check_status_in_dor (boolean) (defaults to: false)

    indicates if we should check for the workflow states in dor or trust SOLR is up to date (defaults to false)

Returns:

  • (string)

    a comma delimited row for the report



# File 'lib/assembly-utils/utils.rb', line 687

def self.solr_doc_parser(doc, check_status_in_dor = false)
  druid = doc[:id]

  if Solrizer::VERSION < '3.0'
    label = doc[:objectLabel_t]
    title = doc[:public_dc_title_t].nil? ? '' : doc[:public_dc_title_t].first

    if check_status_in_dor
      accessioned = get_workflow_status(druid, 'accessionWF', 'publish') == 'completed'
      shelved     = get_workflow_status(druid, 'accessionWF', 'shelve')  == 'completed'
    else
      accessioned = doc[:wf_wps_facet].nil? ? false : doc[:wf_wps_facet].include?('accessionWF:publish:completed')
      shelved     = doc[:wf_wps_facet].nil? ? false : doc[:wf_wps_facet].include?('accessionWF:shelve:completed')
    end
    source_id = doc[:source_id_t]
    files     = doc[:content_file_t]
  else
    label = doc[Solrizer.solr_name('objectLabel', :displayable)]
    title = doc.fetch(Solrizer.solr_name('public_dc_title', :displayable), []).first || ''

    if check_status_in_dor
      accessioned = get_workflow_status(druid, 'accessionWF', 'publish') == 'completed'
      shelved     = get_workflow_status(druid, 'accessionWF', 'shelve')  == 'completed'
    else
      accessioned = doc.fetch(Solrizer.solr_name('wf_wps', :symbol), []).include?('accessionWF:publish:completed')
      shelved     = doc.fetch(Solrizer.solr_name('wf_wps', :symbol), []).include?('accessionWF:shelve:completed')
    end
    source_id = doc[Solrizer.solr_name('source_id', :symbol)]
    files     = doc[Solrizer.solr_name('content_file', :symbol)]
  end

  if files.nil?
    file_type_list = ''
    num_files = 0
  else
    num_files = files.size
    # count the amount of each file type
    file_types = Hash.new(0)
    unless num_files == 0
      files.each {|file| file_types[File.extname(file)] += 1}
      file_type_list = file_types.map{|k, v| "#{k}=#{v}"}.join(' | ')
    end
  end

  val = druid.split(/:/).last
  purl_link = File.join(Assembly::PURL_BASE_URL, val)
  [druid, label, title, source_id, accessioned, shelved, purl_link, num_files, file_type_list]
end
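
The last part of the source counts files per extension and joins the counts into one summary string. The sketch below isolates that step; `file_type_summary` is a hypothetical helper, and unlike the source it returns an empty string (rather than leaving the summary unset) when the file list is empty.

```ruby
# Returns [number_of_files, "ext=count | ext=count"] for a list of file names.
def file_type_summary(files)
  return [0, ''] if files.nil? || files.empty?

  counts = Hash.new(0)
  files.each { |file| counts[File.extname(file)] += 1 }
  [files.size, counts.map { |ext, n| "#{ext}=#{n}" }.join(' | ')]
end

puts file_type_summary(%w[a.tif b.tif c.jp2]).inspect
# → [3, ".tif=2 | .jp2=1"]
```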

.symbolize_keys(h) ⇒ hash

Takes a hash data structure and recursively converts all hash keys from strings to symbols.

Example:

Assembly::Utils.symbolize_keys({'dude' => 'is cool', 'i' => 'am too'})
> {:dude => "is cool", :i => "am too"}

Parameters:

  • h (hash)

    hash

Returns:

  • (hash)

    a hash with all keys converted from strings to symbols



# File 'lib/assembly-utils/utils.rb', line 744

def self.symbolize_keys(h)
  if h.instance_of? Hash
    h.inject({}) { |hh, (k, v)| hh[k.to_sym] = symbolize_keys(v); hh }
  elsif h.instance_of? Array
    h.map { |v| symbolize_keys(v) }
  else
    h
  end
end
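
The recursion also descends into arrays, which the one-level example above does not show. The sketch below copies the method body from the source into a standalone function (outside the `Assembly::Utils` class, purely so it runs without the gem) and applies it to nested sample data.

```ruby
# Standalone copy of the method shown above, for illustration only.
def symbolize_keys(h)
  if h.instance_of? Hash
    h.inject({}) { |hh, (k, v)| hh[k.to_sym] = symbolize_keys(v); hh }
  elsif h.instance_of? Array
    h.map { |v| symbolize_keys(v) }
  else
    h
  end
end

nested = { 'project' => { 'name' => 'Revs' }, 'files' => [{ 'path' => 'a.tif' }] }
puts symbolize_keys(nested).inspect
```

Hash values and array elements are converted at every depth, while non-hash leaves such as strings are returned unchanged.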

.unregister(pid) ⇒ boolean

Unregister a DOR object, which includes deleting it and deleting all its workflows.

Parameters:

  • pid (string)

    the druid

Returns:

  • (boolean)

    true if the deletion succeeded, false otherwise



# File 'lib/assembly-utils/utils.rb', line 413

def self.unregister(pid)
  Assembly::Utils.delete_all_workflows pid
  Assembly::Utils.delete_from_dor pid
  true
rescue
  return false
end

.update_datastreams(druids, datastream_name, find_content, replace_content) ⇒ Object

Update a specific datastream for a series of objects in DOR by searching and replacing content

Example:

druids = %w{druid:aa111aa1111 druid:bb222bb2222}
find_content = 'FooBarBaz'
replace_content = 'Stanford Rules'
datastream = 'rightsMetadata'
Assembly::Utils.update_datastreams(druids, datastream, find_content, replace_content)

Parameters:

  • druids (array)
    • an array of druids

  • datastream_name (string)
    • the name of the datastream to replace

  • find_content (string)
    • the content to find

  • replace_content (string)
    • the content to replace the found content with



# File 'lib/assembly-utils/utils.rb', line 395

def self.update_datastreams(druids, datastream_name, find_content, replace_content)
  druids.each do |druid|
    obj = Dor::Item.find(druid)
    ds = obj.datastreams[datastream_name]
    if ds
      updated_content = ds.content.gsub(find_content, replace_content)
      ds.content = updated_content
      ds.save
      puts "updated #{datastream_name} for #{druid}"
    else
      puts "#{datastream_name} does not exist for #{druid}"
    end
  end
end
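
The replacement itself is a plain String#gsub call, so every occurrence of find_content in the datastream is replaced, and a string pattern is matched literally rather than as a regular expression. A minimal sketch on an in-memory string (the XML snippet is made up for illustration; no DOR connection is involved):

```ruby
# Illustrative datastream content only.
content = '<rights><note>FooBarBaz</note><note>FooBarBaz (v.2)</note></rights>'

# gsub replaces every occurrence, not just the first one.
updated = content.gsub('FooBarBaz', 'Stanford Rules')

# With a String pattern, characters such as "." have no special meaning.
literal = 'a.c abc'.gsub('a.c', 'X')

# When nothing matches, gsub returns an equal, unchanged copy.
unchanged = content.gsub('NoSuchText', 'ignored')
```

Note that the listing above saves the datastream whether or not anything matched, so a "updated" message does not by itself mean the content changed.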

.update_rights_metadata(druids, apo_druid, publish = false) ⇒ Object

Quickly update rights metadata for any existing list of objects using default rights metadata pulled from the supplied APO.

Example:

druids=%w{druid:aa111aa1111 druid:bb222bb2222}
apo_druid='druid:cc222cc2222'
Assembly::Utils.update_rights_metadata(druids,apo_druid)

Parameters:

  • druids (array)
    • an array of druids

  • apo_druid (string)
    • the druid of the APO to pull rights metadata from

  • publish (boolean) (defaults to: false)
    • defaults to false; if true, publishes each object after replacing datastreams (must be run on a server with rights to do this)



# File 'lib/assembly-utils/utils.rb', line 298

def self.update_rights_metadata(druids, apo_druid, publish = false)
  apo = Dor::Item.find(apo_druid)
  rights_md = apo.datastreams['defaultObjectRights']
  replace_datastreams(druids, 'rightsMetadata', rights_md.content, publish)
end

.updates_allowed?(pid) ⇒ boolean

Check if updates are allowed on the object. This method only works when this gem is used in a project that is configured to connect to the workflow service.

Example:

Assembly::Utils.updates_allowed?('druid:oo000oo0001')
> false

Parameters:

  • pid (string)

    the druid to operate on

Returns:

  • (boolean)

    if object can be versioned and updated



# File 'lib/assembly-utils/utils.rb', line 531

def self.updates_allowed?(pid)
  !self.in_accessioning?(pid) && self.is_ingested?(pid)
end

.values_to_symbols!(h) ⇒ hash

Takes a hash and converts its string values to symbols (not recursively).

Example:

Assembly::Utils.values_to_symbols!({'dude' => 'iscool', 'i' => 'amtoo'})
> {'i' => :amtoo, 'dude' => :iscool}

Parameters:

  • h (hash)

    hash

Returns:

  • (hash)

    a hash with all values converted from strings to symbols



# File 'lib/assembly-utils/utils.rb', line 760

def self.values_to_symbols!(h)
  h.each { |k, v| h[k] = v.to_sym if v.class == String }
end
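
Two properties of this method are easy to miss: it modifies the hash it is given (hence the `!`), and it leaves non-string values, including nested hashes, alone. A minimal sketch with a standalone copy of the method and made-up sample data:

```ruby
# Standalone copy of the logic shown above, so no gem is required.
def values_to_symbols!(h)
  # Reassigning the value of an existing key while iterating is allowed in Ruby.
  h.each { |k, v| h[k] = v.to_sym if v.class == String }
end

# Illustrative input only: a string, a number, and a nested hash.
settings = { 'mode' => 'fast', 'retries' => 3, 'nested' => { 'inner' => 'text' } }
returned = values_to_symbols!(settings)
```

Here `returned` is the same object as `settings`, and only the top-level string value has become a symbol.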

.versioning_required?(pid) ⇒ boolean

Check if versioning is required for the object. This method only works when this gem is used in a project that is configured to connect to the workflow service.

Example:

Assembly::Utils.versioning_required?('druid:oo000oo0001')
> false

Parameters:

  • pid (string)

    the druid to operate on

Returns:

  • (boolean)

    if object requires versioning



# File 'lib/assembly-utils/utils.rb', line 543

def self.versioning_required?(pid)
  !((!self.is_ingested?(pid) && self.ingest_hold?(pid)) || (!self.is_ingested?(pid) && !self.(pid)))
end
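
The nested negations in this expression are hard to read. By De Morgan's laws, `!((!a && b) || (!a && !c))` is equivalent to `a || (!b && c)`. The sketch below checks that equivalence over all eight input combinations; `a` stands for is_ingested?, `b` for ingest_hold?, and `c` for the third predicate, whose name is not shown in the listing above. The lambdas are illustrative and not part of the gem.

```ruby
# The boolean shape of the expression in the listing above.
original   = ->(a, b, c) { !((!a && b) || (!a && !c)) }

# Proposed simplified form: ingested, or (not on hold and the third check passes).
simplified = ->(a, b, c) { a || (!b && c) }

# Enumerate every true/false combination of the three inputs.
cases = [true, false].repeated_permutation(3).to_a

# Collect any combination where the two forms disagree.
mismatches = cases.reject { |a, b, c| original.call(a, b, c) == simplified.call(a, b, c) }
```

Read this way, versioning is required for any ingested object, and also for an object that is not on ingest hold and satisfies the third check.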

.workflow_status(params = {}) ⇒ string

Show the workflow status of specific steps in assembly and/or accession workflows for the provided druids. This method only works when this gem is used in a project that is configured to connect to DOR.

Example:

Assembly::Utils.workflow_status(:druids=>['druid:aa000aa0001','druid:aa000aa0002'],:workflows=>[:assembly,:accession],:filename=>'output.csv')

Parameters:

  • params (Hash) (defaults to: {})

    parameters specified as a hash, using symbols for options:

    • :druids => array of druids to get workflow status for

    • :workflows => an optional array of workflow names as symbols; options are :assembly and :accession; defaults to [:assembly]

    • :filename => optional filename if you want to send output to a CSV

Returns:

  • (string)

    comma-delimited output, also written to a CSV file if :filename is given



# File 'lib/assembly-utils/utils.rb', line 118

def self.workflow_status(params = {})

  druids    = params[:druids] || []
  workflows = params[:workflows] || [:assembly]
  filename  = params[:filename] || ''
  accession_steps = %w(content-metadata descriptive-metadata rights-metadata remediate-object shelve publish)
  assembly_steps  = %w(jp2-create checksum-compute exif-collect accessioning-initiate)

  puts 'Generating report'

  csv = CSV.open(filename, 'w') if filename != ''

  header = ['druid']
  header << assembly_steps  if workflows.include?(:assembly)
  header << accession_steps if workflows.include?(:accession)
  csv << header.flatten if filename != ''
  puts header.join(',')

  druids.each do |druid|
    output = [druid]
    assembly_steps.each  {|step| output << get_workflow_status(druid, 'assemblyWF', step )} if workflows.include?(:assembly)
    accession_steps.each {|step| output << get_workflow_status(druid, 'accessionWF', step)} if workflows.include?(:accession)
    csv << output if filename != ''
    puts output.join(',')
  end

  if filename != ''
    csv.close
    puts "Report generated in #{filename}"
  end

end
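
The report is built as one header row plus one row per druid, with one column per workflow step. A minimal sketch of that row-building logic without a DOR connection or CSV output: `fake_status` is a hypothetical stand-in for get_workflow_status, and the statuses it returns are made up for illustration.

```ruby
assembly_steps  = %w(jp2-create checksum-compute exif-collect accessioning-initiate)
accession_steps = %w(content-metadata descriptive-metadata rights-metadata remediate-object shelve publish)
workflows       = [:assembly]

# Hypothetical stand-in for get_workflow_status; returns made-up statuses.
fake_status = ->(_druid, _workflow, step) { step == 'jp2-create' ? 'completed' : 'waiting' }

# The step arrays are appended as nested arrays, as in the listing above;
# Array#join flattens them when the line is printed.
header = ['druid']
header << assembly_steps  if workflows.include?(:assembly)
header << accession_steps if workflows.include?(:accession)
header_line = header.join(',')

# One comma-delimited row per druid, one status per requested step.
rows = %w(druid:aa000aa0001 druid:aa000aa0002).map do |druid|
  output = [druid]
  assembly_steps.each  { |step| output << fake_status.call(druid, 'assemblyWF', step) }  if workflows.include?(:assembly)
  accession_steps.each { |step| output << fake_status.call(druid, 'accessionWF', step) } if workflows.include?(:accession)
  output.join(',')
end

puts header_line
puts rows
```

With only :assembly requested, each row has five columns: the druid followed by the four assembly steps.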