Class: Bulkrax::ApplicationParser Abstract

Inherits:
Object
Defined in:
app/parsers/bulkrax/application_parser.rb

Overview

This class is abstract.

Subclass Bulkrax::ApplicationParser to create a parser that handles a specific format (e.g. CSV, BagIt, XML).

An abstract class that establishes the API for Bulkrax’s import and export parsing.
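
As a rough illustration of the subclassing contract, the sketch below uses a stripped-down stand-in for the base class. `SketchBaseParser`, `TsvEntry`, `TsvParser` and the hash passed in as the importer are all hypothetical names invented for this example; a real parser would inherit from Bulkrax::ApplicationParser and receive a Bulkrax importer or exporter.

```ruby
# Stand-in for the abstract base class; only the pieces the sketch needs.
class SketchBaseParser
  attr_reader :importerexporter, :headers

  def initialize(importerexporter)
    @importerexporter = importerexporter
    @headers = []
  end

  # Abstract hooks: subclasses are expected to override these.
  def entry_class
    raise NotImplementedError, 'must be defined'
  end

  def records(_opts = {})
    raise NotImplementedError, 'must be defined'
  end

  # Mirrors the documented default: works use the generic entry class.
  def work_entry_class
    entry_class
  end
end

# Hypothetical entry class and parser for tab-separated input.
TsvEntry = Struct.new(:identifier)

class TsvParser < SketchBaseParser
  def entry_class
    TsvEntry
  end

  # Turns the raw text into an array of hashes keyed by the header row.
  def records(_opts = {})
    lines = importerexporter.fetch(:raw).lines.map(&:chomp)
    @headers = lines.first.split("\t")
    lines.drop(1).map { |line| @headers.zip(line.split("\t")).to_h }
  end
end

parser = TsvParser.new({ raw: "source_identifier\ttitle\nrec-1\tFirst\n" })
rows = parser.records
```

Calling an abstract hook on the stand-in base class raises NotImplementedError, which is the same signal the real class uses to tell subclass authors what is still missing.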

Direct Known Subclasses

CsvParser, OaiDcParser, XmlParser


Constructor Details

#initialize(importerexporter) ⇒ ApplicationParser

Returns a new instance of ApplicationParser.



# File 'app/parsers/bulkrax/application_parser.rb', line 37

def initialize(importerexporter)
  @importerexporter = importerexporter
  @headers = []
end

Instance Attribute Details

#headers ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 8

def headers
  @headers
end

#importerexporter ⇒ Object

Also known as: importer, exporter

# File 'app/parsers/bulkrax/application_parser.rb', line 8

def importerexporter
  @importerexporter
end

Class Method Details

.export_supported? ⇒ TrueClass, FalseClass

TODO:

Convert to `class_attribute :export_supported, default: false, instance_predicate: true` and `class << self; alias export_supported? export_supported; end`

Returns whether this parser supports exports.

Returns:

  • (TrueClass, FalseClass)

    whether this parser supports exports.

# File 'app/parsers/bulkrax/application_parser.rb', line 26

def self.export_supported?
  false
end

.import_supported? ⇒ TrueClass, FalseClass

TODO:

Convert to `class_attribute :import_supported, default: false, instance_predicate: true` and `class << self; alias import_supported? import_supported; end`

Returns whether this parser supports imports.

Returns:

  • (TrueClass, FalseClass)

    whether this parser supports imports.

# File 'app/parsers/bulkrax/application_parser.rb', line 33

def self.import_supported?
  true
end

.parser_fields ⇒ Object

TODO:

Convert to `class_attribute :parser_fields, default: {}`

# File 'app/parsers/bulkrax/application_parser.rb', line 19

def self.parser_fields
  {}
end

Instance Method Details

#base_path(type = 'import') ⇒ String

Base path for imported and exported files

Parameters:

  • type (String) (defaults to: 'import')

Returns:

  • (String)

    the base path for files that this parser will “parse”



# File 'app/parsers/bulkrax/application_parser.rb', line 294

def base_path(type = 'import')
  # account for multiple versions of hyku
  is_multitenant = ENV['HYKU_MULTITENANT'] == 'true' || ENV['SETTINGS__MULTITENANCY__ENABLED'] == 'true'
  is_multitenant ? File.join(Bulkrax.send("#{type}_path"), ::Site.instance.account.name) : Bulkrax.send("#{type}_path")
end
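
To make the multitenant branch concrete, here is a small sketch that reimplements the same decision with plain arguments in place of `ENV`, `Bulkrax` and `::Site`. The method name `base_path_sketch`, the path values and the tenant name are illustrative assumptions, not part of Bulkrax.

```ruby
# Stand-alone approximation of #base_path: env, paths and tenant_name are
# passed in explicitly instead of being read from ENV, Bulkrax and ::Site.
def base_path_sketch(type, env:, paths:, tenant_name:)
  multitenant = env['HYKU_MULTITENANT'] == 'true' ||
                env['SETTINGS__MULTITENANCY__ENABLED'] == 'true'
  root = paths.fetch(type)
  multitenant ? File.join(root, tenant_name) : root
end

paths = { 'import' => 'tmp/imports', 'export' => 'tmp/exports' } # hypothetical values

single = base_path_sketch('import', env: {}, paths: paths, tenant_name: 'tenant-a')
multi  = base_path_sketch('export', env: { 'HYKU_MULTITENANT' => 'true' },
                                    paths: paths, tenant_name: 'tenant-a')
```

In the single-tenant case the configured path is returned unchanged; in the multitenant case the tenant name becomes one extra path segment.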

#calculate_type_delay(type) ⇒ Object



# File 'app/parsers/bulkrax/application_parser.rb', line 240

def calculate_type_delay(type)
  return 2.minutes if type == 'file_set'
  return 1.minute if type == 'work'
  return 0
end

#collection_entry_class ⇒ Object

This method is abstract.

Subclass and override #collection_entry_class to implement behavior for the parser.

Raises:

  • (NotImplementedError)


# File 'app/parsers/bulkrax/application_parser.rb', line 54

def collection_entry_class
  raise NotImplementedError, 'must be defined'
end

#collections_total ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 420

def collections_total
  0
end

#create_collections ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 162

def create_collections
  create_objects(['collection'])
end

#create_entry_and_job(current_record, type, identifier = nil) ⇒ Object



# File 'app/parsers/bulkrax/application_parser.rb', line 260

def create_entry_and_job(current_record, type, identifier = nil)
  identifier ||= current_record[source_identifier]
  new_entry = find_or_create_entry(send("#{type}_entry_class"),
                                   identifier,
                                   'Bulkrax::Importer',
                                   record_raw_metadata(current_record))
  new_entry.status_info('Pending', importer.current_run)
  if record_deleted?(current_record)
    "Bulkrax::Delete#{type.camelize}Job".constantize.send(perform_method, new_entry, current_run)
  elsif record_remove_and_rerun?(current_record) || remove_and_rerun
    delay = calculate_type_delay(type)
    "Bulkrax::DeleteAndImport#{type.camelize}Job".constantize.set(wait: delay).send(perform_method, new_entry, current_run)
  else
    "Bulkrax::Import#{type.camelize}Job".constantize.send(perform_method, new_entry.id, current_run.id)
  end
end
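
The job to run is chosen by building a class name from the record type and the record's flags. The sketch below reproduces only that naming step in plain Ruby; `job_class_name` is a hypothetical helper, and the `split`/`capitalize` chain is a stand-in for ActiveSupport's `camelize` that is only meant to cover simple snake_case types.

```ruby
# Returns the name of the job class that would be constantized for a record.
def job_class_name(type, deleted:, remove_and_rerun:)
  camelized = type.split('_').map(&:capitalize).join # stand-in for String#camelize
  if deleted
    "Bulkrax::Delete#{camelized}Job"
  elsif remove_and_rerun
    "Bulkrax::DeleteAndImport#{camelized}Job"
  else
    "Bulkrax::Import#{camelized}Job"
  end
end

plain_import = job_class_name('file_set', deleted: false, remove_and_rerun: false)
deletion     = job_class_name('work', deleted: true, remove_and_rerun: true)
rerun        = job_class_name('collection', deleted: false, remove_and_rerun: true)
```

Note that the delete flag is checked first, so a record marked for both deletion and re-run is only deleted.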

#create_file_sets ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 170

def create_file_sets
  create_objects(['file_set'])
end

#create_objects(types_array = nil) ⇒ Object

Parameters:

  • types_array (Array<String>) (defaults to: nil)

    the types of objects that we’ll create.

# File 'app/parsers/bulkrax/application_parser.rb', line 187

def create_objects(types_array = nil)
  index = 0
  (types_array || %w[collection work file_set relationship]).each do |type|
    if type.eql?('relationship')
      ScheduleRelationshipsJob.set(wait: 5.minutes).perform_later(importer_id: importerexporter.id)
      next
    end
    send(type.pluralize).each do |current_record|
      next unless record_has_source_identifier(current_record, index)
      break if limit_reached?(limit, index)
      seen[current_record[source_identifier]] = true
      create_entry_and_job(current_record, type)
      increment_counters(index, "#{type}": true)
      index += 1
    end
    importer.record_status
  end
  true
rescue StandardError => e
  set_status_info(e)
end

#create_relationships ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 174

def create_relationships
  create_objects(['relationship'])
end

#create_works ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 166

def create_works
  create_objects(['work'])
end

#entry_class ⇒ Object

This method is abstract.

Subclass and override #entry_class to implement behavior for the parser.

Raises:

  • (NotImplementedError)


# File 'app/parsers/bulkrax/application_parser.rb', line 44

def entry_class
  raise NotImplementedError, 'must be defined'
end

#exporter? ⇒ TrueClass, FalseClass

Returns:

  • (TrueClass, FalseClass)


# File 'app/parsers/bulkrax/application_parser.rb', line 325

def exporter?
  importerexporter.is_a?(Bulkrax::Exporter)
end

#file_set_entry_class ⇒ Object

This method is abstract.

Subclass and override #file_set_entry_class to implement behavior for the parser.

Raises:

  • (NotImplementedError)


# File 'app/parsers/bulkrax/application_parser.rb', line 60

def file_set_entry_class
  raise NotImplementedError, 'must be defined'
end

#file_sets_total ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 424

def file_sets_total
  0
end

#find_or_create_entry(entryclass, identifier, type, raw_metadata = nil) ⇒ Object



# File 'app/parsers/bulkrax/application_parser.rb', line 389

def find_or_create_entry(entryclass, identifier, type, raw_metadata = nil)
  # limit entry search to just this importer or exporter. Don't go moving them
  entry = importerexporter.entries.where(
    identifier: identifier
  ).first
  entry ||= entryclass.new(
    importerexporter_id: importerexporter.id,
    importerexporter_type: type,
    identifier: identifier
  )
  entry.raw_metadata = raw_metadata
  # Setting parsed_metadata specifically for the id so we can find the object via the
  # id in a delete.  This is likely to get clobbered in a regular import, which is fine.
  entry.parsed_metadata = { id: raw_metadata['id'] } if raw_metadata&.key?('id')
  entry.save!
  entry
end

#generated_metadata_mapping ⇒ String

Returns:

  • (String)


# File 'app/parsers/bulkrax/application_parser.rb', line 94

def generated_metadata_mapping
  @generated_metadata_mapping ||= 'generated'
end

#get_field_mapping_hash_for(key) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Raises:

  • (StandardError)


# File 'app/parsers/bulkrax/application_parser.rb', line 123

def get_field_mapping_hash_for(key)
  return instance_variable_get("@#{key}_hash") if instance_variable_get("@#{key}_hash").present?

  mapping = importerexporter.field_mapping.is_a?(Hash) ? importerexporter.field_mapping : {}
  instance_variable_set(
    "@#{key}_hash",
    mapping&.with_indifferent_access&.select { |_, h| h.key?(key) }
  )
  raise StandardError, "more than one #{key} declared: #{instance_variable_get("@#{key}_hash").keys.join(', ')}" if instance_variable_get("@#{key}_hash").length > 1

  instance_variable_get("@#{key}_hash")
end
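
The lookup keeps only the field mapping entries that declare the given flag and rejects configurations that declare it more than once. A minimal plain-Ruby approximation follows; `mapping_entries_for` and the sample mapping are hypothetical, and string keys are used throughout because the sketch does not rely on `with_indifferent_access` or on the per-key memoization of the original.

```ruby
# Keep only the mapping entries that carry the flag, and refuse ambiguity.
def mapping_entries_for(field_mapping, key)
  matches = field_mapping.select { |_, options| options.key?(key) }
  raise StandardError, "more than one #{key} declared: #{matches.keys.join(', ')}" if matches.length > 1

  matches
end

field_mapping = {
  'title'  => { 'from' => ['title'] },
  'source' => { 'from' => ['source_identifier'], 'source_identifier' => true }
}

flagged = mapping_entries_for(field_mapping, 'source_identifier')

ambiguous = field_mapping.merge('other' => { 'from' => ['id'], 'source_identifier' => true })
error_message =
  begin
    mapping_entries_for(ambiguous, 'source_identifier')
    nil
  rescue StandardError => e
    e.message
  end
```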

#import_file_path ⇒ String

Path for the import

Returns:

  • (String)


# File 'app/parsers/bulkrax/application_parser.rb', line 484

def import_file_path
  @import_file_path ||= real_import_file_path
end

#importer? ⇒ TrueClass, FalseClass

Returns:

  • (TrueClass, FalseClass)


# File 'app/parsers/bulkrax/application_parser.rb', line 320

def importer?
  importerexporter.is_a?(Bulkrax::Importer)
end

#invalid_record(message) ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 358

def invalid_record(message)
  current_run.invalid_records ||= ""
  current_run.invalid_records += message
  current_run.save
  ImporterRun.increment_counter(:failed_records, current_run.id)
  ImporterRun.decrement_counter(:enqueued_records, current_run.id) unless ImporterRun.find(current_run.id).enqueued_records <= 0 # rubocop:disable Style/IdenticalConditionalBranches
end

#limit_reached?(limit, index) ⇒ TrueClass, FalseClass

Parameters:

  • limit (Integer)

    limit set on the importerexporter

  • index (Integer)

    index of current iteration

Returns:

  • (TrueClass, FalseClass)


# File 'app/parsers/bulkrax/application_parser.rb', line 332

def limit_reached?(limit, index)
  return false if limit.nil? || limit.zero? # no limit
  index >= limit
end
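
The following sketch shows how this predicate behaves inside a loop like the one in #create_objects. The method body is copied from the listing above; the sample records and limits are made up for illustration.

```ruby
def limit_reached?(limit, index)
  return false if limit.nil? || limit.zero? # no limit
  index >= limit
end

# Collects the records that would be processed under a given limit.
def take_until_limit(records, limit)
  processed = []
  records.each_with_index do |record, index|
    break if limit_reached?(limit, index)
    processed << record
  end
  processed
end

records = %w[a b c d e]

limited   = take_until_limit(records, 3)
unlimited = take_until_limit(records, 0)
no_limit  = take_until_limit(records, nil)
```

Because the index is zero-based, a limit of 3 stops the loop before the fourth record; a limit of 0 or nil means every record is processed.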

#model_field_mappings ⇒ Array<String>

Returns:

  • (Array<String>)


# File 'app/parsers/bulkrax/application_parser.rb', line 137

def model_field_mappings
  model_mappings = Bulkrax.field_mappings[self.class.to_s]&.dig('model', :from) || []
  model_mappings |= ['model']

  model_mappings
end
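
The `|=` in this method is an array union, so 'model' is always present in the result without being duplicated. A short sketch with a hypothetical mapping hash (shaped like one parser's entry in `Bulkrax.field_mappings`, but using a plain Hash rather than an indifferent-access one):

```ruby
# Hypothetical per-parser mapping; keys are written exactly as they are looked up.
parser_mapping = { 'model' => { from: ['work_type', 'model'] } }

configured = parser_mapping&.dig('model', :from) || []
configured |= ['model'] # union: adds 'model' only if it is missing

# With no mapping for the parser at all, the result falls back to just 'model'.
missing_mapping = nil
unconfigured = (missing_mapping&.dig('model', :from) || []) | ['model']
```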

#new_entry(entryclass, type) ⇒ Object



# File 'app/parsers/bulkrax/application_parser.rb', line 382

def new_entry(entryclass, type)
  entryclass.new(
    importerexporter_id: importerexporter.id,
    importerexporter_type: type
  )
end

#path_for_import ⇒ String

Path where we’ll store the import metadata and files. This is used for uploaded and cloud files.

Returns:

  • (String)


# File 'app/parsers/bulkrax/application_parser.rb', line 303

def path_for_import
  @path_for_import = File.join(base_path, importerexporter.path_string)
  FileUtils.mkdir_p(@path_for_import) unless File.exist?(@path_for_import)
  @path_for_import
end

#perform_method ⇒ String

Returns:

  • (String)


# File 'app/parsers/bulkrax/application_parser.rb', line 145

def perform_method
  if self.validate_only
    'perform_now'
  else
    'perform_later'
  end
end

#rebuild_entries(types_array = nil) ⇒ Object



# File 'app/parsers/bulkrax/application_parser.rb', line 209

def rebuild_entries(types_array = nil)
  index = 0
  (types_array || %w[collection work file_set relationship]).each do |type|
    # works are not guaranteed to have Work in the type
    if type.eql?('relationship')
      ScheduleRelationshipsJob.set(wait: 5.minutes).perform_later(importer_id: importerexporter.id)
      next
    end
    importer.entries.where(rebuild_entry_query(type, parser_fields['entry_statuses'])).find_each do |e|
      seen[e.identifier] = true
      e.status_info('Pending', importer.current_run)
      if remove_and_rerun
        delay = calculate_type_delay(type)
        "Bulkrax::DeleteAndImport#{type.camelize}Job".constantize.set(wait: delay).send(perform_method, e, current_run)
      else
        "Bulkrax::Import#{type.camelize}Job".constantize.send(perform_method, e.id, current_run.id)
      end
      increment_counters(index)
      index += 1
    end
  end
end

#rebuild_entry_query(type, statuses) ⇒ Object



# File 'app/parsers/bulkrax/application_parser.rb', line 232

def rebuild_entry_query(type, statuses)
  type_col = Bulkrax::Entry.arel_table['type']
  status_col = Bulkrax::Entry.arel_table['status_message']

  query = (type == 'work' ? type_col.does_not_match_all(%w[collection file_set]) : type_col.matches(type.camelize))
  query.and(status_col.in(statuses))
end

#record(identifier, _opts = {}) ⇒ Object

TODO:
  • review this method - is it ever used?



# File 'app/parsers/bulkrax/application_parser.rb', line 408

def record(identifier, _opts = {})
  return @record if @record

  @record = entry_class.new(self, identifier)
  @record.build
  return @record
end

#record_deleted?(record) ⇒ Boolean

Returns:

  • (Boolean)


# File 'app/parsers/bulkrax/application_parser.rb', line 250

def record_deleted?(record)
  return false unless record.key?(:delete)
  ActiveModel::Type::Boolean.new.cast(record[:delete])
end

#record_has_source_identifier(record, index) ⇒ TrueClass, FalseClass

Returns:

  • (TrueClass, FalseClass)


# File 'app/parsers/bulkrax/application_parser.rb', line 344

def record_has_source_identifier(record, index)
  if record[source_identifier].blank?
    if Bulkrax.fill_in_blank_source_identifiers.present?
      record[source_identifier] = Bulkrax.fill_in_blank_source_identifiers.call(self, index)
    else
      invalid_record("Missing #{source_identifier} for #{record.to_h}\n")
      false
    end
  else
    true
  end
end
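
The fallback branch relies on `Bulkrax.fill_in_blank_source_identifiers` being a callable. The sketch below imitates the flow in plain Ruby; `ensure_source_identifier`, the single-argument `generator` lambda and the `errors` array are hypothetical stand-ins for the parser method, the Bulkrax setting (which receives the parser and the index) and `invalid_record`.

```ruby
def ensure_source_identifier(record, index, key:, fill_in: nil, errors: [])
  return true unless record[key].to_s.strip.empty?

  if fill_in
    # Like the original, this branch returns the generated identifier (truthy).
    record[key] = fill_in.call(index)
  else
    errors << "Missing #{key} for #{record}\n"
    false
  end
end

errors = []
generator = ->(index) { "import-7-#{index}" } # hypothetical naming scheme

filled  = { title: 'Untitled' }
skipped = { title: 'Also untitled' }

filled_result  = ensure_source_identifier(filled, 0, key: :source_identifier, fill_in: generator)
skipped_result = ensure_source_identifier(skipped, 1, key: :source_identifier, errors: errors)
```

One detail worth noticing: when an identifier is generated, the method returns that identifier rather than `true`, which callers such as #create_objects treat as truthy.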

#record_raw_metadata(record) ⇒ Object



# File 'app/parsers/bulkrax/application_parser.rb', line 246

def record_raw_metadata(record)
  record.to_h
end

#record_remove_and_rerun?(record) ⇒ Boolean

Returns:

  • (Boolean)


# File 'app/parsers/bulkrax/application_parser.rb', line 255

def record_remove_and_rerun?(record)
  return false unless record.key?(:remove_and_rerun)
  ActiveModel::Type::Boolean.new.cast(record[:remove_and_rerun])
end

#records(_opts = {}) ⇒ Object

This method is abstract.

Subclass and override #records to implement behavior for the parser.

Raises:

  • (NotImplementedError)


# File 'app/parsers/bulkrax/application_parser.rb', line 66

def records(_opts = {})
  raise NotImplementedError, 'must be defined'
end

#related_children_parsed_mapping ⇒ String

Returns:

  • (String)

# File 'app/parsers/bulkrax/application_parser.rb', line 118

def related_children_parsed_mapping
  @related_children_parsed_mapping ||= get_field_mapping_hash_for('related_children_field_mapping')&.keys&.first || 'children'
end

#related_children_raw_mapping ⇒ String, NilClass

Returns:

  • (String, NilClass)

# File 'app/parsers/bulkrax/application_parser.rb', line 112

def related_children_raw_mapping
  @related_children_raw_mapping ||= get_field_mapping_hash_for('related_children_field_mapping')&.values&.first&.[]('from')&.first
end

#related_parents_parsed_mapping ⇒ String

Returns:

  • (String)

See Also:

  • #related_parents_field_mapping


# File 'app/parsers/bulkrax/application_parser.rb', line 106

def related_parents_parsed_mapping
  @related_parents_parsed_mapping ||= get_field_mapping_hash_for('related_parents_field_mapping')&.keys&.first || 'parents'
end

#related_parents_raw_mapping ⇒ String, NilClass

Returns:

  • (String, NilClass)

# File 'app/parsers/bulkrax/application_parser.rb', line 100

def related_parents_raw_mapping
  @related_parents_raw_mapping ||= get_field_mapping_hash_for('related_parents_field_mapping')&.values&.first&.[]('from')&.first
end

#remove_spaces_from_filenames ⇒ Object

File names referenced in CSVs have spaces replaced with underscores



# File 'app/parsers/bulkrax/application_parser.rb', line 454

def remove_spaces_from_filenames
  files = Dir.glob(File.join(importer_unzip_path, 'files', '*'))
  files_with_spaces = files.select { |f| f.split('/').last.match?(' ') }
  return if files_with_spaces.blank?

  files_with_spaces.map! { |path| Pathname.new(path) }
  files_with_spaces.each do |path|
    filename = path.basename
    filename_without_spaces = filename.to_s.tr(' ', '_')
    path.rename(File.join(path.dirname, filename_without_spaces))
  end
end
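
The renaming step can be tried in isolation with a temporary directory. This is a sketch under assumptions: the temporary directory stands in for `importer_unzip_path`, and the file names are invented.

```ruby
require 'tmpdir'
require 'fileutils'
require 'pathname'

renamed = nil
Dir.mktmpdir do |unzip_path|
  files_dir = File.join(unzip_path, 'files')
  FileUtils.mkdir_p(files_dir)
  FileUtils.touch(File.join(files_dir, 'my scan 01.tif'))
  FileUtils.touch(File.join(files_dir, 'cover.jpg'))

  # Same idea as the method above: rename only files whose basename has a space.
  Dir.glob(File.join(files_dir, '*')).each do |file|
    path = Pathname.new(file)
    name = path.basename.to_s
    next unless name.include?(' ')

    path.rename(path.dirname.join(name.tr(' ', '_')).to_s)
  end

  renamed = Dir.children(files_dir).sort
end
```

Only the top level of the `files` directory is scanned, so files in nested folders would keep their original names.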

#required_elements ⇒ Array<String>

Returns:

  • (Array<String>)


# File 'app/parsers/bulkrax/application_parser.rb', line 368

def required_elements
  matched_elements = ((importerexporter.mapping.keys || []) & (Bulkrax.required_elements || []))
  unless matched_elements.count == Bulkrax.required_elements.count
    missing_elements = Bulkrax.required_elements - matched_elements
    error_alert = "Missing mapping for at least one required element, missing mappings are: #{missing_elements.join(', ')}"
    raise StandardError, error_alert
  end
  if Bulkrax.fill_in_blank_source_identifiers
    Bulkrax.required_elements
  else
    Bulkrax.required_elements + [source_identifier]
  end
end
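
The check is an array intersection followed by a difference. The sketch below restates it with explicit arguments; `required_elements_sketch` and the sample values (including `%w[title]` as the required list) are assumptions made for the example rather than values read from a Bulkrax configuration.

```ruby
def required_elements_sketch(mapping_keys, required, source_identifier:, fill_in_blank:)
  missing = required - (mapping_keys & required)
  unless missing.empty?
    raise StandardError,
          "Missing mapping for at least one required element, missing mappings are: #{missing.join(', ')}"
  end

  # The source identifier is only required when blanks are not filled in automatically.
  fill_in_blank ? required : required + [source_identifier]
end

required = %w[title] # hypothetical required elements

strict  = required_elements_sketch(%w[title creator], required,
                                   source_identifier: :source_identifier, fill_in_blank: false)
relaxed = required_elements_sketch(%w[title creator], required,
                                   source_identifier: :source_identifier, fill_in_blank: true)
```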

#retrieve_cloud_files(_files, _importer) ⇒ Object

Optional. Define this if you use Browse Everything for file upload.



# File 'app/parsers/bulkrax/application_parser.rb', line 278

def retrieve_cloud_files(_files, _importer); end

#setup_export_file ⇒ Object

This method is abstract.

Subclass and override #setup_export_file to implement behavior for the parser.

Raises:

  • (NotImplementedError)


# File 'app/parsers/bulkrax/application_parser.rb', line 310

def setup_export_file
  raise NotImplementedError, 'must be defined' if exporter?
end

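#source_identifier ⇒ Symbol

Returns the name of the identifying property in the source system from which we’re importing (i.e. not the application that mounts this Bulkrax engine).

Returns:

  • (Symbol)

    the name of the identifying property in the source system from which we’re importing (i.e. not the application that mounts this Bulkrax engine)
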
# File 'app/parsers/bulkrax/application_parser.rb', line 75

def source_identifier
  @source_identifier ||= get_field_mapping_hash_for('source_identifier')&.values&.first&.[]('from')&.first&.to_sym || :source_identifier
end

#total ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 416

def total
  0
end

#untar(file_to_untar) ⇒ Object



# File 'app/parsers/bulkrax/application_parser.rb', line 445

def untar(file_to_untar)
  Dir.mkdir(importer_unzip_path(mkdir: true)) unless File.directory?(importer_unzip_path(mkdir: true))
  command = "tar -xzf #{Shellwords.escape(file_to_untar)} -C #{Shellwords.escape(importer_unzip_path)}"
  result = system(command)
  raise "Failed to extract #{file_to_untar}" unless result
end
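
The command string is built with `Shellwords.escape` so that paths containing spaces or shell metacharacters reach `tar` as single arguments. A small sketch with hypothetical paths, which only builds and re-parses the command and does not run it:

```ruby
require 'shellwords'

archive = '/tmp/imports/my archive.tar.gz' # hypothetical upload with a space in its name
target  = '/tmp/imports/unzipped'          # hypothetical extraction directory

command = "tar -xzf #{Shellwords.escape(archive)} -C #{Shellwords.escape(target)}"

# Splitting the command the way a shell would recovers the original arguments.
arguments = Shellwords.split(command)
```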

#unzip(file_to_unzip) ⇒ Object



# File 'app/parsers/bulkrax/application_parser.rb', line 433

def unzip(file_to_unzip)
  return untar(file_to_unzip) if file_to_unzip.end_with?('.tar.gz')

  Zip::File.open(file_to_unzip) do |zip_file|
    zip_file.each do |entry|
      entry_path = File.join(importer_unzip_path(mkdir: true), entry.name)
      FileUtils.mkdir_p(File.dirname(entry_path))
      zip_file.extract(entry, entry_path) unless File.exist?(entry_path)
    end
  end
end

#valid_import? ⇒ TrueClass, FalseClass

Override to add specific validations

Returns:

  • (TrueClass, FalseClass)


# File 'app/parsers/bulkrax/application_parser.rb', line 339

def valid_import?
  true
end

#visibility ⇒ String

The visibility of the record. Acceptable values are: “open”, “embargo”, “lease”, “authenticated”, “restricted”. The default is “open”.

# File 'app/parsers/bulkrax/application_parser.rb', line 158

def visibility
  @visibility ||= self.parser_fields['visibility'] || 'open'
end

#work_entry_class ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 48

def work_entry_class
  entry_class
end

#work_identifier ⇒ Symbol

Returns the name of the identifying property for the system which we’re importing into (e.g. the application that mounts this Bulkrax engine).

Returns:

  • (Symbol)

    the name of the identifying property for the system which we’re importing into (e.g. the application that mounts this Bulkrax engine)

# File 'app/parsers/bulkrax/application_parser.rb', line 82

def work_identifier
  @work_identifier ||= get_field_mapping_hash_for('source_identifier')&.keys&.first&.to_sym || :source
end

#work_identifier_search_field ⇒ String

Returns the Solr property of the source_identifier, used for searching. Defaults to the work_identifier value + “_sim”.

Returns:

  • (String)

    the Solr property of the source_identifier, used for searching. Defaults to the work_identifier value + “_sim”

# File 'app/parsers/bulkrax/application_parser.rb', line 89

def work_identifier_search_field
  @work_identifier_search_field ||= Array.wrap(get_field_mapping_hash_for('source_identifier')&.values&.first&.[]('search_field'))&.first&.to_s || "#{work_identifier}_sim"
end
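
Taken together, #source_identifier, #work_identifier and #work_identifier_search_field all read from the single field mapping entry flagged with `source_identifier`. The sketch below derives the three values from a plain Hash; `identifier_settings` and the sample mapping are hypothetical, and `Array()` is used as a stand-in for ActiveSupport's `Array.wrap`.

```ruby
def identifier_settings(field_mapping)
  flagged = field_mapping.select { |_, options| options.key?('source_identifier') }
  options = flagged.values.first
  work_identifier = flagged.keys.first&.to_sym || :source

  {
    source_identifier: options&.[]('from')&.first&.to_sym || :source_identifier,
    work_identifier: work_identifier,
    search_field: Array(options&.[]('search_field')).first&.to_s || "#{work_identifier}_sim"
  }
end

configured = identifier_settings(
  { 'local_id' => { 'from' => ['original_id'], 'source_identifier' => true } }
)
defaults = identifier_settings({})
```

The mapping key names the property in the importing application, the `from` value names the column or element in the source data, and the search field falls back to the work identifier with a `_sim` suffix.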

#write ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 428

def write
  write_files
  zip
end

#write_files ⇒ Object

This method is abstract.

Subclass and override #write_files to implement behavior for the parser.

Raises:

  • (NotImplementedError)


# File 'app/parsers/bulkrax/application_parser.rb', line 315

def write_files
  raise NotImplementedError, 'must be defined' if exporter?
end

#write_import_file(file) ⇒ Object

Parameters:

  • file (#path, #original_filename)

    the file object with the relevant data for the import.

# File 'app/parsers/bulkrax/application_parser.rb', line 282

def write_import_file(file)
  path = File.join(path_for_import, file.original_filename)
  FileUtils.mv(
    file.path,
    path
  )
  path
end

#zip ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 467

def zip
  FileUtils.mkdir_p(exporter_export_zip_path)

  Dir["#{exporter_export_path}/**"].each do |folder|
    zip_path = "#{exporter_export_zip_path.split('/').last}_#{folder.split('/').last}.zip"
    FileUtils.rm_rf("#{exporter_export_zip_path}/#{zip_path}")

    Zip::File.open(File.join("#{exporter_export_zip_path}/#{zip_path}"), create: true) do |zip_file|
      Dir["#{folder}/**/**"].each do |file|
        zip_file.add(file.sub("#{folder}/", ''), file)
      end
    end
  end
end
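
Each folder under the export path becomes its own archive, named from the last segment of the zip path and the last segment of the folder. The sketch below shows only that naming step; both paths are hypothetical examples, not values Bulkrax is known to produce.

```ruby
exporter_export_zip_path = '/tmp/exports/export_12_3' # hypothetical zip directory
folder = '/tmp/exports/12/3/1'                        # hypothetical exported folder

# Same expression as in #zip: "<zip dir name>_<folder name>.zip"
zip_name = "#{exporter_export_zip_path.split('/').last}_#{folder.split('/').last}.zip"
zip_file_path = File.join(exporter_export_zip_path, zip_name)
```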