Class: Stash::Harvester::OAIPMH::OAIHarvestTask

Inherits:
HarvestTask
  • Object
Defined in:
lib/stash/harvester/oaipmh/oai_harvest_task.rb

Overview

Class representing a single OAI-PMH harvest (ListRecords) operation.

Instance Attribute Summary

Attributes inherited from HarvestTask

#config, #from_time, #until_time

Instance Method Summary

Constructor Details

#initialize(config:, from_time: nil, until_time: nil) ⇒ OAIHarvestTask

Creates a new OAIHarvestTask for harvesting from the specified OAI-PMH repository, with an optional datetime range. Repository-specific options, such as the metadata prefix, are taken from config. Note that the datetime range must be in UTC.

Parameters:

  • config (OAISourceConfig)

    The configuration of the OAI data source.

  • from_time (Time, nil) (defaults to: nil)

    The start (inclusive) of the datestamp range for selective harvesting. If from_time is omitted, harvesting will extend back to the earliest datestamp in the repository. (Optional)

  • until_time (Time, nil) (defaults to: nil)

    The end (inclusive) of the datestamp range for selective harvesting. If until_time is omitted, harvesting will extend forward to the latest datestamp in the repository. (Optional)

Raises:

  • (ArgumentError)

    if from_time or until_time is not in UTC.

  • (RangeError)

    if from_time is later than until_time.
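
The two documented error conditions can be illustrated with a minimal sketch. The helper name validate_time_range is hypothetical and is not part of this class; the actual checks presumably live in the HarvestTask constructor, whose source is not shown here.

```ruby
# Hypothetical helper mirroring the checks documented under "Raises";
# it is not part of the library.
def validate_time_range(from_time: nil, until_time: nil)
  # Both bounds are optional, but any bound that is given must be in UTC.
  [from_time, until_time].compact.each do |t|
    raise ArgumentError, "time #{t} is not in UTC" unless t.utc?
  end

  # When both bounds are given, the range must not be inverted.
  if from_time && until_time && from_time > until_time
    raise RangeError, "from_time #{from_time} is later than until_time #{until_time}"
  end

  [from_time, until_time]
end

# A valid, closed range in UTC passes through unchanged.
validate_time_range(from_time: Time.utc(2015, 1, 1), until_time: Time.utc(2015, 6, 30))
```

Building the bounds with Time.utc (or converting with Time#getutc) before passing them in avoids the ArgumentError case.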


# File 'lib/stash/harvester/oaipmh/oai_harvest_task.rb', line 32

def initialize(config:, from_time: nil, until_time: nil)
  super
end

Instance Method Details

#harvest_records ⇒ Enumerator::Lazy<OAIRecord>

Performs a ListRecords operation and returns the result as a lazy enumerator of Stash::Harvester::OAIPMH::OAIRecord objects. Paged responses are fetched transparently, one page at a time, only as the enumerator is consumed.

Returns:

  • (Enumerator::Lazy<OAIRecord>)

    A lazy enumerator of the harvested records


# File 'lib/stash/harvester/oaipmh/oai_harvest_task.rb', line 56

def harvest_records
  base_uri = config.source_uri
  client = OAI::Client.new(base_uri.to_s)
  records = client.list_records(opts)
  return [].lazy unless records
  full = records.full
  enum = full.lazy.to_enum
  enum.map { |r| OAIRecord.new(r) }
end
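
The page-at-a-time behaviour described for harvest_records can be sketched without a network connection. Everything in the block below is a hypothetical stand-in: PAGES imitates a repository that returns records in pages linked by resumption tokens, and each_record imitates the paging that the OAI client performs. It is not the library's implementation.

```ruby
# Hypothetical stand-in for a paged ListRecords response: each page holds a
# few records plus the token of the next page (nil on the last page).
PAGES = {
  'page1' => { records: %w[rec1 rec2], next_token: 'page2' },
  'page2' => { records: %w[rec3 rec4], next_token: 'page3' },
  'page3' => { records: %w[rec5],      next_token: nil }
}.freeze

# Yields records one by one, requesting the next page only when the
# current page has been exhausted. `fetched` logs the pages requested.
def each_record(fetched)
  Enumerator.new do |yielder|
    token = 'page1'
    while token
      page = PAGES.fetch(token)
      fetched << token
      page[:records].each { |r| yielder << r }
      token = page[:next_token]
    end
  end
end

fetched = []
records = each_record(fetched).lazy.map(&:upcase) # nothing is fetched yet
first_three = records.first(3)                    # pulls only as much as needed
```

Taking three records touches only the first two pages; the third page is never requested. This is why callers that filter or truncate the result of harvest_records do not necessarily pay for a full harvest.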

#opts ⇒ Hash

Creates a hash containing the HarvestTask#config options together with HarvestTask#from_time and HarvestTask#until_time (if present), each formatted as a string and stored under the :from and :until keys, for inclusion in the ListRecords request.

Returns:

  • (Hash)

    the options passed to the ListRecords verb


# File 'lib/stash/harvester/oaipmh/oai_harvest_task.rb', line 44

def opts
  opts = config.to_h
  (opts[:from] = to_s(from_time)) if from_time
  (opts[:until] = to_s(until_time)) if until_time
  opts
end
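
As a rough illustration of the resulting hash, the sketch below builds the options from a plain hash instead of an OAISourceConfig. The function name list_records_opts and the :metadata_prefix key are assumptions made for the example. The to_s helper called by opts is not shown on this page, so the ISO 8601 UTC format used here (via Time#xmlschema) is likewise an assumption about its output.

```ruby
require 'time' # provides Time#xmlschema

# Hypothetical equivalent of #opts that takes the config as a plain hash.
def list_records_opts(config_h, from_time: nil, until_time: nil)
  opts = config_h.dup # leave the caller's hash untouched
  # Assumed datestamp format: ISO 8601 in UTC, e.g. 2015-01-01T00:00:00Z
  opts[:from]  = from_time.getutc.xmlschema if from_time
  opts[:until] = until_time.getutc.xmlschema if until_time
  opts
end

config_h = { metadata_prefix: 'oai_dc' } # assumed key, for illustration only
opts = list_records_opts(config_h, from_time: Time.utc(2015, 1, 1))
```

Because until_time is omitted in the usage above, the hash carries no :until key, which matches the open-ended range described for the constructor.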