Class: Krikri::Harvesters::OAIHarvester
- Inherits: Object
- Includes: Krikri::Harvester
- Defined in: lib/krikri/harvesters/oai_harvester.rb
Overview
A harvester implementation for OAI-PMH.
Accepts options to pass to the OAI client as `:oai => opts`.
Options allowed are:
- set: A string or array of strings naming the sets to harvest.
  If multiple sets are given, they will be lazily requested from
  `OAI::Client#list_records` in turn and combined into a single
  enumerator.
- skip_set: A string or array of strings naming the sets to skip.
  If both `set` and `skip_set` are given, the sets named in `skip_set`
  are excluded from the harvest. If only `skip_set` is given, all sets
  returned by `#sets` except those skipped will be harvested.
- metadata_prefix: A string naming the metadata prefix, e.g. 'oai_dc'.
- from: The begin date for the harvest.
- until: The end date for the harvest.
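The interaction between `set` and `skip_set` can be sketched as a small function. This illustrates the rule described above and is not Krikri's own code: `sets_to_harvest` and the `available` list are hypothetical, with `available` standing in for the set specs an endpoint would advertise.

```ruby
# Hypothetical helper, not part of Krikri: resolves which sets to request.
# `available` stands in for the set specs the endpoint advertises.
def sets_to_harvest(available, set: nil, skip_set: nil)
  wanted = Array(set)
  skips  = Array(skip_set)

  # With no explicit `set`, a skip list is applied to every available set.
  wanted = available if wanted.empty? && !skips.empty?

  wanted - skips
end

available = %w[maps photos letters]

only_maps   = sets_to_harvest(available, set: 'maps')
# => ["maps"]
minus_skip  = sets_to_harvest(available, set: %w[maps photos], skip_set: 'photos')
# => ["maps"]
all_but_one = sets_to_harvest(available, skip_set: %w[letters])
# => ["maps", "photos"]
unrestricted = sets_to_harvest(available)
# => [] (no set restriction; the whole repository would be harvested)
```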
Constant Summary
Constants included from Krikri::Harvester
Instance Attribute Summary
- #client ⇒ Object
  Returns the value of attribute client.
Attributes included from Krikri::Harvester
Class Method Summary
- .expected_opts ⇒ Hash
  A hash documenting the allowable options to pass to initializers.
Instance Method Summary
- #concat_enum(enum_enum) ⇒ Object
  Concatenates an enumerable of enumerators into a single enumerator.
- #count ⇒ Object
  Not implemented: raises NotImplementedError. Calling count on record_ids would request all ids and load them into memory. TODO: an efficient implementation of count for OAI.
- #get_record(identifier, opts = {}) ⇒ Object
  Gets a single record with the given identifier from the OAI endpoint.
- #initialize(opts = {}) ⇒ OAIHarvester (constructor)
  A new instance of OAIHarvester.
- #record_ids(opts = {}) ⇒ Object
  Sends ListIdentifiers requests lazily.
- #records(opts = {}) ⇒ Object
  Sends ListRecords requests lazily.
- #sets(opts = {}, &block) ⇒ Array<OAI::Set>
  Lists the sets available from the OAI endpoint.
Methods included from Krikri::Harvester
Methods included from SoftwareAgent
#agent_name, #entity_behavior, #run
Constructor Details
#initialize(opts = {}) ⇒ OAIHarvester
Returns a new instance of OAIHarvester.
# File 'lib/krikri/harvesters/oai_harvester.rb', line 37
def initialize(opts = {})
  opts[:harvest_behavior] ||= OAISkipDeletedBehavior
  super

  @opts = opts.fetch(:oai, {})

  http_conn = Faraday.new do |conn|
    conn.request :retry, :max => 3
    conn.response :follow_redirects, :limit => 5
    conn.response :logger, Rails.logger
    conn.adapter :net_http
  end

  @client = OAI::Client.new(uri, :http => http_conn)
end
Instance Attribute Details
#client ⇒ Object
Returns the value of attribute client.
# File 'lib/krikri/harvesters/oai_harvester.rb', line 29
def client
  @client
end
Class Method Details
.expected_opts ⇒ Hash
Returns a hash documenting the allowable options to pass to initializers.
# File 'lib/krikri/harvesters/oai_harvester.rb', line 130
def self.expected_opts
  {
    key: :oai,
    opts: {
      set: {type: :string, required: false, multiple_ok: true},
      skip_set: {type: :string, required: false, multiple_ok: true},
      metadata_prefix: {type: :string, required: false},
      from: {type: :string, required: false},
      until: {type: :string, required: false}
    }
  }
end
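One way such a hash could be put to use is to check caller-supplied options before a harvest starts. The sketch below is an assumption about usage, not something Krikri is documented to do: `option_errors` is a hypothetical helper, and it only checks option names, required flags, and `multiple_ok`, not the `type` field.

```ruby
# Copied from .expected_opts above so the example runs standalone.
expected = {
  key: :oai,
  opts: {
    set: {type: :string, required: false, multiple_ok: true},
    skip_set: {type: :string, required: false, multiple_ok: true},
    metadata_prefix: {type: :string, required: false},
    from: {type: :string, required: false},
    until: {type: :string, required: false}
  }
}

# Hypothetical checker, not part of Krikri: returns a list of problems found.
def option_errors(opts, expected)
  given = opts.fetch(expected[:key], {})
  errors = []

  given.each do |name, value|
    spec = expected[:opts][name]
    if spec.nil?
      errors << "unknown option: #{name}"
    elsif value.is_a?(Array) && !spec[:multiple_ok]
      errors << "#{name} does not accept multiple values"
    end
  end

  expected[:opts].each do |name, spec|
    next unless spec[:required] && !given.key?(name)
    errors << "missing required option: #{name}"
  end

  errors
end

ok_errors = option_errors(
  {oai: {set: %w[maps photos], metadata_prefix: 'oai_dc'}}, expected
)
# => []

bad_errors = option_errors(
  {oai: {from: %w[2014 2015], verb: 'Identify'}}, expected
)
# => ["from does not accept multiple values", "unknown option: verb"]
```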
Instance Method Details
#concat_enum(enum_enum) ⇒ Object
Concatenates an enumerable of enumerators into a single enumerator.
TODO: find a better home for this. Reopen Enumerable? Or use the `Enumerating` gem: github.com/mdub/enumerating
# File 'lib/krikri/harvesters/oai_harvester.rb', line 147
def concat_enum(enum_enum)
  Enumerator.new do |yielder|
    enum_enum.each do |enum|
      enum.each { |i| yielder << i }
    end
  end
end
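A short standalone sketch of how this concatenation behaves when the outer collection is lazy. The method body is the one listed above, defined at top level here; the `built` array and the numeric data are illustrative assumptions used only to show which inner enumerables are actually produced.

```ruby
# Same body as #concat_enum above, defined at top level so it runs standalone.
def concat_enum(enum_enum)
  Enumerator.new do |yielder|
    enum_enum.each do |enum|
      enum.each { |i| yielder << i }
    end
  end
end

# Record which inner collections get built, to make the laziness visible.
built = []
inner = [1, 2, 3].lazy.map do |n|
  built << n
  [n * 10, n * 10 + 1]
end

combined = concat_enum(inner)

# Only as many inner collections are built as are needed for three items.
first_three = combined.first(3)
# first_three => [10, 11, 20]
# built       => [1, 2]
```

Because the outer enumerable is consumed with `each` inside an `Enumerator`, the third inner collection is never built when only three items are requested.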
#count ⇒ Object
Not implemented: raises NotImplementedError. Calling count on record_ids would request all ids and load them into memory. TODO: an efficient implementation of count for OAI.
# File 'lib/krikri/harvesters/oai_harvester.rb', line 71
def count
  raise NotImplementedError
end
#get_record(identifier, opts = {}) ⇒ Object
Gets a single record with the given identifier from the OAI endpoint
# File 'lib/krikri/harvesters/oai_harvester.rb', line 100
def get_record(identifier, opts = {})
  opts[:identifier] = identifier
  opts = @opts.merge(opts)
  @record_class.build(mint_id(identifier),
                      record_xml(client.get_record(opts).record))
end
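The option handling in the listing above relies on `Hash#merge`, where keys from the argument win over keys in the receiver. A minimal sketch of that precedence, using plain hashes; `defaults` stands in for `@opts`, and the identifier and option values are made up for illustration.

```ruby
# `defaults` stands in for the harvester's @opts; values are illustrative.
defaults  = {metadata_prefix: 'oai_dc', set: 'maps'}
call_opts = {metadata_prefix: 'mods'}

# As in #get_record, the identifier is written into the hash the caller passed.
call_opts[:identifier] = 'oai:example.org:123'

# Per-call options override the defaults; untouched defaults are kept.
merged = defaults.merge(call_opts)
# merged => {metadata_prefix: "mods", set: "maps", identifier: "oai:example.org:123"}
```

Note that the first assignment mutates the caller's hash, so a hash reused across calls will carry the last identifier with it.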
#record_ids(opts = {}) ⇒ Object
Sends ListIdentifiers requests lazily.
The following will only send requests to the endpoint until it has 1000 record ids:
record_ids.take(1000)
# File 'lib/krikri/harvesters/oai_harvester.rb', line 62
def record_ids(opts = {})
  opts = @opts.merge(opts)
  request_with_sets(opts) do |set_opts|
    client.list_identifiers(set_opts).full.lazy.flat_map(&:identifier)
  end
end
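The lazy request pattern can be demonstrated without an OAI endpoint. In this sketch the `pages` array and the `requests` counter are stand-ins (assumptions for illustration) for paged responses and HTTP calls. It uses `first(n)` to force evaluation, since `take(n)` on a Ruby lazy enumerator returns another lazy enumerator and does nothing until it is consumed.

```ruby
# Stand-in for an endpoint that returns identifiers in three pages.
pages = [%w[id1 id2], %w[id3 id4], %w[id5 id6]]

requests = 0

# Each page is "requested" only when the enumerator reaches it.
paged = Enumerator.new do |yielder|
  pages.each do |page|
    requests += 1
    yielder << page
  end
end

# Flatten pages into a single lazy stream of identifiers.
ids = paged.lazy.flat_map { |page| page }

first_three = ids.first(3)
# first_three => ["id1", "id2", "id3"]
# requests    => 2 (the third page is never requested)
```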
#records(opts = {}) ⇒ Object
Sends ListRecords requests lazily.
The following will only send requests to the endpoint until it has 1000 records:
records.take(1000)
# File 'lib/krikri/harvesters/oai_harvester.rb', line 85
def records(opts = {})
  opts = @opts.merge(opts)
  request_with_sets(opts) do |set_opts|
    client.list_records(set_opts).full.lazy.flat_map do |rec|
      @record_class.build(mint_id(rec.header.identifier),
                          record_xml(rec))
    end
  end
end
#sets(opts = {}, &block) ⇒ Array<OAI::Set>
Lists the sets available from the OAI endpoint. Accepts a block to pass to `#map` on the resulting array.
Example:
sets(&:spec)
# File 'lib/krikri/harvesters/oai_harvester.rb', line 119
def sets(opts = {}, &block)
  arry = client.list_sets.full.to_a
  return arry unless block_given?
  arry.map(&block)
end
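The block handling can be tried in isolation. In the sketch below, `fake_set` is a hypothetical stand-in for `OAI::Set` carrying only `spec` and `name`, and `sets_from` mirrors the listing above with the client call replaced by an array argument.

```ruby
# Hypothetical stand-in for OAI::Set with just the fields used here.
fake_set = Struct.new(:spec, :name)
listed = [fake_set.new('maps', 'Maps'), fake_set.new('photos', 'Photographs')]

# Same shape as #sets above, with the endpoint call replaced by `arry`.
def sets_from(arry, &block)
  return arry unless block_given?
  arry.map(&block)
end

specs = sets_from(listed, &:spec)
# specs => ["maps", "photos"]

# Without a block the set objects themselves come back unchanged.
objects = sets_from(listed)
```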