Class: Krikri::Harvesters::OAIHarvester
- Inherits: Object
- Includes: Krikri::Harvester
- Defined in: lib/krikri/harvesters/oai_harvester.rb
Overview
A harvester implementation for OAI-PMH.
Accepts options to pass to the OAI client as `:oai => opts`.
Options allowed are:
- set: A string or array of strings specifying the sets to harvest.
If multiple sets are given, they will be lazily requested from
`OAI::Client#list_records` in turn and combined into a single
enumerator.
- skip_set: A string or array of strings specifying the sets to skip.
If both `set` and `skip_set` are given, the sets listed in `skip_set` are
excluded from those listed in `set`. If only `skip_set` is given, every set
returned by `#sets` except the skipped ones is harvested.
- metadata_prefix: A string specifying the metadata prefix. e.g. 'oai_dc'.
- from: The begin date for the harvest.
- until: The end date for the harvest.
- id_path: An alternate xpath (e.g. '//dc:identifier') from which to load
raw identifier strings. Default: <record><header><identifier>.
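The `set`/`skip_set` rules above can be sketched as a small set-selection function. This is a minimal illustration, not the harvester's own code: `sets_to_harvest` is a hypothetical helper, sets are assumed to be identified by their spec strings, and returning `nil` (no set restriction) when neither option is given is an assumption.

```ruby
# Hypothetical helper illustrating the documented set/skip_set rules.
# It is NOT part of OAIHarvester; the harvester's own set handling is not shown here.
def sets_to_harvest(oai_opts, available_sets)
  set  = oai_opts[:set]
  skip = oai_opts[:skip_set]
  # Assumption: with neither option, the harvest is not restricted by set
  return nil if set.nil? && skip.nil?

  # Explicit sets if given, otherwise every set the endpoint lists,
  # minus anything in skip_set (strings or arrays are both accepted)
  (set ? Array(set) : available_sets) - Array(skip)
end

available = %w[books maps photos letters]

p sets_to_harvest({ set: %w[books maps] }, available)                   # => ["books", "maps"]
p sets_to_harvest({ set: %w[books maps], skip_set: 'maps' }, available) # => ["books"]
p sets_to_harvest({ skip_set: %w[photos] }, available)                  # => ["books", "maps", "letters"]
p sets_to_harvest({}, available)                                        # => nil
```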
Constant Summary
Constants included from Krikri::Harvester
Instance Attribute Summary
- #client ⇒ Object
  Returns the value of attribute client.
Attributes included from Krikri::Harvester
Attributes included from SoftwareAgent
Class Method Summary
- .expected_opts ⇒ Hash
  A hash documenting the allowable options to pass to initializers.
Instance Method Summary
- #concat_enum(enum_enum) ⇒ Object
  Concatenates an enumerator of enumerators into a single enumerator.
- #count ⇒ Object
  Not yet implemented; raises NotImplementedError. Counting via record_ids would request all ids and load them into memory. TODO: an efficient implementation of count for OAI.
- #get_record(identifier, opts = {}) ⇒ Object
  Gets a single record with the given identifier from the OAI endpoint.
- #initialize(opts = {}) ⇒ OAIHarvester (constructor)
  Returns a new instance of OAIHarvester.
- #record_ids(opts = {}) ⇒ Object
  Sends ListIdentifiers requests lazily.
- #records(opts = {}) ⇒ Object
  Sends ListRecords requests lazily.
- #sets(opts = {}, &block) ⇒ Array<OAI::Set>
  Lists the sets available from the OAI endpoint.
Methods included from Krikri::Harvester
Methods included from SoftwareAgent
Constructor Details
#initialize(opts = {}) ⇒ OAIHarvester
Returns a new instance of OAIHarvester.
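```ruby
# File 'lib/krikri/harvesters/oai_harvester.rb', line 41
def initialize(opts = {})
  # Skip deleted OAI records unless the caller supplies another behavior
  opts[:harvest_behavior] ||= OAISkipDeletedBehavior
  super
  # OAI client options are nested under the :oai key
  @opts = opts.fetch(:oai, {})
  # :id_path is used by the harvester itself and is not passed to the client
  @id_path = @opts.delete(:id_path) { false }

  # HTTP connection: retry up to 3 times, follow up to 5 redirects,
  # and log HTTP traffic to the Rails logger
  http_conn = Faraday.new do |conn|
    conn.request :retry, :max => 3
    conn.response :follow_redirects, :limit => 5
    conn.response :logger, Rails.logger
    conn.adapter :net_http
  end

  @client = OAI::Client.new(uri, :http => http_conn)
end
```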
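The option handling at the top of the constructor can be tried in plain Ruby, without Krikri or an OAI endpoint. In this sketch the URI and option values are hypothetical; it shows that `:id_path` is pulled out of the `:oai` hash (so it never reaches `OAI::Client`) and that the default block supplies `false` when it is absent.

```ruby
# Hypothetical options, shaped like those passed to OAIHarvester.new
opts = {
  uri: 'http://example.org/oai',
  oai: { metadata_prefix: 'oai_dc', set: %w[books maps], id_path: '//dc:identifier' }
}

oai_opts = opts.fetch(:oai, {})                 # OAI-specific options live under :oai
id_path  = oai_opts.delete(:id_path) { false }  # removed before the client sees the options

p id_path        # => "//dc:identifier"
p oai_opts.keys  # => [:metadata_prefix, :set]

# Note: oai_opts is the same Hash object as opts[:oai], so the delete mutates it
p opts[:oai].key?(:id_path)  # => false

# Without :id_path, the default block supplies false
p({}.delete(:id_path) { false })  # => false
```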
Instance Attribute Details
#client ⇒ Object
Returns the value of attribute client.
```ruby
# File 'lib/krikri/harvesters/oai_harvester.rb', line 31
def client
  @client
end
```
Class Method Details
.expected_opts ⇒ Hash
Returns a hash documenting the allowable options to pass to initializers.
```ruby
# File 'lib/krikri/harvesters/oai_harvester.rb', line 141
def self.expected_opts
  {
    key: :oai,
    opts: {
      set: { type: :string, required: false, multiple_ok: true },
      skip_set: { type: :string, required: false, multiple_ok: true },
      metadata_prefix: { type: :string, required: false },
      from: { type: :string, required: false },
      until: { type: :string, required: false },
      id_path: { type: :string, required: false }
    }
  }
end
```
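To show how the fields of this hash (`key`, `type`, `required`, `multiple_ok`) can be read, here is a hypothetical validator. It is a sketch only: `option_errors` and `TYPE_CLASSES` are illustrative names, and this document does not say how Krikri itself consumes `expected_opts`.

```ruby
# Copy of the hash returned by OAIHarvester.expected_opts
EXPECTED_OPTS = {
  key: :oai,
  opts: {
    set: { type: :string, required: false, multiple_ok: true },
    skip_set: { type: :string, required: false, multiple_ok: true },
    metadata_prefix: { type: :string, required: false },
    from: { type: :string, required: false },
    until: { type: :string, required: false },
    id_path: { type: :string, required: false }
  }
}.freeze

# Hypothetical mapping from :type symbols to Ruby classes
TYPE_CLASSES = { string: String }.freeze

# Hypothetical validator: returns a list of problems (empty when valid)
def option_errors(opts, spec = EXPECTED_OPTS)
  given = opts.fetch(spec[:key], {})
  errors = []

  given.each do |name, value|
    rule = spec[:opts][name]
    unless rule
      errors << "unknown option #{name}"
      next
    end
    if value.is_a?(Array) && !rule[:multiple_ok]
      errors << "#{name} does not accept multiple values"
    end
    klass = TYPE_CLASSES.fetch(rule[:type])
    unless Array(value).all? { |v| v.is_a?(klass) }
      errors << "#{name} must be a #{rule[:type]}"
    end
  end

  spec[:opts].each do |name, rule|
    errors << "missing required option #{name}" if rule[:required] && !given.key?(name)
  end
  errors
end

p option_errors({ oai: { metadata_prefix: 'oai_dc', set: %w[books maps] } })
# => []
p option_errors({ oai: { metadata_prefix: %w[oai_dc mods], verb: 'x' } })
# => ["metadata_prefix does not accept multiple values", "unknown option verb"]
```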
Instance Method Details
#concat_enum(enum_enum) ⇒ Object
TODO: find a better home for this. Reopen Enumerable, or use the `Enumerating` gem: github.com/mdub/enumerating.
Concatenates the enumerators in `enum_enum` into a single enumerator that iterates each one in turn.
```ruby
# File 'lib/krikri/harvesters/oai_harvester.rb', line 159
def concat_enum(enum_enum)
  Enumerator.new do |yielder|
    enum_enum.each do |enum|
      enum.each { |i| yielder << i }
    end
  end
end
```
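Because the result is built with `Enumerator.new`, the concatenation is pull-driven: an inner enumerator is not started until the previous one has been exhausted. The sketch below copies `concat_enum` verbatim and feeds it three plain enumerators (hypothetical stand-ins for per-set OAI responses) that record when they begin iterating.

```ruby
# Standalone copy of #concat_enum, exercised on plain enumerators
def concat_enum(enum_enum)
  Enumerator.new do |yielder|
    enum_enum.each do |enum|
      enum.each { |i| yielder << i }
    end
  end
end

pulled = []
# Each inner enumerator records when it is actually iterated
parts = [
  Enumerator.new { |y| pulled << :a; [1, 2].each { |i| y << i } },
  Enumerator.new { |y| pulled << :b; [3, 4].each { |i| y << i } },
  Enumerator.new { |y| pulled << :c; [5].each { |i| y << i } }
]

combined = concat_enum(parts)
p combined.first(3)  # => [1, 2, 3]
p pulled             # => [:a, :b]  (the third enumerator was never started)
p combined.to_a      # => [1, 2, 3, 4, 5]
```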
#count ⇒ Object
Raises NotImplementedError. Calling count on record_ids would request all ids and load them into memory. TODO: an efficient implementation of count for OAI.
```ruby
# File 'lib/krikri/harvesters/oai_harvester.rb', line 77
def count
  raise NotImplementedError
end
```
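Why counting through `record_ids` would be costly can be seen with a simulated paged source: taking a few ids needs only the first page, but `count` has to fetch every page. The page data and request counter below are hypothetical stand-ins for ListIdentifiers responses; no real endpoint is contacted.

```ruby
# Simulated ListIdentifiers responses: 3 pages of 2 ids each (hypothetical data)
requests = 0
pages = [%w[id1 id2], %w[id3 id4], %w[id5 id6]]

ids = Enumerator.new do |y|
  pages.each do |page|
    requests += 1  # stands in for one HTTP request per page
    page.each { |id| y << id }
  end
end.lazy

p ids.first(2)  # => ["id1", "id2"]
p requests      # => 1

requests = 0
p ids.count     # => 6
p requests      # => 3  (counting forced every page)
```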
#get_record(identifier, opts = {}) ⇒ Object
Gets a single record with the given identifier from the OAI endpoint.
```ruby
# File 'lib/krikri/harvesters/oai_harvester.rb', line 111
def get_record(identifier, opts = {})
  opts[:identifier] = identifier
  # Per-call options take precedence over the harvester's :oai options
  opts = @opts.merge(opts)
  @record_class.build(mint_id(identifier),
                      record_xml(client.get_record(opts).record))
end
```
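The option precedence in `get_record` follows from `Hash#merge`: keys in the per-call hash win over the harvester-wide defaults. A plain-Ruby sketch, where both hashes and the identifier are hypothetical examples; it also shows that assigning `:identifier` modifies the caller's hash before the merge.

```ruby
# Hypothetical harvester-wide defaults, as they might come from the :oai options
instance_opts = { metadata_prefix: 'oai_dc', set: 'books' }

# Per-call options, built the way #get_record builds them
call_opts = { metadata_prefix: 'mods' }
call_opts[:identifier] = 'oai:example.org:123'  # hypothetical identifier

request_opts = instance_opts.merge(call_opts)
p request_opts[:metadata_prefix]  # => "mods"  (per-call value wins)
p request_opts[:set]              # => "books" (default kept)
p request_opts[:identifier]       # => "oai:example.org:123"
p call_opts.key?(:identifier)     # => true    (the caller's hash was modified)
```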
#record_ids(opts = {}) ⇒ Object
Sends ListIdentifiers requests lazily.
The following will only send requests to the endpoint until it has 1000 record ids:
`record_ids.take(1000)`
```ruby
# File 'lib/krikri/harvesters/oai_harvester.rb', line 68
def record_ids(opts = {})
  opts = @opts.merge(opts)
  request_with_sets(opts) do |set_opts|
    client.list_identifiers(set_opts).full.lazy.flat_map(&:identifier)
  end
end
```
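The lazy behavior described above can be sketched with a fake endpoint that pages results using resumption tokens, roughly what chaining `.full.lazy` provides in the method above. `FakeEndpoint` and `all_identifiers` are hypothetical names. Note that `take` on a lazy enumerator is itself lazy: no request is sent until the result is forced, here with `to_a`.

```ruby
# Hypothetical stand-in for an OAI endpoint that pages ListIdentifiers
# results with resumption tokens (10 ids per page, 5 pages)
class FakeEndpoint
  attr_reader :requests

  def initialize(pages: 5, per_page: 10)
    @pages, @per_page, @requests = pages, per_page, 0
  end

  # Returns [ids, next_token]; next_token is nil on the last page
  def list_identifiers(token)
    @requests += 1
    ids = (0...@per_page).map { |i| "oai:example:#{token * @per_page + i}" }
    next_token = token + 1 < @pages ? token + 1 : nil
    [ids, next_token]
  end
end

# Follow resumption tokens lazily: one request per page, only when needed
def all_identifiers(endpoint)
  Enumerator.new do |y|
    token = 0
    loop do
      ids, token = endpoint.list_identifiers(token)
      ids.each { |id| y << id }
      break if token.nil?
    end
  end.lazy
end

endpoint = FakeEndpoint.new
first_25 = all_identifiers(endpoint).take(25).to_a
p first_25.size      # => 25
p endpoint.requests  # => 3  (pages 4 and 5 were never requested)
```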
#records(opts = {}) ⇒ Object
Sends ListRecords requests lazily.
The following will only send requests to the endpoint until it has 1000 records:
`records.take(1000)`
```ruby
# File 'lib/krikri/harvesters/oai_harvester.rb', line 91
def records(opts = {})
  opts = @opts.merge(opts)
  request_with_sets(opts) do |set_opts|
    client.list_records(set_opts).full.lazy.flat_map do |rec|
      begin
        @record_class.build(mint_id(get_identifier(rec)), record_xml(rec))
      rescue => e
        # Log the failure and move on; `next` yields nil for this record
        Krikri::Logger.log(:error, e.message)
        next
      end
    end
  end
end
```
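The per-record error handling uses `next` inside `Enumerator::Lazy#flat_map`, so a record that fails to build is logged and the harvest continues. A sketch of that pattern, with a hypothetical `build_record` standing in for `@record_class.build` and an array standing in for the logger; in this sketch the failed record appears as `nil` in the resulting stream.

```ruby
# Hypothetical stand-in for @record_class.build: fails on malformed input
def build_record(raw)
  raise ArgumentError, "no identifier in #{raw.inspect}" unless raw[:id]
  "record:#{raw[:id]}"
end

errors = []
raw_records = [{ id: 'a' }, { title: 'missing id' }, { id: 'c' }]

built = raw_records.lazy.flat_map do |raw|
  begin
    build_record(raw)
  rescue => e
    errors << e.message  # stands in for Krikri::Logger.log(:error, e.message)
    next
  end
end

p built.to_a   # => ["record:a", nil, "record:c"]
p errors.size  # => 1
```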
#sets(opts = {}, &block) ⇒ Array<OAI::Set>
Lists the sets available from the OAI endpoint. Accepts a block to pass to `#map` on the resulting array. The `opts` argument is currently unused.
Example:
`sets(&:spec)`
```ruby
# File 'lib/krikri/harvesters/oai_harvester.rb', line 130
def sets(opts = {}, &block)
  arry = client.list_sets.full.to_a
  return arry unless block_given?
  arry.map(&block)
end
```
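The optional-block pattern in `sets` can be tried without an endpoint. In this sketch `OAISet` is a hypothetical Struct standing in for `OAI::Set` (only `spec` and `name` are modeled), and `sets_from` mirrors the method body with the already-fetched list passed in.

```ruby
# Hypothetical stand-in for OAI::Set; only spec and name are modeled
OAISet = Struct.new(:spec, :name)

# Mirrors #sets: return the full array, or map it through the given block
def sets_from(list, &block)
  arry = list.to_a
  return arry unless block_given?
  arry.map(&block)
end

listed = [OAISet.new('books', 'Books'), OAISet.new('maps', 'Maps')]

p sets_from(listed).size     # => 2
p sets_from(listed, &:spec)  # => ["books", "maps"]

names = sets_from(listed) { |s| s.name.upcase }
p names                      # => ["BOOKS", "MAPS"]
```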