Class: Krikri::Harvesters::OAIHarvester

Inherits:
Object
Includes:
Krikri::Harvester
Defined in:
lib/krikri/harvesters/oai_harvester.rb

Overview

A harvester implementation for OAI-PMH.

Accepts options to pass to the OAI client as `:oai => opts`.

Options allowed are:

- set: A string or array of strings specifying the sets to harvest.
       If multiple sets are given, they will be lazily requested from
       `OAI::Client#list_records` in turn and combined into a single
       enumerator.
- skip_set: A string or array of strings specifying the sets to skip.
            If both `set` and `skip_set` are given, sets listed in
            `skip_set` are excluded from those listed in `set`. If only
            `skip_set` is given, every set returned by `#sets` except the
            skipped ones is harvested.
- metadata_prefix: A string specifying the metadata prefix. e.g. 'oai_dc'.
- from: The begin date for the harvest.
- until: The end date for the harvest.
- id_path: An alternate xpath (e.g. '//dc:identifier') from which to load
           raw identifier strings. Default: <record><header><identifier>.
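The `set`/`skip_set` rules above can be expressed as a small function. This is a minimal sketch, not Krikri's implementation: `resolve_sets` is a hypothetical helper, and `available` stands in for the set specs that `#sets` would return (e.g. `sets(&:spec)`). It also assumes that giving neither option means no set filtering, signalled here by a `nil` result.

```ruby
# Hypothetical helper (not part of Krikri): decides which set specs to
# request, following the set/skip_set rules described above.
#
# set       - String, Array of Strings, or nil
# skip_set  - String, Array of Strings, or nil
# available - specs the endpoint reports (stand-in for `sets(&:spec)`)
#
# Returns nil when neither option is given (assumed: no set filtering),
# otherwise an Array of set specs to harvest.
def resolve_sets(set: nil, skip_set: nil, available: [])
  return nil if set.nil? && skip_set.nil?

  # With only skip_set, start from every available set.
  candidates = set.nil? ? available : Array(set)
  candidates - Array(skip_set)
end

resolve_sets(set: 'my_set')                       # => ["my_set"]
resolve_sets(set: %w[a b], skip_set: 'b')         # => ["a"]
resolve_sets(skip_set: 'b', available: %w[a b c]) # => ["a", "c"]
```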

Examples:


OAIHarvester.new(:uri => endpoint,
                 :oai => { :set => 'my_set', :metadata_prefix => 'oai_dc' })


Constant Summary

Constants included from Krikri::Harvester

Krikri::Harvester::Registry

Instance Attribute Summary

Attributes included from Krikri::Harvester

#name, #uri

Attributes included from SoftwareAgent

#entity_behavior

Class Method Summary

Instance Method Summary

Methods included from Krikri::Harvester

#run

Methods included from SoftwareAgent

#agent_name, #run

Constructor Details

#initialize(opts = {}) ⇒ OAIHarvester

Returns a new instance of OAIHarvester.

Parameters:

  • opts (Hash) (defaults to: {})

    options for the harvester. Options to pass through to client requests go under the `:oai` key; allowable options are specified in OAI::Const::Verbs (currently :from, :until, :set, and :metadata_prefix). Additionally, you may pass an xpath string as `:id_path` under `:oai` to specify the location of the record IDs.

See Also:

  • OAI::Client
  • .expected_opts


# File 'lib/krikri/harvesters/oai_harvester.rb', line 41

def initialize(opts = {})
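  # Use OAISkipDeletedBehavior unless a harvest behavior is given.
  opts[:harvest_behavior] ||= OAISkipDeletedBehavior
  super

  # Client options live under :oai. :id_path is used by the harvester
  # itself, so it is removed from the options sent with client requests.
  @opts    = opts.fetch(:oai, {})
  @id_path = @opts.delete(:id_path) { false }

  # Retry a request up to 3 times, follow up to 5 redirects, and log
  # requests to the Rails logger.
  http_conn = Faraday.new do |conn|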
    conn.request :retry, :max => 3
    conn.response :follow_redirects, :limit => 5
    conn.response :logger, Rails.logger
    conn.adapter :net_http
  end

  @client = OAI::Client.new(uri, :http => http_conn)
end
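The constructor reads client options from the `:oai` sub-hash and pulls `:id_path` out of it, because the ID path is used by the harvester rather than sent with client requests. Because `Hash#fetch` returns the caller's own sub-hash, the `delete` also removes `:id_path` from the hash that was passed in. A minimal standalone sketch of the same split, using a hypothetical `split_oai_opts` helper that `dup`s the sub-hash to avoid that side effect:

```ruby
# Hypothetical helper (not part of Krikri): separates OAI client options
# from the harvester-only :id_path option without mutating the input.
# Returns [client_opts, id_path]; id_path is false when absent, matching
# the `{ false }` default used in the constructor.
def split_oai_opts(opts)
  client_opts = opts.fetch(:oai, {}).dup
  id_path = client_opts.delete(:id_path) { false }
  [client_opts, id_path]
end

input = { uri: 'http://example.org/oai',
          oai: { metadata_prefix: 'oai_dc', id_path: '//dc:identifier' } }
client_opts, id_path = split_oai_opts(input)
client_opts.keys # => [:metadata_prefix]
id_path          # => "//dc:identifier"
```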

Instance Attribute Details

#client ⇒ Object

Returns the value of attribute client.



# File 'lib/krikri/harvesters/oai_harvester.rb', line 31

def client
  @client
end

Class Method Details

.expected_opts ⇒ Hash

Returns a hash documenting the allowable options to pass to initializers.

Returns:

  • (Hash)

    A hash documenting the allowable options to pass to initializers.

See Also:

  • Krikri::Harvester::expected_opts


# File 'lib/krikri/harvesters/oai_harvester.rb', line 141

def self.expected_opts
  {
    key: :oai,
    opts: {
      set: { type: :string, required: false, multiple_ok: true },
      skip_set: { type: :string, required: false, multiple_ok: true },
      metadata_prefix: { type: :string, required: false },
      from: { type: :string, required: false },
      until: { type: :string, required: false },
      id_path: { type: :string, required: false }
    }
  }
end
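The hash returned by `.expected_opts` is declarative, so it can drive validation. The sketch below is hypothetical (how Krikri itself consumes `expected_opts` is not shown here): `validate_opts` checks an options hash against a spec of the same shape, reporting unknown keys, non-String values, arrays for options without `multiple_ok`, and missing required options. Only the `:string` type is handled, since it is the only type this spec uses.

```ruby
# Same shape as OAIHarvester.expected_opts, copied so the sketch runs
# standalone.
EXPECTED_OPTS = {
  key: :oai,
  opts: {
    set: { type: :string, required: false, multiple_ok: true },
    skip_set: { type: :string, required: false, multiple_ok: true },
    metadata_prefix: { type: :string, required: false },
    from: { type: :string, required: false },
    until: { type: :string, required: false },
    id_path: { type: :string, required: false }
  }
}.freeze

# Hypothetical validator (not part of Krikri). Returns an Array of error
# messages; an empty Array means the options conform to the spec.
def validate_opts(opts, spec = EXPECTED_OPTS)
  sub = opts.fetch(spec[:key], {})
  errors = []

  sub.each do |name, value|
    rule = spec[:opts][name]
    if rule.nil?
      errors << "unknown option :#{name}"
      next
    end

    if value.is_a?(Array) && !rule[:multiple_ok]
      errors << ":#{name} does not accept multiple values"
    end
    # Every option in this spec has type :string, so only that is checked.
    values = value.is_a?(Array) ? value : [value]
    unless values.all? { |v| v.is_a?(String) }
      errors << ":#{name} must be a String"
    end
  end

  spec[:opts].each do |name, rule|
    if rule[:required] && !sub.key?(name)
      errors << "missing required option :#{name}"
    end
  end
  errors
end

validate_opts({ oai: { set: %w[a b], metadata_prefix: 'oai_dc' } }) # => []
```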

Instance Method Details

#concat_enum(enum_enum) ⇒ Object

TODO:

Find a better home for this: reopen Enumerable, or use the `Enumerating` gem (github.com/mdub/enumerating).

Concatenates an enumerable of enumerators into a single enumerator that yields every element of each inner enumerator in turn.



# File 'lib/krikri/harvesters/oai_harvester.rb', line 159

def concat_enum(enum_enum)
  Enumerator.new do |yielder|
    enum_enum.each do |enum|
      enum.each { |i| yielder << i }
    end
  end
end
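`concat_enum` accepts any number of enumerators and pulls from an inner enumerator only when the consumer asks for more elements. The sketch below copies the method unchanged and uses a hypothetical `counting` wrapper to show that taking three elements never touches the third enumerator. On Ruby 2.6 and later, `Enumerator#+` and `Enumerable#chain` provide similar built-in concatenation for a fixed list of enumerators.

```ruby
# Copied from OAIHarvester#concat_enum so the sketch runs standalone.
def concat_enum(enum_enum)
  Enumerator.new do |yielder|
    enum_enum.each do |enum|
      enum.each { |i| yielder << i }
    end
  end
end

# Hypothetical helper: wraps an array in an enumerator that records each
# element it hands out, so we can see which inner enumerators were used.
def counting(items, log)
  Enumerator.new do |y|
    items.each do |i|
      log << i
      y << i
    end
  end
end

log = []
combined = concat_enum([counting([1, 2], log),
                        counting([3, 4], log),
                        counting([5, 6], log)])
combined.take(3) # => [1, 2, 3]
log              # => [1, 2, 3] (nothing was pulled from the third enumerator)
```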

#countObject

Calling `count` on `#record_ids` would request every ID and load them all into memory, so this method raises NotImplementedError instead. TODO: an efficient implementation of count for OAI.

Raises:

  • (NotImplementedError)


# File 'lib/krikri/harvesters/oai_harvester.rb', line 77

def count
  raise NotImplementedError
end
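One possible route to an efficient `count`, offered only as an assumption-laden sketch: OAI-PMH 2.0 lets a repository attach an optional `completeListSize` attribute to the `resumptionToken` of a list response, so a single ListIdentifiers request can sometimes report the total without paging. The `complete_list_size` helper below is hypothetical and parses raw response XML with Ruby's REXML library; because the attribute is optional, it returns `nil` when the repository omits it, and a real implementation would still need a fallback.

```ruby
require 'rexml/document'

OAI_NS = 'http://www.openarchives.org/OAI/2.0/'.freeze

# Hypothetical helper (not part of Krikri): reads the optional
# completeListSize attribute from an OAI-PMH list response.
# Returns an Integer, or nil when the attribute (or the whole
# resumptionToken element) is absent.
def complete_list_size(xml)
  doc = REXML::Document.new(xml)
  token = REXML::XPath.first(doc, '//oai:resumptionToken', 'oai' => OAI_NS)
  size = token && token.attributes['completeListSize']
  size && Integer(size)
end

response = <<~XML
  <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
    <ListIdentifiers>
      <header><identifier>oai:example:1</identifier></header>
      <resumptionToken completeListSize="1234" cursor="0">token-1</resumptionToken>
    </ListIdentifiers>
  </OAI-PMH>
XML

complete_list_size(response) # => 1234
```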

#get_record(identifier, opts = {}) ⇒ Object

Gets a single record with the given identifier from the OAI endpoint.

Parameters:

  • identifier (#to_s)

    the identifier of the record to get

  • opts (Hash) (defaults to: {})

    options to pass to the OAI client



# File 'lib/krikri/harvesters/oai_harvester.rb', line 111

def get_record(identifier, opts = {})
  opts[:identifier] = identifier
  opts = @opts.merge(opts)
  @record_class.build(mint_id(identifier),
                      record_xml(client.get_record(opts).record))
end
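`get_record` merges the harvester's default `:oai` options with the per-call options (per-call values win), adds the identifier, and issues an OAI-PMH GetRecord request. The sketch below shows roughly what such a request looks like as a URL. It is hypothetical: `get_record_url` and `oai_param_name` are not Krikri or ruby-oai methods, and the snake_case-to-camelCase key conversion (`metadata_prefix` to `metadataPrefix`) is an assumption made to match OAI-PMH argument names.

```ruby
require 'uri'

# Hypothetical: converts a Ruby-style option key to its OAI-PMH
# argument name, e.g. :metadata_prefix -> "metadataPrefix".
def oai_param_name(key)
  key.to_s.gsub(/_([a-z])/) { Regexp.last_match(1).upcase }
end

# Hypothetical: combines options the way #get_record does (defaults,
# overridden by per-call options, plus the identifier) and encodes them
# as a GetRecord query string.
def get_record_url(base_url, identifier, defaults = {}, overrides = {})
  opts = defaults.merge(overrides.merge(identifier: identifier))
  params = [%w[verb GetRecord]] +
           opts.map { |k, v| [oai_param_name(k), v.to_s] }
  "#{base_url}?#{URI.encode_www_form(params)}"
end

get_record_url('http://example.org/oai', 'oai:example:1',
               { metadata_prefix: 'oai_dc' })
# => "http://example.org/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai%3Aexample%3A1"
```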

#record_ids(opts = {}) ⇒ Object

Sends ListIdentifiers requests lazily.

The following will only send requests to the endpoint until it has 1000 record IDs:

record_ids.take(1000).to_a

Parameters:

  • opts (Hash) (defaults to: {})

    opts to pass to OAI::Client

See Also:

  • .expected_opts


# File 'lib/krikri/harvesters/oai_harvester.rb', line 68

def record_ids(opts = {})
  opts = @opts.merge(opts)
  request_with_sets(opts) do |set_opts|
    client.list_identifiers(set_opts).full.lazy.flat_map(&:identifier)
  end
end
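The laziness rests on OAI-PMH resumption-token paging: each ListIdentifiers response carries one page of headers plus a token for the next page, and `full` on the ruby-oai response follows those tokens as the enumerator is consumed. The sketch below imitates that with a hypothetical in-memory `FakeEndpoint` (plain offsets stand in for real resumption tokens) and counts requests, showing that taking 5 identifiers at 2 per page costs 3 requests rather than the whole list.

```ruby
# Hypothetical stand-in for an OAI-PMH endpoint: returns one page of
# identifiers per request plus a "resumption token" (here just the offset
# of the next page; nil on the last page).
class FakeEndpoint
  attr_reader :requests

  def initialize(total:, page_size:)
    @ids = (1..total).map { |i| "oai:example:#{i}" }
    @page_size = page_size
    @requests = 0
  end

  def list_identifiers(token = nil)
    @requests += 1
    offset = token || 0
    page = @ids[offset, @page_size] || []
    next_offset = offset + @page_size
    [page, next_offset < @ids.size ? next_offset : nil]
  end
end

# Follows tokens on demand: the next page is requested only after the
# consumer has used up the previous one.
def lazy_identifiers(endpoint)
  Enumerator.new do |yielder|
    token = nil
    loop do
      page, token = endpoint.list_identifiers(token)
      page.each { |id| yielder << id }
      break if token.nil?
    end
  end.lazy
end

endpoint = FakeEndpoint.new(total: 10, page_size: 2)
ids = lazy_identifiers(endpoint).first(5)
ids.last          # => "oai:example:5"
endpoint.requests # => 3
```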

#records(opts = {}) ⇒ Object

Sends ListRecords requests lazily.

The following will only send requests to the endpoint until it has 1000 records:

records.take(1000).to_a

Parameters:

  • opts (Hash) (defaults to: {})

    opts to pass to OAI::Client

See Also:

  • .expected_opts


# File 'lib/krikri/harvesters/oai_harvester.rb', line 91

def records(opts = {})
  opts = @opts.merge(opts)
  request_with_sets(opts) do |set_opts|
    client.list_records(set_opts).full.lazy.flat_map do |rec|
      begin
        @record_class.build(mint_id(get_identifier(rec)),
                            record_xml(rec))
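      rescue => e
        Krikri::Logger.log(:error, e.message)
        # `next` makes this block return nil, which Lazy#flat_map passes
        # through as an element rather than flattening away.
        next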
      end
    end
  end
end
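A detail of the error handling in `#records`: `next` inside the `flat_map` block makes the block return `nil`, and `Enumerator::Lazy#flat_map` only flattens arrays (or array-like values) and lazy enumerators, so the `nil` is passed through as an element. Unless something downstream filters it, a record that fails to build surfaces as `nil`. The standalone sketch below reproduces this with a hypothetical `build` that raises on bad input, then shows one alternative: returning `[]` from the rescue branch so failures are flattened away.

```ruby
# Hypothetical record builder: raises for input it cannot parse.
def build(raw)
  Integer(raw)
end

raw_records = ['1', 'not-a-number', '3']

# Mirrors the rescue-and-next pattern: failures become nil elements.
with_nils = raw_records.lazy.flat_map do |raw|
  begin
    build(raw)
  rescue ArgumentError => e
    warn "skipping record: #{e.message}"
    next
  end
end.to_a
# with_nils == [1, nil, 3]

# Returning [] instead drops failures from the flattened result.
without_nils = raw_records.lazy.flat_map do |raw|
  begin
    build(raw)
  rescue ArgumentError
    []
  end
end.to_a
# without_nils == [1, 3]
```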

#sets(opts = {}, &block) ⇒ Array<OAI::Set>

Lists the sets available from the OAI endpoint. Accepts a block to pass to `#map` on the resulting array.

Examples:

sets(&:spec)

Parameters:

  • opts (Hash) (defaults to: {})

    options to pass to the OAI client (the current implementation does not forward them; `list_sets` is called without options)

Returns:

  • (Array<OAI::Set>)

    an array of sets, or of the block's results when a block is given.

See Also:

  • OAI::Set


# File 'lib/krikri/harvesters/oai_harvester.rb', line 130

def sets(opts = {}, &block)
  arry = client.list_sets.full.to_a
  return arry unless block_given?
  arry.map(&block)
end
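`#sets` fetches the complete set list and then either returns it or maps the block over it. The sketch below reproduces that control flow without a live endpoint; `FakeSet` is a hypothetical stand-in for `OAI::Set` with just the `spec` and `name` readers this example needs, and `sets_from` is a hypothetical helper taking an in-memory list in place of `client.list_sets.full`.

```ruby
# Hypothetical stand-in for OAI::Set.
FakeSet = Struct.new(:spec, :name)

# Same control flow as #sets, but over an in-memory list.
def sets_from(list, &block)
  arry = list.to_a
  return arry unless block_given?
  arry.map(&block)
end

available = [FakeSet.new('maps', 'Historic Maps'),
             FakeSet.new('photos', 'Photographs')]

sets_from(available).size    # => 2
sets_from(available, &:spec) # => ["maps", "photos"]
```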