Class: OAI::Client

Inherits:
Object
Defined in:
lib/oai/client.rb

Overview

An `OAI::Client` provides a client API for issuing OAI-PMH verbs against an OAI-PMH server. The six OAI-PMH verbs translate directly to methods you can call on an `OAI::Client` object. Verb arguments are passed as a hash:

```ruby
client = OAI::Client.new 'http://www.pubmedcentral.gov/oai/oai.cgi'
record = client.get_record :identifier => 'oai:pubmedcentral.gov:13901'
for identifier in client.list_identifiers
  puts identifier
end
```

Note that the API uses method and parameter names with underscores rather than studly caps: `list_identifiers` and `metadata_prefix` are used instead of the `ListIdentifiers` and `metadataPrefix` of the OAI-PMH specification.
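To make the naming convention concrete, here is a minimal sketch of how underscored argument names could be turned into the camelCase argument names of the specification. The helpers `to_oai_param` and `to_oai_params` are hypothetical and written only for illustration; they are not part of the `oai` gem, which may perform this mapping differently.

```ruby
# Hypothetical helper (not part of the oai gem): converts an underscored
# option key such as :metadata_prefix into the camelCase argument name
# used by OAI-PMH, such as "metadataPrefix".
def to_oai_param(key)
  head, *rest = key.to_s.split('_')
  head + rest.map(&:capitalize).join
end

# Hypothetical helper: applies the conversion to every key of an options hash.
def to_oai_params(opts)
  opts.map { |key, value| [to_oai_param(key), value] }.to_h
end

params = to_oai_params(:metadata_prefix => 'oai_dc', :resumption_token => 'abc')
puts params.inspect
```

Single-word keys such as `:from`, `:until`, and `:set` pass through unchanged, which matches the spelling of those arguments in the specification.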

Also, the `from` and `until` arguments, which specify dates, should be passed in as `Date` or `DateTime` objects, depending on the granularity supported by the server.
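The choice between `Date` and `DateTime` matters because OAI-PMH datestamps come in two granularities: day (`YYYY-MM-DD`) and seconds (`YYYY-MM-DDThh:mm:ssZ`, in UTC). The sketch below shows one way such values could be serialized. The helper `oai_datestamp` is hypothetical and only illustrates the idea; the library's own formatting may differ.

```ruby
require 'date'

# Hypothetical helper (not part of the oai gem): formats a Date with day
# granularity and a DateTime with seconds granularity, normalized to UTC.
def oai_datestamp(value)
  case value
  when DateTime # checked first, because DateTime is a subclass of Date
    value.new_offset(0).strftime('%Y-%m-%dT%H:%M:%SZ')
  when Date
    value.strftime('%Y-%m-%d')
  else
    raise ArgumentError, "expected Date or DateTime, got #{value.class}"
  end
end

puts oai_datestamp(Date.new(2020, 1, 2))                         # prints 2020-01-02
puts oai_datestamp(DateTime.new(2020, 1, 2, 3, 4, 5, '+02:00'))  # prints 2020-01-02T01:04:05Z
```

A server that only supports day granularity would reject the second form, so a `Date` is the safer choice unless the server's `Identify` response advertises seconds granularity.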

For detailed information on the arguments that can be used please consult the OAI-PMH docs at <www.openarchives.org/OAI/openarchivesprotocol.html>.


Constructor Details

#initialize(base_url, options = {}) ⇒ Client

The constructor must be passed a valid base URL for an OAI service:

client = OAI::Client.new 'http://www.pubmedcentral.gov/oai/oai.cgi'

If you want to see debugging messages on `STDERR`, use:

client = OAI::Client.new 'http://example.com', :debug => true

By default, OAI verbs called on the client return `REXML::Element` objects for metadata records. If you prefer, you can use the `:parser` option to select `libxml` instead and get back `XML::Node` objects:

client = OAI::Client.new 'http://example.com', :parser => 'libxml'

You can configure the Faraday HTTP client by providing an alternate Faraday instance:

```ruby
client = OAI::Client.new 'http://example.com', :http => Faraday.new { |c| }
```

### HIGH PERFORMANCE

If you want to supercharge this API, install `libxml-ruby >= 0.3.8` and use the `:parser` option when you construct your `OAI::Client`.
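If the same script has to run on machines with and without `libxml-ruby`, the parser can be chosen at runtime. The sketch below is one possible approach. The helper `preferred_parser` is hypothetical and not part of the gem; it uses the `xml/libxml` require path that appears in the constructor source.

```ruby
# Hypothetical helper (not part of the oai gem): returns 'libxml' when the
# libxml bindings can be loaded and falls back to 'rexml' otherwise.
# A failed require raises LoadError, which a bare `rescue` does not catch,
# so the exception class is named explicitly.
def preferred_parser
  require 'xml/libxml'
  'libxml'
rescue LoadError
  'rexml'
end

puts preferred_parser

# Intended use (requires the oai gem, so it is left as a comment here):
#   client = OAI::Client.new 'http://example.com', :parser => preferred_parser
```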



# File 'lib/oai/client.rb', line 86

def initialize(base_url, options={})
  @base = URI.parse base_url
  @debug = options.fetch(:debug, false)
  @parser = options.fetch(:parser, 'rexml')
  @headers = options.fetch(:headers, {})

  @http_client = options.fetch(:http) do
    Faraday.new(:url => @base.clone) do |builder|
      follow_redirects = options.fetch(:redirects, true)
      follow_redirects = 5 if follow_redirects == true

      if follow_redirects
        require 'faraday_middleware'
        builder.response :follow_redirects, :limit => follow_redirects.to_i
      end
      builder.adapter :net_http
    end
  end

  # load appropriate parser
  case @parser
  when 'libxml'
    begin
      require 'rubygems'
      require 'xml/libxml'
    rescue
      raise OAI::Exception.new("xml/libxml not available")
    end
  when 'rexml'
    require 'rexml/document'
    require 'rexml/xpath'
  else
    raise OAI::Exception.new("unknown parser: #{@parser}")
  end
end
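The listing above also reads a `:redirects` option that the prose does not describe. Read on its own, the logic appears to be: `true` (the default) means follow up to 5 redirects, an integer sets the limit, and `false` or `nil` disables redirect following. The sketch below restates that reading as a standalone function. The name `redirect_limit` is hypothetical, and the behaviour is inferred from the listing rather than from documented API.

```ruby
# Hypothetical restatement of the :redirects handling in the constructor:
# returns the redirect limit as an Integer, or nil when redirects are disabled.
def redirect_limit(options)
  follow = options.fetch(:redirects, true)
  follow = 5 if follow == true
  follow ? follow.to_i : nil
end

puts redirect_limit({}).inspect                     # prints 5
puts redirect_limit(:redirects => 2).inspect        # prints 2
puts redirect_limit(:redirects => false).inspect    # prints nil
```

Note that, per the listing, these settings only apply when the client builds its own Faraday connection; a connection passed through `:http` is used as given.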

Instance Method Details

#get_record(opts = {}) ⇒ Object

Equivalent to a `GetRecord` request. You must supply an `:identifier` argument. You should get back an `OAI::GetRecordResponse` object, from which you can extract an `OAI::Record` object.
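For orientation, the sketch below builds the kind of URL that a `GetRecord` request corresponds to in the OAI-PMH specification, using only Ruby's standard library. The helper `get_record_url` is hypothetical, and the `oai_dc` default for `metadataPrefix` is an assumption made for the example; the client's actual request construction is not shown in this document.

```ruby
require 'uri'

# Hypothetical helper (not part of the oai gem): builds a GetRecord request
# URL from a base URL, a record identifier, and a metadata prefix.
def get_record_url(base_url, identifier, metadata_prefix = 'oai_dc')
  uri = URI.parse(base_url)
  uri.query = URI.encode_www_form(
    'verb' => 'GetRecord',
    'identifier' => identifier,
    'metadataPrefix' => metadata_prefix
  )
  uri.to_s
end

puts get_record_url('http://www.pubmedcentral.gov/oai/oai.cgi',
                    'oai:pubmedcentral.gov:13901')
```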



# File 'lib/oai/client.rb', line 154

def get_record(opts={})
  OAI::GetRecordResponse.new(do_request('GetRecord', opts))
end

#identify ⇒ Object

Equivalent to an `Identify` request. You'll get back an `OAI::IdentifyResponse` object, which is essentially just a wrapper around a `REXML::Document` for the response. If you created your client using the `libxml` parser, you will get an `XML::Node` object instead.



# File 'lib/oai/client.rb', line 127

def identify
  OAI::IdentifyResponse.new(do_request('Identify'))
end

#list_identifiers(opts = {}) ⇒ Object

Equivalent to a `ListIdentifiers` request. Pass in the `:from` and `:until` arguments as `Date` or `DateTime` objects, depending on the granularity supported by the server.

You can use seamless resumption with this verb, which allows you to mitigate (to some extent) the lack of a `Count` verb:

client.list_identifiers.full.count # Don't try this on PubMed though!
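One way to picture seamless resumption is as an enumerator that keeps requesting the next page until the server stops returning a resumption token. The sketch below simulates that with an in-memory stand-in for the server. `FAKE_PAGES`, `FETCH_PAGE`, and `all_items` are hypothetical names invented for this illustration; they do not describe how the gem implements `full`.

```ruby
# In-memory stand-in for a server: maps a resumption token (nil for the
# first request) to one page of identifiers plus the token of the next page.
FAKE_PAGES = {
  nil  => { :items => %w[oai:x:1 oai:x:2], :token => 't1' },
  't1' => { :items => %w[oai:x:3 oai:x:4], :token => 't2' },
  't2' => { :items => %w[oai:x:5],         :token => nil }
}.freeze

# Stand-in for an HTTP request that returns the page for a given token.
FETCH_PAGE = lambda { |token| FAKE_PAGES.fetch(token) }

# Returns an Enumerator that yields every item from every page, fetching
# the next page only when the current one has been consumed.
def all_items(fetch_page)
  Enumerator.new do |yielder|
    token = nil
    loop do
      page = fetch_page.call(token)
      page[:items].each { |item| yielder << item }
      token = page[:token]
      break if token.nil?
    end
  end
end

puts all_items(FETCH_PAGE).count      # prints 5
puts all_items(FETCH_PAGE).first(3).inspect
```

Counting this way walks every page, one request per page, which is why it is best avoided on very large repositories.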


# File 'lib/oai/client.rb', line 147

def list_identifiers(opts={})
  do_resumable(OAI::ListIdentifiersResponse, 'ListIdentifiers', opts)
end

#list_metadata_formats(opts = {}) ⇒ Object

Equivalent to a `ListMetadataFormats` request. A `ListMetadataFormatsResponse` object is returned to you.



# File 'lib/oai/client.rb', line 134

def list_metadata_formats(opts={})
  OAI::ListMetadataFormatsResponse.new(do_request('ListMetadataFormats', opts))
end

#list_records(opts = {}) ⇒ Object

Equivalent to the `ListRecords` request. A `ListRecordsResponse` will be returned, which you can use to iterate through records:

response = client.list_records
response.each do |record|
  puts record.metadata
end

Alternately, you can use seamless resumption to avoid handling resumption tokens:

client.list_records.full.each do |record|
  puts record.metadata
end

### Memory Use

`:full` will avoid storing more than one page of records in memory, but you can use it in ways that override that behaviour. Be careful to avoid using `client.list_records.full.entries` unless you really want to hold all the records in the feed in memory!



# File 'lib/oai/client.rb', line 178

def list_records(opts={})
  do_resumable(OAI::ListRecordsResponse, 'ListRecords', opts)
end

#list_sets(opts = {}) ⇒ Object

Equivalent to the `ListSets` request. A `ListSetsResponse` object will be returned, which you can use for iterating through the `OAI::Set` objects:

for set in client.list_sets
  puts set
end

A large number of sets is not unusual for some OAI-PMH feeds, so using seamless resumption may be preferable:

client.list_sets.full.each do |set|
  puts set
end


# File 'lib/oai/client.rb', line 196

def list_sets(opts={})
  do_resumable(OAI::ListSetsResponse, 'ListSets', opts)
end