Class: OAI::Client

Inherits:
Object
Defined in:
lib/oai/client.rb

Overview

An `OAI::Client` provides a client API for issuing OAI-PMH verbs against an OAI-PMH server. The six OAI-PMH verbs translate directly to methods you can call on an `OAI::Client` object. Verb arguments are passed as a hash:

```ruby
client = OAI::Client.new 'http://www.pubmedcentral.gov/oai/oai.cgi'
record = client.get_record :identifier => 'oai:pubmedcentral.gov:13901'
for identifier in client.list_identifiers
  puts identifier
end
```

It is worth noting that the API uses method and parameter names with underscores in them rather than studly caps. So `list_identifiers` and `metadata_prefix` are used instead of the `listIdentifiers` and `metadataPrefix` found in the OAI-PMH specification.
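To illustrate the naming convention, here is a minimal sketch of how underscore-style option keys could be mapped to the camelCase parameter names of the OAI-PMH specification. The helper `to_oai_params` is hypothetical, written only for this example, and is not part of the `OAI::Client` API.

```ruby
# Hypothetical helper (not part of OAI::Client): converts underscore-style
# option keys to the camelCase parameter names used by OAI-PMH.
def to_oai_params(opts)
  opts.each_with_object({}) do |(key, value), params|
    head, *tail = key.to_s.split('_')
    params[head + tail.map(&:capitalize).join] = value
  end
end

params = to_oai_params(:metadata_prefix => 'oai_dc', :resumption_token => 'abc')
puts params.inspect
```

Single-word keys such as `:from` and `:until` pass through unchanged, since there is nothing after the first segment to capitalize.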

Also, the `:from` and `:until` arguments, which specify dates, should be passed in as `Date` or `DateTime` objects depending on the granularity supported by the server.
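As a rough illustration of why the object type matters, the sketch below formats a `Date` at day granularity and a `DateTime` at seconds granularity, the two UTC datestamp forms defined by OAI-PMH. The helper `oai_datestamp` is an assumption made for this example; it is not claimed to be how `OAI::Client` serializes dates internally.

```ruby
require 'date'

# Hypothetical helper: formats a date argument as an OAI-PMH UTC datestamp.
# DateTime is checked first because it is a subclass of Date.
def oai_datestamp(value)
  case value
  when DateTime then value.new_offset(0).strftime('%Y-%m-%dT%H:%M:%SZ')
  when Date     then value.strftime('%Y-%m-%d')
  else raise ArgumentError, "expected Date or DateTime, got #{value.class}"
  end
end

puts oai_datestamp(Date.new(2020, 1, 15))                 # day granularity
puts oai_datestamp(DateTime.new(2020, 1, 15, 8, 30, 0))   # seconds granularity
```

A server that only supports day granularity would be expected to reject the longer form, which is why the choice between `Date` and `DateTime` is left to the caller.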

For detailed information on the arguments that can be used, please consult the OAI-PMH docs at <http://www.openarchives.org/OAI/openarchivesprotocol.html>.

Constructor Details

#initialize(base_url, options = {}) ⇒ Client

The constructor must be passed a valid base URL for an OAI service:

client = OAI::Client.new 'http://www.pubmedcentral.gov/oai/oai.cgi'

If you want to see debugging messages on `STDERR` use:

client = OAI::Client.new 'http://example.com', :debug => true

By default, OAI verbs called on the client return `REXML::Element` objects for metadata records. If you prefer, you can use the `:parser` option to select `libxml` instead and get back `XML::Node` objects:

client = OAI::Client.new 'http://example.com', :parser => 'libxml'

You can configure the Faraday HTTP client by providing an alternate Faraday instance:

```ruby
client = OAI::Client.new 'http://example.com', :http => Faraday.new { |c| }
```

### HIGH PERFORMANCE

If you want to supercharge this API, install `libxml-ruby >= 0.3.8` and use the `:parser` option when you construct your `OAI::Client`.
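One way to take advantage of this without making `libxml-ruby` a hard requirement is to probe for it and fall back to REXML. The following is a minimal sketch; `preferred_parser` is a hypothetical helper written for this example and is not provided by the library.

```ruby
# Hypothetical helper: returns 'libxml' when the libxml bindings can be
# loaded, and falls back to 'rexml' otherwise.
def preferred_parser
  require 'xml/libxml'
  'libxml'
rescue LoadError
  'rexml'
end

puts preferred_parser
# The result could then be passed to the constructor, for example:
# client = OAI::Client.new 'http://example.com', :parser => preferred_parser
```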


# File 'lib/oai/client.rb', line 86

def initialize(base_url, options={})
  @base = URI.parse base_url
  @debug = options.fetch(:debug, false)
  @parser = options.fetch(:parser, 'rexml')

  @http_client = options.fetch(:http) do
    Faraday.new(:url => @base.clone) do |builder|
      follow_redirects = options.fetch(:redirects, true)
      if follow_redirects
        # Integer is used here because Fixnum was removed in Ruby 3.2
        count = follow_redirects.is_a?(Integer) ? follow_redirects : 5

        require 'faraday_middleware'
        builder.response :follow_redirects, :limit => count
      end
      builder.adapter :net_http
    end
  end

  # load appropriate parser
  case @parser
  when 'libxml'
    begin
      require 'rubygems'
      require 'xml/libxml'
    rescue LoadError # a bare rescue would not catch a failed require
      raise OAI::Exception.new("xml/libxml not available")
    end
  when 'rexml'
    require 'rexml/document'
    require 'rexml/xpath'
  else
    raise OAI::Exception.new("unknown parser: #{@parser}")
  end
end
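
The `:redirects` handling in the constructor above can be read in isolation as follows. This is a sketch only: `redirect_limit` is a hypothetical helper that restates the listing's logic outside of Faraday, so that the three cases are easy to see.

```ruby
# Hypothetical helper restating the :redirects logic from the constructor:
# false disables redirect following, true uses a limit of 5,
# and an Integer is taken as the limit itself.
def redirect_limit(options = {})
  setting = options.fetch(:redirects, true)
  return nil unless setting
  setting.is_a?(Integer) ? setting : 5
end

puts redirect_limit.inspect                       # default
puts redirect_limit(:redirects => 2).inspect      # explicit limit
puts redirect_limit(:redirects => false).inspect  # redirects disabled
```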

Instance Method Details

#get_record(opts = {}) ⇒ Object

Equivalent to a `GetRecord` request. You must supply an `:identifier` argument. You should get back an `OAI::GetRecordResponse` object from which you can extract an `OAI::Record` object.


# File 'lib/oai/client.rb', line 153

def get_record(opts={})
  OAI::GetRecordResponse.new(do_request('GetRecord', opts))
end

#identify ⇒ Object

Equivalent to an `Identify` request. You'll get back an `OAI::IdentifyResponse` object, which is essentially just a wrapper around a `REXML::Document` for the response. If you created your client using the `libxml` parser then you will get an `XML::Node` object instead.


# File 'lib/oai/client.rb', line 126

def identify
  OAI::IdentifyResponse.new(do_request('Identify'))
end

#list_identifiers(opts = {}) ⇒ Object

Equivalent to a `ListIdentifiers` request. Pass in `:from`, `:until` arguments as `Date` or `DateTime` objects as appropriate depending on the granularity supported by the server.

You can use seamless resumption with this verb, which allows you to mitigate (to some extent) the lack of a `Count` verb:

client.list_identifiers.full.count # Don't try this on PubMed though!
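
To make the idea of seamless resumption concrete, here is a self-contained sketch of paging through results by following resumption tokens. It does not use the library: `PAGES` stands in for a server, and `each_across_pages` is a hypothetical helper, so this should be read as an illustration of the technique rather than as how `full` is implemented.

```ruby
# Stand-in for a server: maps a resumption token (nil for the first
# request) to a page of identifiers plus the token for the next page.
PAGES = {
  nil  => { :items => %w[oai:a:1 oai:a:2], :token => 't1' },
  't1' => { :items => %w[oai:a:3 oai:a:4], :token => 't2' },
  't2' => { :items => %w[oai:a:5],         :token => nil }
}

# Hypothetical helper: yields every item across all pages, requesting the
# next page only once the current one has been consumed.
def each_across_pages(fetch_page)
  return enum_for(:each_across_pages, fetch_page) unless block_given?
  token = nil
  loop do
    page = fetch_page.call(token)
    page[:items].each { |item| yield item }
    token = page[:token]
    break if token.nil?
  end
end

fetch = lambda { |token| PAGES.fetch(token) }
puts each_across_pages(fetch).count
```

Counting this way still requests every page, which is why it is a poor fit for very large repositories.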

# File 'lib/oai/client.rb', line 146

def list_identifiers(opts={})
  do_resumable(OAI::ListIdentifiersResponse, 'ListIdentifiers', opts)
end

#list_metadata_formats(opts = {}) ⇒ Object

Equivalent to a `ListMetadataFormats` request. A `ListMetadataFormatsResponse` object is returned to you.


# File 'lib/oai/client.rb', line 133

def list_metadata_formats(opts={})
  OAI::ListMetadataFormatsResponse.new(do_request('ListMetadataFormats', opts))
end

#list_records(opts = {}) ⇒ Object

Equivalent to the `ListRecords` request. A `ListRecordsResponse` will be returned, which you can use to iterate through records:

response = client.list_records
response.each do |record|
  puts record.metadata
end

Alternately, you can use seamless resumption to avoid handling resumption tokens:

client.list_records.full.each do |record|
  puts record.metadata
end

### Memory Use

`:full` will avoid storing more than one page of records in memory, but you can use it in ways that override that behaviour. Be careful to avoid using `client.list_records.full.entries` unless you really want to hold all the records in the feed in memory!
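
The difference between streaming and materializing can be shown with a stand-in feed. In the sketch below, `FakeFeed` is a hypothetical class that counts how many pages it has produced; it is not part of the library and only approximates the behaviour described above.

```ruby
# Hypothetical paged source: 100 pages of 10 records each, counting how
# many pages have actually been requested.
class FakeFeed
  attr_reader :pages_fetched

  def initialize
    @pages_fetched = 0
  end

  # Yields records one at a time, producing a page only when it is needed.
  def each
    return enum_for(:each) unless block_given?
    100.times do |page|
      @pages_fetched += 1
      10.times { |i| yield "record-#{page * 10 + i}" }
    end
  end
end

feed = FakeFeed.new
feed.each.first(15)        # stops early: only two pages are requested
puts feed.pages_fetched

feed = FakeFeed.new
all = feed.each.entries    # materializes every record in memory
puts "#{feed.pages_fetched} pages, #{all.size} records held"
```

Iterating with a block or taking a bounded prefix keeps memory use proportional to one page, while `entries` holds the whole feed at once.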


# File 'lib/oai/client.rb', line 177

def list_records(opts={})
  do_resumable(OAI::ListRecordsResponse, 'ListRecords', opts)
end

#list_sets(opts = {}) ⇒ Object

Equivalent to the `ListSets` request. A `ListSetsResponse` object will be returned, which you can use for iterating through the `OAI::Set` objects:

for set in client.list_sets
  puts set
end

A large number of sets is not unusual for some OAI-PMH feeds, so using seamless resumption may be preferable:

client.list_sets.full.each do |set|
  puts set
end

# File 'lib/oai/client.rb', line 195

def list_sets(opts={})
  do_resumable(OAI::ListSetsResponse, 'ListSets', opts)
end