Class: OAI::Client

Inherits:
Object
Defined in:
lib/oai/client.rb

Overview

An OAI::Client provides a client API for issuing OAI-PMH verbs against an OAI-PMH server. The six OAI-PMH verbs translate directly to methods you can call on an OAI::Client object. Verb arguments are passed as a hash:

```ruby
client = OAI::Client.new 'http://www.pubmedcentral.gov/oai/oai.cgi'
record = client.get_record :identifier => 'oai:pubmedcentral.gov:13901'
for identifier in client.list_identifiers
  puts identifier
end
```

It is worth noting that the API uses method and parameter names with underscores in them rather than studly caps, so list_identifiers and metadata_prefix are used instead of the ListIdentifiers and metadataPrefix used in the OAI-PMH specification.
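The parameter naming convention can be pictured as a simple key conversion. The sketch below is only an illustration and not the library's own code; `camelize_key` is a hypothetical helper name introduced here.

```ruby
# Hypothetical helper: converts an underscored option name such as
# :metadata_prefix into the camelCase spelling used by the OAI-PMH spec.
def camelize_key(key)
  key.to_s.gsub(/_([a-z])/) { Regexp.last_match(1).upcase }
end

puts camelize_key(:metadata_prefix)   # metadataPrefix
puts camelize_key(:resumption_token)  # resumptionToken
puts camelize_key(:identifier)        # identifier
```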

Also, the from and until arguments, which specify dates, should be passed in as Date or DateTime objects, depending on the granularity supported by the server.
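OAI-PMH defines two date granularities: day (YYYY-MM-DD) and seconds (YYYY-MM-DDThh:mm:ssZ). The sketch below shows which Ruby object fits each one. The helper names `oai_day` and `oai_seconds` are hypothetical, and the strftime calls are for illustration only; how the client itself serializes dates is not shown on this page.

```ruby
require 'date'

# Hypothetical helper: day granularity, suited to a Date object.
def oai_day(date)
  date.strftime('%Y-%m-%d')
end

# Hypothetical helper: seconds granularity, suited to a UTC DateTime object.
def oai_seconds(datetime)
  datetime.strftime('%Y-%m-%dT%H:%M:%SZ')
end

puts oai_day(Date.new(2006, 9, 1))                    # 2006-09-01
puts oai_seconds(DateTime.new(2006, 9, 1, 12, 30, 0)) # 2006-09-01T12:30:00Z

# An options hash as it might be passed to a list request (client call omitted).
opts = { :from => Date.new(2006, 9, 1), :until => Date.new(2006, 10, 1) }
```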

For detailed information on the arguments that can be used, please consult the OAI-PMH docs at http://www.openarchives.org/OAI/openarchivesprotocol.html.

Constant Summary

UNESCAPED_AMPERSAND =
/&(?!(?:amp|lt|gt|quot|apos|\#\d+);)/
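To see what this pattern matches, the sketch below copies it into a standalone script and applies it to a sample string. Using it with gsub to escape bare ampersands is an assumption about its purpose; the page does not show where the constant is used.

```ruby
# Same pattern as the constant above, copied so the snippet runs on its own.
UNESCAPED_AMPERSAND = /&(?!(?:amp|lt|gt|quot|apos|\#\d+);)/

raw = 'AT&T &amp; R&D &#38; &lt;tag&gt;'

# Assumed usage: escape only the ampersands that do not already start
# a named or decimal character reference.
puts raw.gsub(UNESCAPED_AMPERSAND, '&amp;')
# AT&amp;T &amp; R&amp;D &#38; &lt;tag&gt;
```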


Constructor Details

#initialize(base_url, options = {}) ⇒ Client

The constructor must be passed a valid base URL for an OAI service:

client = OAI::Client.new 'http://www.pubmedcentral.gov/oai/oai.cgi'

If you want to see debugging messages on STDERR use:

client = OAI::Client.new 'http://example.com', :debug => true

By default, OAI verbs called on the client return REXML::Element objects for metadata records. If you prefer, you can use the :parser option to select libxml instead and get back XML::Node objects:

client = OAI::Client.new 'http://example.com', :parser => 'libxml'

You can configure the Faraday HTTP client by providing an alternate Faraday instance:

```ruby
client = OAI::Client.new 'http://example.com', :http => Faraday.new {|c|}
```

### HIGH PERFORMANCE

If you want to supercharge this API, install `libxml-ruby >= 0.3.8` and use the :parser option when you construct your OAI::Client.



# File 'lib/oai/client.rb', line 86

def initialize(base_url, options={})
  @base = URI.parse base_url
  @debug = options.fetch(:debug, false)
  @parser = options.fetch(:parser, 'rexml')
  @headers = options.fetch(:headers, {})

  @http_client = options.fetch(:http) do
    Faraday.new(:url => @base.clone) do |builder|
      follow_redirects = options.fetch(:redirects, true)
      follow_redirects = 5 if follow_redirects == true

      if follow_redirects
        require 'faraday/follow_redirects'
        builder.use Faraday::FollowRedirects::Middleware
        builder.response :follow_redirects, :limit => follow_redirects.to_i
      end
      builder.adapter :net_http
    end
  end

  # load appropriate parser
  case @parser
  when 'libxml'
    begin
      require 'rubygems'
      require 'xml/libxml'
    rescue
      raise OAI::Exception.new("xml/libxml not available")
    end
  when 'rexml'
    require 'rexml/document'
    require 'rexml/xpath'
  else
    raise OAI::Exception.new("unknown parser: #{@parser}")
  end
end
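The source above also reads a :redirects option that the prose does not describe: true (the default) becomes a limit of 5, a number is used as the limit, and a falsy value skips the redirect middleware. The sketch below isolates that normalization so it can be tried on its own; `redirect_limit` is a hypothetical name, not part of the library.

```ruby
# Hypothetical helper mirroring the :redirects handling in the constructor.
# Returns the redirect limit, or nil when redirects are disabled.
def redirect_limit(options = {})
  follow_redirects = options.fetch(:redirects, true)
  follow_redirects = 5 if follow_redirects == true
  follow_redirects ? follow_redirects.to_i : nil
end

p redirect_limit                       # 5
p redirect_limit(:redirects => 2)      # 2
p redirect_limit(:redirects => false)  # nil
```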

Instance Method Details

#get_record(opts = {}) ⇒ Object

Equivalent to a GetRecord request. You must supply an :identifier argument. You should get back an OAI::GetRecordResponse object, from which you can extract an OAI::Record object.



# File 'lib/oai/client.rb', line 155

def get_record(opts={})
  OAI::GetRecordResponse.new(do_request('GetRecord', opts))
end
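On the wire, an OAI-PMH GetRecord request is an HTTP request whose query carries the verb, identifier, and metadataPrefix arguments. The sketch below builds such a query string with the standard library only. It does not reproduce do_request, whose body is not shown on this page; `get_record_query` is a hypothetical helper, and oai_dc is used as the example prefix because every OAI-PMH repository is required to support it.

```ruby
require 'uri'

# Hypothetical helper: builds the query string of a GetRecord request.
def get_record_query(identifier, metadata_prefix = 'oai_dc')
  URI.encode_www_form(
    'verb'           => 'GetRecord',
    'identifier'     => identifier,
    'metadataPrefix' => metadata_prefix
  )
end

query = get_record_query('oai:pubmedcentral.gov:13901')
puts "http://www.pubmedcentral.gov/oai/oai.cgi?#{query}"
```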

#identify ⇒ Object

Equivalent to an Identify request. You'll get back an OAI::IdentifyResponse object, which is essentially just a wrapper around a REXML::Document for the response. If you created your client using the libxml parser, you will get an XML::Node object instead.



# File 'lib/oai/client.rb', line 128

def identify
  OAI::IdentifyResponse.new(do_request('Identify'))
end

#list_identifiers(opts = {}) ⇒ Object

Equivalent to a ListIdentifiers request. Pass the :from and :until arguments as Date or DateTime objects, as appropriate for the granularity supported by the server.

You can use seamless resumption with this verb, which allows you to mitigate (to some extent) the lack of a Count verb:

client.list_identifiers.full.count # Don't try this on PubMed though!


# File 'lib/oai/client.rb', line 148

def list_identifiers(opts={})
  do_resumable(OAI::ListIdentifiersResponse, 'ListIdentifiers', opts)
end

#list_metadata_formats(opts = {}) ⇒ Object

Equivalent to a ListMetadataFormats request. A ListMetadataFormatsResponse object is returned to you.



# File 'lib/oai/client.rb', line 135

def list_metadata_formats(opts={})
  OAI::ListMetadataFormatsResponse.new(do_request('ListMetadataFormats', opts))
end

#list_records(opts = {}) ⇒ Object

Equivalent to the ListRecords request. A ListRecordsResponse will be returned, which you can use to iterate through records:

response = client.list_records
response.each do |record|
  puts record.metadata
end

Alternately, you can use seamless resumption to avoid handling resumption tokens:

client.list_records.full.each do |record|
  puts record.metadata
end

### Memory Use

:full will avoid storing more than one page of records in memory, but you can use it in ways that override that behaviour. Be careful to avoid using client.list_records.full.entries unless you really want to hold all the records in the feed in memory!
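The difference between streaming and materializing can be shown with a small stand-in for a paged feed. Everything in the sketch below is made up for illustration: `FakeFeed`, its page data, and its tokens are hypothetical and are not part of the library. It only mimics the idea of following resumption tokens one page at a time.

```ruby
# Stand-in for a paged OAI-PMH feed. Each page holds some records and an
# optional resumption token that points at the next page.
class FakeFeed
  PAGES = {
    nil  => { :records => %w[rec1 rec2], :token => 't1' },
    't1' => { :records => %w[rec3 rec4], :token => 't2' },
    't2' => { :records => %w[rec5],      :token => nil }
  }

  # Yields records page by page, following resumption tokens, so only the
  # current page is referenced at any time.
  def each
    return enum_for(:each) unless block_given?

    token = nil
    loop do
      page = PAGES.fetch(token)
      page[:records].each { |record| yield record }
      token = page[:token]
      break if token.nil?
    end
  end
end

# Streaming: records are handled one at a time.
FakeFeed.new.each { |record| puts record }

# Materializing: entries collects every record into one array.
all = FakeFeed.new.each.entries
puts all.length
```

With a real feed, the second form is the one that can exhaust memory, since the array grows with the total number of records rather than with the page size.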



# File 'lib/oai/client.rb', line 179

def list_records(opts={})
  do_resumable(OAI::ListRecordsResponse, 'ListRecords', opts)
end

#list_sets(opts = {}) ⇒ Object

Equivalent to the ListSets request. A ListSetsResponse object will be returned, which you can use to iterate through the OAI::Set objects:

for set in client.list_sets
  puts set
end

A large number of sets is not unusual for some OAI-PMH feeds, so using seamless resumption may be preferable:

client.list_sets.full.each do |set|
  puts set
end


# File 'lib/oai/client.rb', line 197

def list_sets(opts={})
  do_resumable(OAI::ListSetsResponse, 'ListSets', opts)
end

#sanitize_xml(xml) ⇒ Object



# File 'lib/oai/client.rb', line 201

def sanitize_xml(xml)
  xml = strip_invalid_utf_8_chars(xml)
  xml = strip_invalid_xml_chars(xml)
  if @parser == 'libxml'
    # remove default namespace for oai-pmh since libxml
    # isn't able to use our xpaths to get at them
    # if you know a way around this please let me know
    xml = xml.gsub(
      /xmlns=\"http:\/\/www.openarchives.org\/OAI\/.\..\/\"/, '')
  end
  xml
end
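The effect of the namespace-stripping step can be checked in isolation. The sketch below applies the same regular expression to a made-up response fragment; `strip_oai_default_namespace` is a hypothetical name, and the sample XML is invented for illustration.

```ruby
# Hypothetical helper applying the same gsub as the libxml branch above.
def strip_oai_default_namespace(xml)
  xml.gsub(/xmlns=\"http:\/\/www.openarchives.org\/OAI\/.\..\/\"/, '')
end

sample = '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" ' \
         'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">' \
         '<responseDate/></OAI-PMH>'

# Only the default OAI-PMH namespace declaration is removed; the prefixed
# xmlns:xsi declaration is left in place.
puts strip_oai_default_namespace(sample)
```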