Class: OAI::Client
- Inherits: Object
- Defined in: lib/oai/client.rb
Overview
An OAI::Client provides a client API for issuing OAI-PMH verbs against an OAI-PMH server. The six OAI-PMH verbs translate directly to methods you can call on an OAI::Client object. Verb arguments are passed as a hash:
```ruby
client = OAI::Client.new 'http://www.pubmedcentral.gov/oai/oai.cgi'
record = client.get_record :identifier => 'oai:pubmedcentral.gov:13901'
for identifier in client.list_identifiers
  puts identifier
end
```
Note that the API uses method and parameter names with underscores rather than studly caps: list_identifiers and metadata_prefix are used instead of the listIdentifiers and metadataPrefix found in the OAI-PMH specification.
Also, the from and until arguments, which specify dates, should be passed in as Date or DateTime objects, depending on the granularity supported by the server.
For detailed information on the arguments that can be used please consult the OAI-PMH docs at <www.openarchives.org/OAI/openarchivesprotocol.html>.
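The naming and date conventions described above can be illustrated with a small standalone sketch. This is not the library's internal code: the helpers `camelize_verb_key`, `format_oai_date` and `to_oai_params` are hypothetical names introduced here, and the date formats assume the two granularities defined by OAI-PMH (`YYYY-MM-DD` and `YYYY-MM-DDThh:mm:ssZ`).

```ruby
require 'date'

# Hypothetical helper: turn an underscored option key such as
# :metadata_prefix into the OAI-PMH parameter name "metadataPrefix".
def camelize_verb_key(key)
  head, *tail = key.to_s.split('_')
  head + tail.map(&:capitalize).join
end

# Hypothetical helper: render a Date with day granularity and a DateTime
# with seconds granularity, in UTC. DateTime is checked first because it
# is a subclass of Date.
def format_oai_date(value)
  case value
  when DateTime then value.new_offset(0).strftime('%Y-%m-%dT%H:%M:%SZ')
  when Date     then value.strftime('%Y-%m-%d')
  else value.to_s
  end
end

# Hypothetical helper: convert a Ruby-style options hash into
# protocol-style parameters.
def to_oai_params(opts)
  opts.each_with_object({}) do |(key, value), params|
    params[camelize_verb_key(key)] = format_oai_date(value)
  end
end

params = to_oai_params(
  :metadata_prefix => 'oai_dc',
  :from            => Date.new(2024, 1, 1),
  :until           => DateTime.new(2024, 1, 31, 12, 0, 0)
)
puts params.inspect
```

A server that only supports day granularity would need both dates passed as Date objects, which is why the choice between Date and DateTime is left to the caller.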
Constant Summary
- UNESCAPED_AMPERSAND =
/&(?!(?:amp|lt|gt|quot|apos|\#\d+);)/
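As a rough illustration of what this pattern matches, the sketch below applies the same regular expression to a made-up string. How OAI::Client uses the constant internally is not shown on this page, so the `escape_bare_ampersands` helper and the idea of replacing matches with `&amp;` are assumptions made for the example.

```ruby
# Hypothetical helper: escape ampersands that do not already start one of
# the entities the pattern recognises (amp, lt, gt, quot, apos, or a
# decimal character reference such as &#38;).
def escape_bare_ampersands(text,
                           pattern = /&(?!(?:amp|lt|gt|quot|apos|\#\d+);)/)
  text.gsub(pattern, '&amp;')
end

sample  = 'Smith & Jones &amp; Co &#38; Sons &lt;tag&gt; &bogus;'
escaped = escape_bare_ampersands(sample)
puts escaped
# => Smith &amp; Jones &amp; Co &#38; Sons &lt;tag&gt; &amp;bogus;
```

The negative lookahead only knows the five predefined XML entities and decimal references, so a hexadecimal reference such as `&#x26;` would also be treated as a bare ampersand.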
Instance Method Summary
- #get_record(opts = {}) ⇒ Object
  Equivalent to a GetRecord request.
- #identify ⇒ Object
  Equivalent to an Identify request.
- #initialize(base_url, options = {}) ⇒ Client (constructor)
  The constructor, which must be passed a valid base URL for an OAI service.
- #list_identifiers(opts = {}) ⇒ Object
  Equivalent to a ListIdentifiers request.
- #list_metadata_formats(opts = {}) ⇒ Object
  Equivalent to a ListMetadataFormats request.
- #list_records(opts = {}) ⇒ Object
  Equivalent to the ListRecords request.
- #list_sets(opts = {}) ⇒ Object
  Equivalent to the ListSets request.
- #sanitize_xml(xml) ⇒ Object
Constructor Details
#initialize(base_url, options = {}) ⇒ Client
The constructor, which must be passed a valid base URL for an OAI service:
client = OAI::Client.new 'http://www.pubmedcentral.gov/oai/oai.cgi'
If you want to see debugging messages on STDERR use:
client = OAI::Client.new 'http://example.com', :debug => true
By default, OAI verbs called on the client return REXML::Element objects for metadata records. If you prefer, use the :parser option to select libxml instead and get back XML::Node objects:
client = OAI::Client.new 'http://example.com', :parser => 'libxml'
You can configure the Faraday HTTP client by providing an alternate Faraday instance:
```ruby
client = OAI::Client.new 'http://example.com', :http => Faraday.new { |c| }
```
### HIGH PERFORMANCE
If you want to supercharge this API, install `libxml-ruby >= 0.3.8` and use the :parser option when you construct your OAI::Client.
# File 'lib/oai/client.rb', line 86
def initialize(base_url, options={})
  @base = URI.parse base_url
  @debug = options.fetch(:debug, false)
  @parser = options.fetch(:parser, 'rexml')
  @headers = options.fetch(:headers, {})

  @http_client = options.fetch(:http) do
    Faraday.new(:url => @base.clone) do |builder|
      follow_redirects = options.fetch(:redirects, true)
      follow_redirects = 5 if follow_redirects == true

      if follow_redirects
        require 'faraday/follow_redirects'
        builder.use Faraday::FollowRedirects::Middleware
        builder.response :follow_redirects, :limit => follow_redirects.to_i
      end
      builder.adapter :net_http
    end
  end

  # load appropriate parser
  case @parser
  when 'libxml'
    begin
      require 'rubygems'
      require 'xml/libxml'
    rescue
      raise OAI::Exception.new("xml/libxml not available")
    end
  when 'rexml'
    require 'rexml/document'
    require 'rexml/xpath'
  else
    raise OAI::Exception.new("unknown parser: #{@parser}")
  end
end
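The constructor body also reads a `:redirects` option that the prose above does not describe. Judging from the listing, it appears to accept `true` (follow up to 5 redirects, the default), an integer limit, or `false`/`nil` (do not follow redirects). The standalone sketch below mirrors that normalization so the behaviour can be checked in isolation; `resolve_redirect_limit` is a hypothetical helper, not a method of OAI::Client.

```ruby
# Hypothetical helper mirroring how the constructor appears to interpret
# the :redirects option. Returns the redirect limit as an Integer, or nil
# when redirects should not be followed.
def resolve_redirect_limit(options = {})
  follow_redirects = options.fetch(:redirects, true)
  follow_redirects = 5 if follow_redirects == true
  follow_redirects ? follow_redirects.to_i : nil
end

puts resolve_redirect_limit({}).inspect                    # => 5
puts resolve_redirect_limit(:redirects => 2).inspect       # => 2
puts resolve_redirect_limit(:redirects => false).inspect   # => nil
```

Note that this option only matters when the client builds its own Faraday connection; in the listing, a connection supplied through `:http` is used as given.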
Instance Method Details
#get_record(opts = {}) ⇒ Object
Equivalent to a GetRecord request. You must supply an :identifier argument. You should get back an OAI::GetRecordResponse object, from which you can extract an OAI::Record object.
# File 'lib/oai/client.rb', line 155
def get_record(opts={})
  OAI::GetRecordResponse.new(do_request('GetRecord', opts))
end
#identify ⇒ Object
Equivalent to an Identify request. You’ll get back an OAI::IdentifyResponse object, which is essentially just a wrapper around a REXML::Document for the response. If you created your client using the libxml parser then you will get an XML::Node object instead.
# File 'lib/oai/client.rb', line 128
def identify
  OAI::IdentifyResponse.new(do_request('Identify'))
end
#list_identifiers(opts = {}) ⇒ Object
Equivalent to a ListIdentifiers request. Pass in the :from and :until arguments as Date or DateTime objects, as appropriate for the granularity supported by the server.
You can use seamless resumption with this verb, which allows you to mitigate (to some extent) the lack of a Count verb:
client.list_identifiers.full.count # Don't try this on PubMed though!
# File 'lib/oai/client.rb', line 148
def list_identifiers(opts={})
  do_resumable(OAI::ListIdentifiersResponse, 'ListIdentifiers', opts)
end
#list_metadata_formats(opts = {}) ⇒ Object
Equivalent to a ListMetadataFormats request. A ListMetadataFormatsResponse object is returned to you.
# File 'lib/oai/client.rb', line 135
def list_metadata_formats(opts={})
  OAI::ListMetadataFormatsResponse.new(do_request('ListMetadataFormats', opts))
end
#list_records(opts = {}) ⇒ Object
Equivalent to the ListRecords request. A ListRecordsResponse will be returned, which you can use to iterate through records:
response = client.list_records
response.each do |record|
  puts record.metadata
end
Alternately, you can use seamless resumption to avoid handling resumption tokens:
client.list_records.full.each do |record|
  puts record.metadata
end
### Memory Use
:full will avoid storing more than one page of records in memory, but you can use it in ways that override that behaviour. Be careful to avoid using client.list_records.full.entries unless you really want to hold all the records in the feed in memory!
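The idea behind seamless resumption, and why calling `entries` defeats it, can be sketched without a network connection. The example below is a simplified model and not the library's implementation: the in-memory `pages` hash stands in for an OAI-PMH server, and `each_record_across_pages` and `fetch_page` are hypothetical names. Each page is fetched only when iteration reaches it, so stopping early avoids the remaining requests.

```ruby
# Hypothetical model of resumption: fetch_page takes a resumption token
# (nil for the first request) and returns [records, next_token].
def each_record_across_pages(fetch_page)
  Enumerator.new do |yielder|
    token = nil
    loop do
      records, token = fetch_page.call(token)
      records.each { |record| yielder << record }
      break if token.nil? # no token means this was the last page
    end
  end
end

# Made-up feed of three pages, keyed by the token that requests each page.
pages = {
  nil       => [%w[rec1 rec2], 'token-1'],
  'token-1' => [%w[rec3 rec4], 'token-2'],
  'token-2' => [%w[rec5], nil]
}

requests = []
fetch_page = lambda do |token|
  requests << token # record every simulated request
  pages.fetch(token)
end

records = each_record_across_pages(fetch_page)

first_three = records.first(3)
puts first_three.inspect   # => ["rec1", "rec2", "rec3"]
puts requests.inspect      # => [nil, "token-1"]

total = records.count
puts total                 # => 5
```

Taking the first three records triggers only two requests, and counting walks every page while keeping a single page at a time. Calling `to_a` or `entries` on the enumerator would instead collect every record into one array.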
# File 'lib/oai/client.rb', line 179
def list_records(opts={})
  do_resumable(OAI::ListRecordsResponse, 'ListRecords', opts)
end
#list_sets(opts = {}) ⇒ Object
Equivalent to the ListSets request. A ListSetsResponse object will be returned, which you can use for iterating through the OAI::Set objects:
for set in client.list_sets
  puts set
end
A large number of sets is not unusual for some OAI-PMH feeds, so using seamless resumption may be preferable:
client.list_sets.full.each do |set|
  puts set
end
# File 'lib/oai/client.rb', line 197
def list_sets(opts={})
  do_resumable(OAI::ListSetsResponse, 'ListSets', opts)
end
#sanitize_xml(xml) ⇒ Object
# File 'lib/oai/client.rb', line 201
def sanitize_xml(xml)
  xml = strip_invalid_utf_8_chars(xml)
  xml = strip_invalid_xml_chars(xml)
  if @parser == 'libxml'
    # remove default namespace for oai-pmh since libxml
    # isn't able to use our xpaths to get at them
    # if you know a way around this please let me know
    xml = xml.gsub(
      /xmlns=\"http:\/\/www.openarchives.org\/OAI\/.\..\/\"/, '')
  end
  xml
end
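To see what the libxml branch does to a response, the sketch below applies the same regular expression to a made-up OAI-PMH envelope. `strip_oai_default_namespace` is a hypothetical helper introduced for this example, and the sample XML is invented.

```ruby
# Hypothetical helper: remove the OAI-PMH default namespace declaration,
# using the same pattern as the sanitize_xml listing.
def strip_oai_default_namespace(xml)
  xml.gsub(/xmlns=\"http:\/\/www.openarchives.org\/OAI\/.\..\/\"/, '')
end

# Invented sample response envelope.
sample = '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" ' \
         'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">' \
         '<responseDate>2024-01-01T00:00:00Z</responseDate></OAI-PMH>'

stripped = strip_oai_default_namespace(sample)
puts stripped
```

Only the default `xmlns="..."` declaration is removed; prefixed declarations such as `xmlns:xsi` are left in place. The two `.` characters around the escaped dot are wildcards, so the pattern matches any single-character major and minor version (for example `2.0` or `1.1`).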