Class: Scopus2

Inherits:
Service
Includes:
ActionView::Helpers::SanitizeHelper, MetadataHelper, UmlautHttp
Defined in:
app/service_adaptors/scopus2.rb

Overview

Service adapter plug-in.

PURPOSE: Includes “cited by”, “similar articles” and “more by these authors” links from Scopus.

LIMITATIONS: You must be a Scopus customer for the generated links to work for your users at all! Off-campus users should probably be going through EZProxy; see the EZProxy plug-in. A match must be found in Scopus, naturally. “cited by” will only be included if Scopus reports a non-zero “cited by” count. There's no good way to precheck the similar/more-by links, so they are provided blind and may result in 0 hits. You can turn them off if you like, with @include_similar and @include_more_by_authors. Abstracts are not used because using them seems to violate the Scopus terms of service.

REGISTERING: Register for a Scopus API key at: www.developers.elsevier.com/action/devprojects?pageOrigin=cmsPage&zone=topNavBar Look for the “Register a new site” button at the bottom right of the page.

For the second Scopus API, you theoretically need a Scopus “PartnerID” and corresponding “release number”, set in @partner_id and @scopus_release. There's no really easy way to get one. Scopus says:

"To obtain a partner ID or release number, contact your nearest regional
Scopus office. A list of Scopus contacts is available at
http://www.info.scopus.com/contactus/index.shtml"

Bah! But fortunately, using the “partnerID” assigned to the Scopus JSON API, 65, seems to work, and is coded here as the default. You could try going with that. When you register a partnerID, you also get a 'salt key', which is currently not used by this code, but @link_salt_key is reserved for it in case later functionality needs it.
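
The options named above (@include_similar, @include_more_by_authors, @partner_id, @scopus_release) are set as defaults in the constructor before super(config) is called, which suggests that configuration values override them. A minimal sketch of that defaults-then-override pattern follows. ConfigSketch is a hypothetical stand-in, not Umlaut code, and the assumption that each config key replaces the instance variable of the same name is not confirmed by this page.

```ruby
# Hypothetical stand-in, not Umlaut code: defaults first, then config overrides.
class ConfigSketch
  attr_reader :api_key, :partner_id, :scopus_release,
              :include_similar, :include_more_by_authors

  def initialize(config)
    # Defaults, mirroring the values shown in Scopus2#initialize.
    @partner_id = 65
    @scopus_release = "R6.0.0"
    @include_similar = true
    @include_more_by_authors = true
    # Assumption: each config key replaces the instance variable of the same name.
    config.each { |key, value| instance_variable_set("@#{key}", value) }
  end
end

service = ConfigSketch.new(
  "api_key" => "YOUR_API_KEY",        # placeholder, use your registered key
  "include_more_by_authors" => false  # turn off the blind "more by" link
)
```

Under that assumption, only the keys you supply change; partner_id stays at the default of 65.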

SCOPUS USEFUL URLS:

api key register: www.developers.elsevier.com/action/devprojects?pageOrigin=cmsPage&zone=topNavBar

'content policies' terms of use: www.developers.elsevier.com/cms/content-apis

API overview docs: www.developers.elsevier.com/cms/content-apis

Various other api docs? Confused myself as to organization here.

Some API recommendations for federated search: www.developers.elsevier.com/cms/restful-api-federated-search

Constant Summary

Constants inherited from Service

Service::LinkOutFilterTask, Service::StandardTask

Instance Attribute Summary

Attributes inherited from Service

#group, #name, #priority, #request, #service_id, #status, #task, #url

Instance Method Summary

Methods included from UmlautHttp

#http_fetch, #proxy_like_headers

Methods included from MetadataHelper

#get_doi, #get_epage, #get_gpo_item_nums, #get_identifier, #get_isbn, #get_issn, #get_lccn, #get_month, #get_oclcnum, #get_pmid, #get_search_creator, #get_search_terms, #get_search_title, #get_spage, #get_sudoc, #get_top_level_creator, #get_year, #normalize_lccn, #normalize_title, #raw_search_title, title_is_serial?

Methods included from MarcHelper

#add_856_links, #edition_statement, #get_title, #get_years, #gmd_values, #service_type_for_856, #should_skip_856_link?, #strip_gmd

Methods inherited from Service

#credits, #display_name, #handle_wrapper, #link_out_filter, #preempted_by, required_config_params, #response_url, #translate

Constructor Details

#initialize(config) ⇒ Scopus2

Returns a new instance of Scopus2


# File 'app/service_adaptors/scopus2.rb', line 73

def initialize(config)
  #defaults
  @display_name = "Scopus"
  @registered_referer = nil
  @scopus_search_base = 'http://api.elsevier.com/content/search/index:SCOPUS'
  
  @include_cited_by = true
  @include_similar = true
  @include_more_by_authors = true
  @more_by_authors_type = "similar"

  @inward_cited_by_url = "http://www.scopus.com/scopus/inward/citedby.url"
  #@partner_id = "E5wmcMWC"
  @partner_id = 65
  @link_salt_key = nil
  @scopus_release = "R6.0.0"

  # Scopus offers two algorithms for finding similar items.
  # This variable can be:
  # "key" => keyword based similarity 
  # "ref" => reference based similarity (cites similar refs?) Seems to offer 0 hits quite often, so we use keyword instead. 
  # "aut" => author. More docs by same authors. Incorporated as separate link usually. 
  @more_like_this_type = "key"
  @inward_more_like_url = "http://www.scopus.com/scopus/inward/mlt.url"
  
  @credits = {
    @display_name => "http://www.scopus.com/home.url"
  }
  
  super(config)
end

Instance Attribute Details

#scopus_search_base ⇒ Object

Returns the value of attribute scopus_search_base


# File 'app/service_adaptors/scopus2.rb', line 61

def scopus_search_base
  @scopus_search_base
end

Instance Method Details

#check_for_hits(url) ⇒ Object

NOT currently working. Scopus doesn't make this easy. Takes a scopus direct url for which we're not sure if there will be results or not, and requests it and html screen-scrapes to get hit count. (We can conveniently find this just in the html <title> at least). Works for cited_by and more_like_this searches at present. May break if Scopus changes their html title!


# File 'app/service_adaptors/scopus2.rb', line 311

def check_for_hits(url)

  response = http_fetch(url).body

  response_html = Nokogiri::HTML(response)

  title = response_html.at('title').inner_text
  # title is "X documents" (or 'Documents') if there are hits.
  # It's annoyingly "Search Error" if there are either 0 hits, or
  # if there was an actual error. So we can't easily log actual
  # errors, sorry.
  title.downcase =~ /^\s*(\d+)?\s+document/
  if ( hits = $1)
    return hits.to_i
  else
    return 0
  end    
end
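
The title-matching step can be tried on its own, without fetching anything from Scopus. The sketch below applies the same regular expression to a plain string; hits_from_title is a hypothetical helper name, and the sample titles are made up for illustration rather than captured from Scopus.

```ruby
# Hypothetical helper: extract a hit count from an HTML <title> string,
# using the same pattern as check_for_hits.
def hits_from_title(title)
  match = title.downcase.match(/^\s*(\d+)?\s+document/)
  # No match, or a match without a leading number, both count as zero hits.
  match && match[1] ? match[1].to_i : 0
end

puts hits_from_title("12 documents")
puts hits_from_title("Search Error")
```

Note that the pattern requires whitespace before "document", so a title with no leading count and no leading whitespace also yields 0.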

#cited_by_url(result) ⇒ Object


# File 'app/service_adaptors/scopus2.rb', line 290

def cited_by_url(result)
  eid = CGI.escape( eid_from_hit(result) )    
  #return "#{@scopus_cited_by_base}?eid=#{eid}&src=s&origin=recordpage"
  # Use the new scopus direct link format!
  return "#{@inward_cited_by_url}?partnerID=#{@partner_id}&rel=#{@scopus_release}&eid=#{eid}"
end

#eid_from_hit(result) ⇒ Object


# File 'app/service_adaptors/scopus2.rb', line 286

def eid_from_hit(result)
  result.at_xpath("atom:eid/text()", xml_namespaces).to_s
end

#handle(request) ⇒ Object


# File 'app/service_adaptors/scopus2.rb', line 114

def handle(request)
  scopus_query = scopus_query(request)

  # we can't make a good query, nevermind. 
  return request.dispatched(self, true) if scopus_query.blank? 
  
  url = scopus_url(scopus_query)
  
  
  # Make the call.
  headers = {"Accept" => "application/xml"}
  headers["Referer"] = @registered_referer if @registered_referer 

  response = http_fetch(url, :headers => headers, :raise_on_http_error_code => false)

  unless response.kind_of? Net::HTTPSuccess
    # error, sometimes we have info in XML <service-error>
    xml = begin
      Nokogiri::XML(response.body)
    rescue Exception
      nil
    end
  
    code, message = nil, nil
    if xml && error = xml.at_xpath("./service-error")
      code    = error.at_xpath("./status/statusCode")
      message = error.at_xpath("./status/statusText")
    end

    e = StandardError.new("Scopus returned error HTTP status #{response.code}: #{code}: #{message}: scopus query: #{url}")
    return request.dispatched(self, DispatchedService::FailedFatal, e)
  end

  xml      = Nokogiri::XML(response.body)
  
  # Take the first hit from scopus's results, hope they relevancy ranked it
  # well. For DOI/pmid search, there should ordinarily be only one hit!    
  first_hit = xml.at_xpath("//atom:entry[1]", xml_namespaces)

  # Weirdly, a zero-hit result has one <atom:entry> containing an
  # <atom:error> (Sic). Could other kinds of errors be reported that
  # way too? Maybe. Better check just in case, ugh. 
  if first_hit && (error = first_hit.at_xpath("./atom:error", xml_namespaces))      
      scopus_message = error.text

      if scopus_message == "Result set was empty"
        # Just zero hits, no big deal, but nothing to do. 
        return request.dispatched(self, true)
      else
        # real error, log it. 
        e = StandardError.new("Scopus returned error: #{error.text}: scopus query: #{url}")
        return request.dispatched(self, DispatchedService::FailedFatal, e)
      end
  end 

  if first_hit
    if first_hit && (error = first_hit.at_xpath("./atom:error", xml_namespaces))      
      e = StandardError.new("Scopus returned error: #{error.text}")
      return request.dispatched(self, DispatchedService::FailedFatal, e)
    end 
  
    if (@include_cited_by)
      try_add_cited_by_response(first_hit, request)
    end

    if (@include_similar)
      url = more_like_this_url(first_hit)
      # Pre-checking for actual hits not currently working, disabled.
      if (true || ( hits = check_for_hits(url) ) > 0 )
        request.add_service_response( 
          :service=>self, 
          :display_text => "#{hits} #{ServiceTypeValue[:similar].display_name_pluralize.downcase.capitalize}", 
          :url => url, 
          :service_type_value => :similar)          
      end                
    end

    if ( @include_more_by_authors)
      url = more_like_this_url(first_hit, :type => "aut")
      # Pre-checking for actual hits not currently working, disabled. 
      if (true || ( hits = check_for_hits(url) ) > 0 )
        request.add_service_response( 
          :service=>self, 
          :display_text => "#{hits} More from these authors", 
          :url => url, 
          :service_type_value => :similar)          
      end        
    end

  end

  return request.dispatched(self, true)
end

#more_like_this_url(result, options = {}) ⇒ Object


# File 'app/service_adaptors/scopus2.rb', line 298

def more_like_this_url(result, options = {})
  options[:type] ||= @more_like_this_type
  eid = CGI.escape eid_from_hit(result)

  return "#{@inward_more_like_url}?partnerID=#{@partner_id}&rel=#{@scopus_release}&eid=#{eid}&mltType=#{options[:type]}"
end
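
Both inward links (cited-by and more-like-this) share the same partnerID, rel and eid parameters. The sketch below builds them from one helper, using only the defaults shown in the constructor. inward_url is a hypothetical helper, and the EID value is a made-up placeholder, not a real Scopus record.

```ruby
require 'cgi'

PARTNER_ID = 65       # default partner ID from Scopus2#initialize
RELEASE    = 'R6.0.0' # default release number from Scopus2#initialize

# Hypothetical helper: append the shared inward-link parameters to a base URL.
def inward_url(base, eid, extra = {})
  params = { 'partnerID' => PARTNER_ID, 'rel' => RELEASE, 'eid' => eid }.merge(extra)
  query = params.map { |key, value| "#{key}=#{CGI.escape(value.to_s)}" }.join('&')
  "#{base}?#{query}"
end

eid = '2-s2.0-0000000000' # made-up placeholder EID
cited_by = inward_url('http://www.scopus.com/scopus/inward/citedby.url', eid)
similar  = inward_url('http://www.scopus.com/scopus/inward/mlt.url', eid, 'mltType' => 'key')
authors  = inward_url('http://www.scopus.com/scopus/inward/mlt.url', eid, 'mltType' => 'aut')
```

Unlike the methods above, this sketch escapes every parameter value, not only the EID; that is a design choice of the example.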

#phrase(str) ⇒ Object

Backslash-escapes any double quotes, and wraps the string in the double quotes Scopus uses for phrase search. Does NOT URI-escape.


# File 'app/service_adaptors/scopus2.rb', line 256

def phrase(str)
  '"' + str.gsub('"', '\\"') + '"'
end
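
As a quick illustration of the quoting behaviour, the method can be run standalone. The input strings here are arbitrary examples.

```ruby
# Same body as Scopus2#phrase, copied so the example runs on its own.
def phrase(str)
  '"' + str.gsub('"', '\\"') + '"'
end

puts phrase('10.1000/xyz')     # wrapped in double quotes, nothing to escape
puts phrase('the "big" study') # embedded quotes gain a leading backslash
```

The result is still a raw query fragment; it has to be URI-escaped before it goes into a URL parameter.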

#scopus_query(request) ⇒ Object

Returns a scopus advanced search query intended to find the exact known item identified by this citation.

NOT uri-escaped yet, make sure to uri-escape before putting it in a uri param!

Will try to use DOI or PMID if available. Otherwise will use issn/year/vol/iss/start page if available. In some cases may resort to author/title.


# File 'app/service_adaptors/scopus2.rb', line 218

def scopus_query(request)
  
  if (doi = get_doi(request.referent))
    return "DOI(#{phrase(doi)})"
  elsif (pmid = get_pmid(request.referent))
    return "PMID(#{phrase(pmid)})"
  elsif (isbn = get_isbn(request.referent))
    # I don't think scopus has a lot of ISBN-holding citations, but
    # it allows search so we might as well try. 
    return "ISBN(#{phrase(isbn)})"
  else            
    # Okay, we're going to try to do it on issn/vol/issue/page.
    # If we don't have issn, we'll reluctantly use journal title
    # (damn you google scholar).
    metadata = request.referent.metadata
    issn = request.referent.issn
    if ( (issn || ! metadata['jtitle'].blank? ) &&
         ! metadata['volume'].blank? &&
         ! metadata['issue'].blank? &&
         ! metadata['spage'].blank? )
      query = "VOLUME(#{phrase(metadata['volume'])}) AND ISSUE(#{phrase(metadata['issue'])}) AND PAGEFIRST(#{phrase(metadata['spage'])}) "
      if ( issn )
        query += " AND (ISSN(#{phrase(issn)}) OR EISSN(#{phrase(issn)}))"
      else
        query += " AND EXACTSRCTITLE(#{phrase(metadata['jtitle'])})"
      end
      return query
    end      
  end
  return nil
end
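
The fallback order (DOI, then PMID, then ISBN, then ISSN or journal title with volume/issue/start page) can be sketched without an Umlaut request object. known_item_query is a hypothetical helper that takes a plain hash with symbol keys in place of request.referent, and the citation values are invented for illustration.

```ruby
# Quote a value as a Scopus phrase, escaping embedded double quotes.
def phrase(str)
  '"' + str.gsub('"', '\\"') + '"'
end

# Hypothetical helper: build a known-item query from a plain hash.
def known_item_query(citation)
  return "DOI(#{phrase(citation[:doi])})"   if citation[:doi]
  return "PMID(#{phrase(citation[:pmid])})" if citation[:pmid]
  return "ISBN(#{phrase(citation[:isbn])})" if citation[:isbn]

  # Need a source (ISSN preferred, else journal title) plus volume/issue/page.
  source = citation[:issn] || citation[:jtitle]
  required = [source, citation[:volume], citation[:issue], citation[:spage]]
  return nil if required.any? { |value| value.nil? || value.empty? }

  query = "VOLUME(#{phrase(citation[:volume])}) AND ISSUE(#{phrase(citation[:issue])})" \
          " AND PAGEFIRST(#{phrase(citation[:spage])})"
  if citation[:issn]
    query + " AND (ISSN(#{phrase(citation[:issn])}) OR EISSN(#{phrase(citation[:issn])}))"
  else
    query + " AND EXACTSRCTITLE(#{phrase(citation[:jtitle])})"
  end
end

puts known_item_query(doi: '10.1000/xyz')
puts known_item_query(issn: '1234-5678', volume: '12', issue: '3', spage: '45')
```

A citation with none of the identifiers and an incomplete volume/issue/page set returns nil, which corresponds to the case where the service gives up without querying Scopus.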

#scopus_url(query) ⇒ Object


# File 'app/service_adaptors/scopus2.rb', line 250

def scopus_url(query)
  "#{@scopus_search_base}?apiKey=#{CGI.escape @api_key}&query=#{CGI.escape query}"
end
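
Because the query contains parentheses, quotes and slashes, the escaping step matters. The sketch below shows the same URL construction outside the class; search_url is a hypothetical helper, and the API key and DOI are placeholders.

```ruby
require 'cgi'

# Default search endpoint from Scopus2#initialize.
SEARCH_BASE = 'http://api.elsevier.com/content/search/index:SCOPUS'

# Hypothetical helper mirroring Scopus2#scopus_url: both values are URI-escaped.
def search_url(api_key, query)
  "#{SEARCH_BASE}?apiKey=#{CGI.escape(api_key)}&query=#{CGI.escape(query)}"
end

puts search_url('YOUR_API_KEY', 'DOI("10.1000/xyz")')
```

The quotes, parentheses and slash in the query come out percent-encoded, so the query survives as a single URL parameter.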

#service_types_generated ⇒ Object


# File 'app/service_adaptors/scopus2.rb', line 63

def service_types_generated
  types = []
  types.push( ServiceTypeValue[:cited_by] ) if @include_cited_by
  types.push( ServiceTypeValue[:abstract] ) if @include_abstract
  types.push( ServiceTypeValue[:similar] ) if @include_similar
  types.push( ServiceTypeValue[@more_by_authors_type] ) if @include_more_by_authors

  return types
end

#try_add_cited_by_response(result, request) ⇒ Object

Input is an XML node representing a single hit (an Atom entry from the Scopus search response, as passed in by #handle). A “cited by” response is added for it only if its cited-by count is at least 1.


# File 'app/service_adaptors/scopus2.rb', line 262

def try_add_cited_by_response(result, request)
  # While scopus provides an "inwardurl" in the results, this just takes
  # us to the record detail page. We actually want to go RIGHT to the
  # list of cited-by items. So we create our own, based on Scopus's
  # reverse-engineered predictable URLs. 

  count_str = result.at_xpath("atom:citedby-count/text()", xml_namespaces).to_s
  count_i   = count_str.to_i

  return if count_i < 1
  
  label = ServiceTypeValue[:cited_by].display_name_pluralize.downcase.capitalize
  if count_i == 1
    label = ServiceTypeValue[:cited_by].display_name.downcase.capitalize
  end
  cited_by_url = cited_by_url( result )
  
  request.add_service_response(:service=>self, 
    :display_text => "#{count_str} #{label}", 
    :count=> count_i, 
    :url => cited_by_url, 
    :service_type_value => :cited_by)
end

#xml_namespaces ⇒ Object


# File 'app/service_adaptors/scopus2.rb', line 105

def xml_namespaces
  @xml_namespaces ||= 
    { "atom"        => "http://www.w3.org/2005/Atom",
      "dc"          => "http://purl.org/dc/elements/1.1/",
      "opensearch"  => "http://a9.com/-/spec/opensearch/1.1/",
      "prism"       => "http://prismstandard.org/namespaces/basic/2.0/"
    }
end