Class: BentoSearch::SummonEngine

Inherits:
Object
Extended by:
HTTPClientPatch::IncludeClient
Includes:
ActionView::Helpers::OutputSafetyHelper, SearchEngine
Defined in:
app/search_engines/bento_search/summon_engine.rb

Overview

Functionality notes

  • Pagination: the underlying Summon API supports only ‘page’-style pagination, not ‘start’-style. If you pass in a ‘start’ offset, it will be ‘rounded’ to the page that contains it.
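
The rounding described above can be sketched as follows. This is only an illustration: containing_page is a hypothetical helper, not part of BentoSearch, and it assumes ‘start’ is a 0-based item offset and pages are 1-based.

```ruby
# Hypothetical helper (not a BentoSearch method): maps a 0-based
# :start offset onto the 1-based page that contains that item.
def containing_page(start, per_page)
  (start.to_i / per_page.to_i) + 1
end

containing_page(0, 10)   # => 1
containing_page(25, 10)  # => 3 (items 20..29 sit on page 3)
```

With this scheme a caller asking for start 25 gets results beginning at item 20, which is why ‘start’ values that are not a multiple of per_page do not line up exactly.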

Required config params

access_id

supplied by SerSol for your account

secret_key

supplied by SerSol for your account

Optional custom config params

fixed_params

Fixed SerSol query param literals to send with every search. The value is a hash whose keys map to either a single value or an array of values. For instance, to exclude certain content types from all search results, in config:

:fixed_params =>
  {"s.fvf" => ["ContentType,Web Resource,true","ContentType,Reference,true","ContentType,eBook,true"] }

Note that values are NOT URI-escaped in config; the code takes care of that for you. You could also fix “s.role” to ‘authenticated’ using this mechanism, if you restrict all access to your app to authenticated affiliated users.
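
As a rough sketch of what "the code takes care of that" means, the snippet below turns a fixed_params-style hash into an escaped query string, repeating the key once per array value. The to_query_string name is made up for this illustration; the engine's own version of this step lives in construct_request.

```ruby
require "cgi"

# Hypothetical helper for illustration: expands single values and
# arrays of values into repeated, URI-escaped key=value pairs.
def to_query_string(params)
  params.flat_map do |key, value|
    Array(value).map { |v| "#{CGI.escape(key.to_s)}=#{CGI.escape(v.to_s)}" }
  end.join("&")
end

fixed_params = {
  "s.fvf"  => ["ContentType,Web Resource,true", "ContentType,Reference,true"],
  "s.role" => "authenticated"
}

puts to_query_string(fixed_params)
```

Commas and spaces inside the values are escaped at this point, so the config can hold the literal Summon values exactly as the Summon documentation writes them.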

highlighting

Default true. Asks SerSol for query-in-context highlighting in the title and snippets fields. If true, you WILL get HTML with <b> tags in your titles, and snippets will be available in ResultItems. They will be used by standard display logic as the summary unless you set for_display.prefer_abstract_as_summary = true

use_summon_openurl

Default false. If true, will use the OpenURL kev context object passed back by Summon to generate OpenURLs, instead of creating one ourselves from individual data elements. The Summon OpenURL is decent, but currently includes highlighting tags in title elements. Also note it includes DC-type OpenURLs, which we don't currently generate ourselves.

lang

Sent to Summon as the “s.l” param; see api.summon.serialssolutions.com/help/api/search/parameters/language. Default nil. You may want to set it to “en”.

Custom search params

:auth

Pass in ‘:auth => true’ (or “true”) to send headers to summon indicating an authorized user, for full search results.

:summon_params

Hash of key/value pairs to pass directly to summon. Just like fixed_params in configuration, but per-search. Can be used to directly trigger functionality not covered by the bento_search adapter. Values can be arrays where summon keys are repeatable.

:peer_reviewed_only

Set to boolean true or string ‘true’, to restrict results to peer-reviewed only (as identified by Summon)

:online_only

Limit to only items marked ‘with fulltext online’ by Summon. Just a convenience shortcut for setting s.fvf including IsFullText,true

:pubyear_start
:pubyear_end

Date range limiting. Pass in one or both of pubyear_start and pubyear_end as custom search args. #to_i will be called on each, so they can be strings. For example: .search(:query => "foo", :pubyear_start => 2000)
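
The way these two arguments turn into a Summon range filter can be sketched in isolation. publication_date_range is a hypothetical standalone name; the logic mirrors the range-building step in construct_request, where a missing or zero endpoint becomes "*" (Summon's open-ended marker).

```ruby
# Hypothetical standalone version of the range-building step.
# nil.to_i and "".to_i are both 0, so a missing endpoint becomes "*".
def publication_date_range(pubyear_start, pubyear_end)
  from = pubyear_start.to_i
  from = "*" if from == 0

  to = pubyear_end.to_i
  to = "*" if to == 0

  "PublicationDate,#{from}:#{to}"
end

publication_date_range("2000", nil)  # => "PublicationDate,2000:*"
publication_date_range(1990, 1999)   # => "PublicationDate,1990:1999"
```

The resulting string is what gets sent as an “s.rf” value.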

Custom response data

Tech notes

We did not choose to use the summon ruby gem in general, we wanted more control than it offered (ability to use HTTPClient persistent connections, MultiJson for json parsing, etc).

However, we DO use that gem specifically for constructing authentication headers the way Summon wants them; see the class at github.com/summon/summon.rb/blob/master/lib/summon/transport/headers.rb

Language is provided only in language_str, not language_code; that is all the API gives us. We could try to reverse lookup from ISO code labels later if we want.

Constant Summary

HttpTimeout =

Can’t change http timeout in config, because we keep an http client at class-wide level, and config is not class-wide. Change this ‘constant’ if you want to change it, I guess.

Summon is pretty fast, we think a 4.5 second timeout should be plenty. May be adjusted from experience.

4.5
@@hl_start_token =

Originally we used $$BENTO_HL_START$$ etc with dollar signs, but the dollar signs trigger a weird bug in summon where end tokens are missing from output.

"__BENTO_HL_START__"
@@hl_end_token =
"__BENTO_HL_END__"

Constants included from SearchEngine

BentoSearch::SearchEngine::DefaultPerPage

Class Method Summary

Instance Method Summary

Methods included from HTTPClientPatch::IncludeClient

include_http_client

Methods included from SearchEngine

#fill_in_search_metadata_for, #initialize, #normalized_search_arguments, #public_settable_search_args, #search

Methods included from BentoSearch::SearchEngine::Capabilities

#search_keys, #semantic_search_keys, #semantic_search_map, #sort_keys

Class Method Details

.default_configuration ⇒ Object



# File 'app/search_engines/bento_search/summon_engine.rb', line 431

def self.default_configuration
  {
    :base_url => "http://api.summon.serialssolutions.com/2.0.0/search",
    :highlighting => true,
    :use_summon_openurl => false
  }
end

.required_configuration ⇒ Object



# File 'app/search_engines/bento_search/summon_engine.rb', line 427

def self.required_configuration
  [:access_id, :secret_key]
end

Instance Method Details

#construct_request(args) ⇒ Object

Returns a two-element array: [uri, headers]

uri, headers = construct_request(args)



# File 'app/search_engines/bento_search/summon_engine.rb', line 285

def construct_request(args)
  # Query params in a hash with array values, because that makes it
  # easiest to generate auth headers. Each value is an array of values
  # that are NOT URI-encoded yet.
  query_params = Hash.new {|h, k| h[k] = [] }

  # Add in fixed_params from config, and summon_params from search, if any.

  direct_params = (configuration.fixed_params || {}).merge( args[:summon_params] || {} )

  direct_params.each_pair do |key, value|
    [value].flatten.each do |v|
      query_params[key] << v
    end
  end

  if args[:per_page]
    query_params["s.ps"] = args[:per_page]
  end
  if args[:page]
    query_params["s.pn"] = args[:page]
  end

  if args[:search_field]
    query_params['s.q'] = "#{args[:search_field]}:(#{summon_escape(args[:query])})"
  else
    query_params['s.q'] = summon_escape( args[:query] )
  end

  if (args[:sort] &&
      (defn = self.sort_definitions[args[:sort]]) &&
      (literal = defn[:implementation]))
    query_params['s.sort'] =  literal
  end

  if args[:auth] == true
    query_params['s.role'] = "authenticated"
  end

  if [true, "true"].include? args[:peer_reviewed_only]
    query_params['s.fvf'] ||= []
    query_params['s.fvf'] << "IsPeerReviewed,true"
  end

  if [true, "true"].include? args[:online_only]
    query_params["s.fvf"] ||= []
    query_params['s.fvf'] << "IsFullText,true"
  end

  # Summon uses "*" for open ended range endpoint
  if args[:pubyear_start] || args[:pubyear_end]
    from = args[:pubyear_start].to_i
    from = "*" if from == 0

    to = args[:pubyear_end].to_i
    to = "*" if to == 0

    query_params["s.rf"] ||= []
    query_params["s.rf"] << "PublicationDate,#{from}:#{to}"
  end

  if configuration.highlighting
    query_params['s.hs'] = @@hl_start_token
    query_params['s.he'] = @@hl_end_token
  else
    query_params['s.hl'] = "false"
  end

  if configuration.lang
    query_params["s.l"] = configuration.lang
  end


  headers = Summon::Transport::Headers.new(
    :access_id => configuration.access_id,
    :secret_key => configuration.secret_key,
    :accept => "json",
    :params => query_params,
    :url => configuration.base_url
    )


  query_string = query_params.keys.collect do |key|
    [query_params[key]].flatten.collect do |value|
      "#{CGI.escape(key.to_s)}=#{CGI.escape(value.to_s)}"
    end
  end.flatten.join("&")

  uri = "#{configuration.base_url}?#{query_string}"

  return [uri, headers]
end

#first_if_present(array) ⇒ Object



# File 'app/search_engines/bento_search/summon_engine.rb', line 243

def first_if_present(array)
  array ? array.first : nil
end

#format_title(doc_hash) ⇒ Object

Combines title and subtitle into one title.



# File 'app/search_engines/bento_search/summon_engine.rb', line 416

def format_title(doc_hash)
  title          = first_if_present  doc_hash["Title"]
  subtitle       = first_if_present doc_hash["Subtitle"]

  if subtitle.present?
    title = "#{title}: #{subtitle}"
  end

  return handle_highlighting( title )
end

#get(id) ⇒ Object

Looks up a record by SerSol ID; engine.get( item.unique_id ) should return item.

Returns a single BentoSearch::ResultItem, or raises BentoSearch::NotFound, BentoSearch::TooManyFound, or other unspecified exception.



# File 'app/search_engines/bento_search/summon_engine.rb', line 231

def get(id)
  # "ID" is an internal search field for Summon, not listed in our
  # own search_field_definitions, but it works.
  results = search(id, :search_field => "ID")

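  # Check for a failed search first: a failed result set is also empty,
  # and would otherwise be misreported as NotFound.
  raise(results.error[:exception] || Exception.new(results.error.inspect)) if results.failed?
  raise BentoSearch::NotFound.new("ID: #{id}") if results.length == 0
  raise BentoSearch::TooManyFound.new("ID: #{id}") if results.length > 1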

  return results.first
end

#handle_highlighting(str, options = {}) ⇒ Object

If summon has put snippet highlighting tokens in a field, we need to HTML escape the literal values, while still using the highlighting tokens to put HTML tags around highlighted terms.



# File 'app/search_engines/bento_search/summon_engine.rb', line 405

def handle_highlighting( str, options = {} )
  BentoSearch::Util.handle_highlight_tags(
    str,
    :start_tag => @@hl_start_token,
    :end_tag => @@hl_end_token,
    :enabled => configuration.highlighting,
    :strip => options[:strip]
    )
end
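
The escape-then-tag idea described above can be sketched on its own. This is not the BentoSearch::Util.handle_highlight_tags implementation, whose internals are not shown on this page; highlight_to_html is a hypothetical name, and the use of <b> tags is taken from the ‘highlighting’ config note.

```ruby
require "cgi"

# Hypothetical sketch: HTML-escape the literal text first, then swap
# the (escape-safe) highlight tokens for tags, or drop them if strip.
def highlight_to_html(str,
                      start_token: "__BENTO_HL_START__",
                      end_token: "__BENTO_HL_END__",
                      strip: false)
  return nil if str.nil?

  escaped = CGI.escapeHTML(str)
  open_tag, close_tag = strip ? ["", ""] : ["<b>", "</b>"]

  escaped.gsub(start_token, open_tag).gsub(end_token, close_tag)
end

highlight_to_html("Cats & __BENTO_HL_START__dogs__BENTO_HL_END__")
# => "Cats &amp; <b>dogs</b>"
```

Escaping before substituting is what keeps any markup in the source data inert while still letting the highlight tags through; it works because the tokens contain only letters and underscores, which HTML escaping leaves untouched.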

#max_per_page ⇒ Object



# File 'app/search_engines/bento_search/summon_engine.rb', line 439

def max_per_page
  200
end

#name_normalize(str) ⇒ Object



# File 'app/search_engines/bento_search/summon_engine.rb', line 270

def name_normalize(str)

  return nil if str.blank?

  str = str.strip

  return nil if str.blank? || str =~ /^[,:.]*$/

  return str
end

#normalize_content_type(summon_type) ⇒ Object

Normalize Summon Content-Type to our standardized list.

This ends up losing useful distinctions Summon makes, however.



# File 'app/search_engines/bento_search/summon_engine.rb', line 252

def normalize_content_type(summon_type)
  case summon_type
  when "Journal Article", "Book Review", "Trade Publication Article", "Newspaper Article" then "Article"
  when "Audio Recording", "Music Recording", "Spoken Word Recording" then "AudioObject"
  when "Book", "eBook", "Book / eBook" then "Book"
  when "Book Chapter" then :book_item
  when "Conference Proceeding" then :conference_paper
  when "Dissertation" then :dissertation
  when "Journal", "Newsletter", "Newspaper" then :serial
  when "Photograph" then "Photograph"
  when "Report", "Technical Report"   then "Report"
  when "Video Recording", "Film" then "VideoObject"
  when "Web Resource" then "WebPage"
  when "Computer File", "Data Set" then "SoftwareApplication"
  else nil
  end
end

#search_field_definitions ⇒ Object

Summon offers many more search fields than this. This is a subset listed here. See api.summon.serialssolutions.com/help/api/search/fields although those docs may not be up to date.

The AuthorCombined, TitleCombined, and SubjectCombined indexes aren’t even listed in the docs, but they are real. I think.



# File 'app/search_engines/bento_search/summon_engine.rb', line 461

def search_field_definitions
    {
      nil                   => {:semantic => :general},
      "AuthorCombined"      => {:semantic => :author},
      "TitleCombined"       => {:semantic => :title},
      # SubjectTerms does not include TemporalSubjectTerms
      # or Keywords, sorry.
      "SubjectTerms"        => {:semantic => :subject},
      # ISBN and ISSN do not include separate EISSN and EISBN
      # fields, sorry.
      "ISBN"                => {:semantic => :isbn},
      "ISSN"                => {:semantic => :issn},
      "OCLC"                => {:semantic => :oclcnum},
      "PublicationSeriesTitle" => {:semantic => :publication_title }
    }
end
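
To show how a definitions hash like this one can be used, the sketch below inverts it into a semantic-key lookup and builds a fielded query in the same "Field:(query)" shape that construct_request sends as “s.q”. Both helper names are hypothetical, and the real lookup is presumably what the included semantic_search_map method provides; escaping of the query is left out here.

```ruby
# Hypothetical helper: invert {summon_field => {:semantic => key}}
# into {key => summon_field}.
def semantic_field_map(definitions)
  definitions.each_with_object({}) do |(field, defn), map|
    map[defn[:semantic]] = field
  end
end

# Hypothetical helper: build the s.q value for a semantic key. A nil
# field (the :general entry, or an unknown key) means an unfielded query.
def fielded_query(definitions, semantic, query)
  field = semantic_field_map(definitions)[semantic]
  field ? "#{field}:(#{query})" : query
end

definitions = {
  nil              => { :semantic => :general },
  "AuthorCombined" => { :semantic => :author },
  "ISSN"           => { :semantic => :issn }
}

fielded_query(definitions, :author, "austen")   # => "AuthorCombined:(austen)"
fielded_query(definitions, :general, "austen")  # => "austen"
```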

#search_implementation(args) ⇒ Object



224
# File 'app/search_engines/bento_search/summon_engine.rb', line 124

def search_implementation(args)
  uri, headers = construct_request(args)

  Rails.logger.debug("SummonEngine request URL: #{uri}")

  results = BentoSearch::Results.new

  hash, response, exception = nil
  begin
    response = http_client.get(uri, nil, headers)
    hash = MultiJson.load( response.body )
  rescue TimeoutError, HTTPClient::ConfigurationError, HTTPClient::BadResponseError, MultiJson::DecodeError, Nokogiri::SyntaxError => e
    exception = e
  end
  # handle some errors
  if (response.nil? || hash.nil? || exception ||
    (! HTTP::Status.successful? response.status))
    results.error ||= {}
    results.error[:exception] = exception
    results.error[:status] = response.status if response

    return results
  end

  results.total_items = hash["recordCount"]

  hash["documents"].each do |doc_hash|
    item = BentoSearch::ResultItem.new

    item.custom_data["summon.original_data"] = doc_hash

    item.unique_id      = first_if_present doc_hash["ID"]

    item.title          = format_title(doc_hash)

    item.link           = doc_hash["link"]
    # Don't understand difference between hasFullText and
    # isFullTextHit. ??. We'll use hasFullText for now, that's
    # the documented one.
    item.link_is_fulltext = doc_hash["hasFullText"]

    if configuration.use_summon_openurl
      item.openurl_kev_co = doc_hash["openUrl"] # Summon conveniently gives us pre-made OpenURL
    end

    item.journal_title  = first_if_present doc_hash["PublicationTitle"]
    item.issn           = first_if_present doc_hash["ISSN"]
    item.isbn           = first_if_present doc_hash["ISBN"]
    item.doi            = first_if_present doc_hash["DOI"]

    item.start_page     = first_if_present doc_hash["StartPage"]
    item.end_page       = first_if_present doc_hash["EndPage"]

    if (pubdate = first_if_present doc_hash["PublicationDate_xml"])
      item.year         = pubdate["year"]
    end
    item.volume         = first_if_present doc_hash["Volume"]
    item.issue          = first_if_present doc_hash["Issue"]

    if (pub = first_if_present doc_hash["Publisher_xml"])
      item.publisher    = pub["name"]
    end

    # if it's a dissertation, put the school in the 'publisher' field.
    # if we don't have one otherwise.
    if (! item.publisher) && (school = first_if_present doc_hash["DissertationSchool_xml"])
      item.publisher    = school["name"]
    end

    (doc_hash["Author_xml"] || []).each do |auth_hash|
      a = BentoSearch::Author.new

      a.first           = name_normalize auth_hash["givenname"]
      a.last            = name_normalize auth_hash["surname"]
      a.middle          = name_normalize auth_hash["middlename"]

      a.display         = name_normalize auth_hash["fullname"]

      item.authors << a unless a.empty?
    end

    item.format         = normalize_content_type( first_if_present doc_hash["ContentType"] )
    if doc_hash["ContentType"]
      item.format_str     = doc_hash["ContentType"].join(", ")
    end

    item.language_str   = first_if_present doc_hash["Language"]

    item.abstract       = first_if_present doc_hash["Abstract"]

    # Just straight snippets
    if doc_hash["Snippet"]
      item.snippets = doc_hash["Snippet"].collect {|s| handle_highlighting(s)}
    end

    results << item
  end


  return results
end

#sort_definitions ⇒ Object

Summon actually only supports relevancy sort, and pub year asc or desc. We just expose relevance and pub year desc here.



# File 'app/search_engines/bento_search/summon_engine.rb', line 445

def sort_definitions
  # implementation includes literal sersol value, but not yet
  # uri escaped, that'll happen at a later code point.
  {
    "relevance" => {:implementation => nil}, # default
    "date_desc" => {:implementation => "PublicationDate:desc"}

  }
end

#summon_escape(string) ⇒ Object

Escapes special chars for Summon. Not entirely clear what we have to escape where (or double escape sometimes?), but we’re just going to do a straight backslash escape of special chars.

Does NOT do URI-escaping, that’s a different step.



# File 'app/search_engines/bento_search/summon_engine.rb', line 385

def summon_escape(string)
  # Replace each special char with a backslash followed by the
  # original matched char. Having to double the backslash in a Ruby
  # string literal makes this ridiculously confusing, sorry. The block
  # form of gsub is the only thing that keeps it from being impossible.
  #
  # Do NOT escape double quotes, let people use them for
  # phrases!
  #
  # While docs suggest you have to double-slash escape hyphens,
  # in fact doing so ruins ID: search, so we don't.
  string.gsub(/([+&|!\(\){}\[\]^~*?\\:])/) do |match|
    "\\#{$1}"
  end
end
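
A few sample inputs make the escaping rules easier to see. The method body below is copied from the listing above so the snippet runs on its own; the inputs are made up for illustration.

```ruby
# Same regex and replacement as the listing above, repeated here so
# the example is self-contained.
def summon_escape(string)
  string.gsub(/([+&|!\(\){}\[\]^~*?\\:])/) do |match|
    "\\#{$1}"
  end
end

puts summon_escape('title: war & peace')  # prints: title\: war \& peace
puts summon_escape('"exact phrase"')      # prints: "exact phrase"
puts summon_escape('well-known')          # prints: well-known
```

Colons and ampersands are backslash-escaped, while double quotes and hyphens pass through untouched, matching the comments in the method.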