Class: GoogleBookSearch

Inherits:
Service
Includes:
MetadataHelper, UmlautHttp
Defined in:
app/service_adaptors/google_book_search.rb

Overview

Service that searches Google Book Search to determine viewability. It searches by ISBN, OCLCNUM and/or LCCN (LCCN lookup is off by default, see lookup_by_lccn in the constructor).

Uses the Google Books API; see code.google.com/apis/books/docs/v1/getting_started.html and code.google.com/apis/books/docs/v1/using.html

If a full view is available, it returns a fulltext service response. If only a partial view is available, it is returned as “limited excerpts”. If there is no view at all, a link is still included in highlighted_links, to pay lip service to Google branding requirements.

Unfortunately there is no way to tell which of the noview books provide search, although some do. Search is advertised only if a full or partial view is available.

If a thumbnail_url is returned in the responses, a cover image is displayed.

Can also enhance with an abstract, if available. This is off by default; set `abstract: true` to turn it on.

Also fleshes out bibliographic details from an identifier: if all you had was an ISBN, it will fill in title, author, etc. in the referent from the GBS response.

Google API Key

Setting an api key in :api_key is STRONGLY recommended, or you'll probably get rate limited (it is not clear what the limit is with no api key supplied). You may have to ask for a higher rate limit for your api key than the default 1000/day, which you can do through the google api console: code.google.com/apis/console

I requested 50k with this message, and was quickly approved with no questions: “Services for academic library (Johns Hopkins Libraries) web applications to match Google Books availability to items presented by our catalog, OpenURL link resolver, and other software.”

Recommend setting your 'per user limit' to something crazy high, as well as requesting more quota.

Constant Summary

# Identifiers used in API response to indicate viewability level
ViewFullValue = 'ALL_PAGES'
ViewPartialValue = 'PARTIAL'
# None might also be 'snippet', but Google doesn't want to distinguish
ViewNoneValue = 'NO_PAGES'
ViewUnknownValue = 'UNKNOWN'

Constants inherited from Service

Service::LinkOutFilterTask, Service::StandardTask

Instance Attribute Summary

Attributes inherited from Service

#group, #name, #priority, #request, #service_id, #status, #task

Instance Method Summary

Methods included from UmlautHttp

#http_fetch, #proxy_like_headers

Methods included from MetadataHelper

#get_doi, #get_epage, #get_gpo_item_nums, #get_identifier, #get_isbn, #get_issn, #get_lccn, #get_month, #get_oclcnum, #get_pmid, #get_search_creator, #get_search_terms, #get_search_title, #get_spage, #get_sudoc, #get_top_level_creator, #get_year, #normalize_lccn, #normalize_title, #raw_search_title, #title_is_serial?

Methods included from MarcHelper

#add_856_links, #edition_statement, #get_title, #get_years, #gmd_values, #service_type_for_856, #should_skip_856_link?, #strip_gmd

Methods inherited from Service

#credits, #handle_wrapper, #link_out_filter, #preempted_by, required_config_params, #translate

Constructor Details

- (GoogleBookSearch) initialize(config)



# File 'app/service_adaptors/google_book_search.rb', line 73

def initialize(config)    
  @url = 'https://www.googleapis.com/books/v1/volumes?q='
  
  @display_name = 'Google Books'
  
  # number of full views to show
  @num_full_views = 1
  
  # default on, to enhance our metadata with stuff from google
  @referent_enhance = true

  # default OFF, add description/abstract from GBS
  @abstract = false

  # Other responses on by default but can be turned off
  @cover_image   = true
  @fulltext      = true
  @search_inside = true
  @web_links     = true # to partial view :excerpts or :fulltext

  # google api key strongly recommended, otherwise you'll
  # probably get rate limited. 
  @api_key = nil
  
  @credits = {
    "Google Books" => "http://books.google.com/"
  }
  # While you can theoretically look up by LCCN on Google Books,
  # we have found FREQUENT false positives. There's no longer any
  # way to even report these to Google. By default, don't lookup
  # by LCCN. 
  @lookup_by_lccn = false
  
  super(config)
end
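
The defaults above are set before super(config) is called, so values from the service config can replace them. A minimal sketch of that pattern follows. SketchService is a hypothetical stand-in for the real Service base class, and it is an assumption here that the base class copies each config key onto an instance variable of the same name.

```ruby
# Hypothetical stand-in for the Service base class. ASSUMPTION: the real
# base class copies each config key onto a same-named instance variable.
class SketchService
  def initialize(config)
    config.each_pair { |key, value| instance_variable_set("@#{key}", value) }
  end
end

# Hypothetical, cut-down version of the adaptor's constructor.
class SketchGoogleBookSearch < SketchService
  attr_reader :api_key, :abstract, :num_full_views, :lookup_by_lccn

  def initialize(config)
    # Defaults are assigned first, so config values win once super runs.
    @num_full_views = 1
    @abstract       = false
    @api_key        = nil
    @lookup_by_lccn = false
    super(config)
  end
end

service = SketchGoogleBookSearch.new("api_key" => "YOUR_KEY", "abstract" => true)
puts service.abstract        # true, overridden by config
puts service.num_full_views  # 1, default kept
```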

Instance Attribute Details

- (Object) display_name (readonly)

attr_reader is important for tests



# File 'app/service_adaptors/google_book_search.rb', line 55

def display_name
  @display_name
end

- (Object) num_full_views (readonly)

attr_reader is important for tests



# File 'app/service_adaptors/google_book_search.rb', line 55

def num_full_views
  @num_full_views
end

- (Object) url (readonly)

attr_reader is important for tests



# File 'app/service_adaptors/google_book_search.rb', line 55

def url
  @url
end

Instance Method Details

- (Object) add_abstract(request, data)



# File 'app/service_adaptors/google_book_search.rb', line 201

def add_abstract(request, data)
  info = data["items"].first.try {|h| h["volumeInfo"]}
  if description = info["description"]

    url = info["infoLink"]
    request.add_service_response(
        :service => self, 
        :display_text => "Description from Google Books", 
        :display_text_i18n => "description",
        :url => remove_query_context(url),
        :service_type_value =>  :abstract  
    )
  end
end

- (Object) add_cover_image(request, url)



# File 'app/service_adaptors/google_book_search.rb', line 442

def add_cover_image(request, url)
  zoom_url = url.clone
  
  # if we're sent to a page other than the frontcover then strip out the
  # page number and insert front cover
  zoom_url.sub!(/&pg=.*?&/, '&printsec=frontcover&')
  
  # hack out the 'curl' if we can
  zoom_url.sub!('&edge=curl', '')
  
  request.add_service_response(
      :service=>self, 
      :display_text => 'Cover Image',
      :url => zoom_url, 
      :size => "medium",
      :service_type_value => :cover_image
  )     
end
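
To see what the two substitutions do, here is a small standalone sketch. The helper name front_cover_url and the sample URL (id and parameter values) are made up for illustration; only the two sub! calls come from the method above.

```ruby
# Hypothetical helper that applies the same two substitutions as
# add_cover_image, without touching the request.
def front_cover_url(thumbnail_url)
  zoom_url = thumbnail_url.dup
  # Swap any page parameter for the front cover.
  zoom_url.sub!(/&pg=.*?&/, '&printsec=frontcover&')
  # Drop the page-curl decoration if present.
  zoom_url.sub!('&edge=curl', '')
  zoom_url
end

# Made-up sample thumbnail URL.
sample = "http://books.google.com/books?id=EXAMPLE&pg=PA5&img=1&zoom=1&edge=curl"
puts front_cover_url(sample)
# prints http://books.google.com/books?id=EXAMPLE&printsec=frontcover&img=1&zoom=1
```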

- (Object) add_search_inside(request, data)



# File 'app/service_adaptors/google_book_search.rb', line 365

def add_search_inside(request, data)
  # Just take the first one we find, if multiple
  searchable_view = find_entries(data, [ViewFullValue, ViewPartialValue])[0]        
  
  if ( searchable_view )
    url = searchable_view["volumeInfo"]["infoLink"]
    
    request.add_service_response( 
      :service => self,
      :display_text=>@display_name,
      :display_text_i18n => "display_name",
      :url=> remove_query_context(url),
      :service_type_value => :search_inside
     )                  
  end
  
end

- (Object) build_headers(request)

We don't need to fake a proxy request anymore, but we still include X-Forwarded-For so google can return location-appropriate availability. If there's an existing X-Forwarded-For, we respect it and add on to it.



# File 'app/service_adaptors/google_book_search.rb', line 291

def build_headers(request)
  original_forwarded_for = nil
  if (request.http_env && request.http_env['HTTP_X_FORWARDED_FOR'])
    original_forwarded_for = request.http_env['HTTP_X_FORWARDED_FOR']                                  
  end

  # we used to prepare a comma separated list in x-forwarded-for if
  # we had multiple requests, as per the x-forwarded-for spec, but I
  # think Google doesn't like it. 
  
  ip_address = (original_forwarded_for ?
      original_forwarded_for  :
      request.client_ip_addr.to_s)
  
  return {} if ip_address.blank?

  # If we've got a comma-separated list from an X-Forwarded-For, we
  # can't send it on to google, google won't accept that, just take
  # the first one in the list, which is actually the ultimate client
  # IP. split returns the whole string if separator isn't found, convenient.
  ip_address = ip_address.split(",").first
  
  # If all we have is an internal/private IP from the internal network,
  # do NOT send that to Google, or Google will give you a 503 error
  # and refuse to process your request, as of 7 sep 2011. sigh.
  # Also if it doesn't look like an IP at all, forget it, don't send it.     
  if ((ip_address !~ /\A\d+\.\d+\.\d+\.\d+\z/) || 
     ip_address.start_with?("10.") || 
     ip_address.start_with?("172.16") || 
     ip_address.start_with?("192.168"))
     return {}
  else    
    return {'X-Forwarded-For' => ip_address }
  end
end
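
An alternative way to do the private-address check is with Ruby's standard IPAddr class rather than string prefixes. This is only a sketch, not what the adaptor does: forwarded_for_headers is a hypothetical helper, and it treats the whole 172.16.0.0/12 block as private, which is broader than the "172.16" prefix test above.

```ruby
require 'ipaddr'

# Hypothetical helper: takes a raw X-Forwarded-For value (or client address)
# and returns the headers hash to send, or {} when nothing usable is there.
def forwarded_for_headers(raw)
  # RFC 1918 private ranges; 172.16.0.0/12 covers 172.16.x.x to 172.31.x.x.
  private_ranges = %w[10.0.0.0/8 172.16.0.0/12 192.168.0.0/16].map { |cidr| IPAddr.new(cidr) }

  # Only the first address in a comma-separated list is considered.
  first = raw.to_s.split(",").first.to_s.strip
  return {} if first.empty?

  begin
    ip = IPAddr.new(first)
  rescue ArgumentError
    # Not parseable as an IP address at all, so send nothing.
    return {}
  end

  return {} unless ip.ipv4?
  return {} if private_ranges.any? { |range| range.include?(ip) }

  { 'X-Forwarded-For' => first }
end

p forwarded_for_headers("8.8.8.8, 10.0.0.1")
p forwarded_for_headers("192.168.1.20")
```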

- (Object) create_fulltext_service_response(request, data)

We only create a fulltext service response if we have a full view. We create only as many full views as are specified in config.



# File 'app/service_adaptors/google_book_search.rb', line 343

def create_fulltext_service_response(request, data)
  full_views = find_entries(data, ViewFullValue)
  return nil if full_views.empty?
  
  count = 0
  full_views.each do |fv|
    
    uri = fv["volumeInfo"]["previewLink"]
        
    request.add_service_response(
        :service => self, 
        :display_text => @display_name, 
        :display_text_i18n => "display_name",
        :url => remove_query_context(uri),           
        :service_type_value =>  :fulltext  
    )
    count += 1
    break if count == @num_full_views
  end   
  return true
end

- (Object) do_query(bibkeys, request)



# File 'app/service_adaptors/google_book_search.rb', line 253

def do_query(bibkeys, request)    
  headers = build_headers(request)
  link = @url + bibkeys
  if @api_key
    link += "&key=#{@api_key}"
  end
  
  # Add on limit to only request books, not magazines. 
  link += "&printType=books"

  Rails.logger.debug("GoogleBookSearch requesting: #{link}")        
  response = http_fetch(link, :headers => headers, :raise_on_http_error_code => false)        
  data = MultiJson.load(response.body)
  
  # If Google gives us an error cause it says it can't geo-locate, 
  # remove the IP, log warning, and try again. 
  
  if (data["error"] && data["error"]["errors"] &&
      data["error"]["errors"].find {|h| h["reason"] == "unknownLocation"} )
    Rails.logger.warn("GoogleBookSearch: geo-locate error, retrying without X-Forwarded-For: '#{link}' headers: #{headers.inspect} #{response.inspect}\n    #{data.inspect}")
    
    response = http_fetch(link, :raise_on_http_error_code => false)        
    data = MultiJson.load(response.body)
      
  end
  
  
  if (! response.kind_of?(Net::HTTPSuccess)) || data["error"]      
    Rails.logger.error("GoogleBookSearch error: '#{link}' headers: #{headers.inspect} #{response.inspect}\n    #{data.inspect}")
  end
      
  return data
end
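
The geo-locate retry hinges on spotting an "unknownLocation" reason in the error payload. A minimal sketch of that check in isolation: unknown_location_error? is a hypothetical helper, and the sample JSON body is made up to match only the keys that do_query inspects.

```ruby
require 'json'

# Hypothetical helper: true when the parsed response carries an
# "unknownLocation" error reason, the case do_query retries without
# the X-Forwarded-For header.
def unknown_location_error?(data)
  errors = data.dig("error", "errors") || []
  errors.any? { |h| h["reason"] == "unknownLocation" }
end

# Made-up error body, shaped after the keys the adaptor looks at.
body = '{"error": {"errors": [{"reason": "unknownLocation"}]}}'
puts unknown_location_error?(JSON.parse(body))      # true
puts unknown_location_error?({ "totalItems" => 0 }) # false
```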

- (Object) do_web_links(request, data)

Creates a highlighted_link service response for partial and noview items. Only one web link is shown, preferring a partial view over a noview. Some noviews have a snippet/search, but we have no way to tell.



# File 'app/service_adaptors/google_book_search.rb', line 386

def do_web_links(request, data)

  # some noview items will have a snippet view, but we have no way to tell
  info_views = find_entries(data, ViewPartialValue)
  viewability = ViewPartialValue
  
  if info_views.blank?
    info_views = find_entries(data, ViewNoneValue)
    viewability = ViewNoneValue  
  end
  
  # Shouldn't ever get to this point, but just in case
  return nil if info_views.blank?
  
  url = ''
  iv = info_views.first
  type = nil
  if (viewability == ViewPartialValue && 
      url = iv["volumeInfo"]["previewLink"])
    display_text = @display_name
    display_text_i18n = "display_name"
    type = ServiceTypeValue[:excerpts]
  else
    url = iv["volumeInfo"]["infoLink"]
    display_text = "Book Information"
    display_text_i18n = "book_information"
    type = ServiceTypeValue[:highlighted_link]
  end


  request.add_service_response( 
      :service=>self,    
      :url=> remove_query_context(url),
      :display_text=>display_text,
      :display_text_i18n => display_text_i18n,
      :service_type_value => type    
   )
end

- (Object) element_enhance(request, rft_key, value)

Will not over-write existing referent values.



# File 'app/service_adaptors/google_book_search.rb', line 217

def element_enhance(request, rft_key, value)
  if (value)
    request.referent.enhance_referent(rft_key, value.to_s, true, false, :overwrite => false)
  end
end

- (Object) enhance_referent(request, data)

Take the FIRST hit from google, and use its values to enhance our metadata. Will NOT overwrite existing data.



# File 'app/service_adaptors/google_book_search.rb', line 156

def enhance_referent(request, data)
  
  entry = data["items"].first
  

  if (volumeInfo = entry["volumeInfo"])
    
    title = volumeInfo["title"]
    title += ": #{volumeInfo["subtitle"]}" if (title && volumeInfo["subtitle"])
    
    element_enhance(request, "title", title)
    element_enhance(request, "au", volumeInfo["authors"].first) if volumeInfo["authors"]
    element_enhance(request, "pub", volumeInfo["publisher"])
    
    element_enhance(request, "tpages", volumeInfo["pageCount"])
    
    if (date = volumeInfo["publishedDate"]) && date =~ /^(\d\d\d\d)/
      element_enhance(request, "date", $1)
    end
    
    # LCCN is only rarely included, but is sometimes, eg:
    # "industryIdentifiers"=>[{"type"=>"OTHER", "identifier"=>"LCCN:72627172"}],          
    # Also "LCCN:76630875"
    #
    # And sometimes OCLC number like:
    # "industryIdentifiers"=>[{"type"=>"OTHER", "identifier"=>"OCLC:12345678"}],
    #        
    (volumeInfo["industryIdentifiers"] || []).each do |hash|
      
      if hash["type"] == "ISBN_13"
        element_enhance(request, "isbn", hash["identifier"])
        
      elsif hash["type"] == "OTHER" && hash["identifier"].starts_with?("LCCN:")
        lccn = normalize_lccn(  hash["identifier"].slice(5, hash["identifier"].length)  )
        request.referent.add_identifier("info:lccn/#{lccn}")
        
      elsif hash["type"] == "OTHER" && hash["identifier"].starts_with?("OCLC:")
        oclcnum = normalize_lccn(  hash["identifier"].slice(5, hash["identifier"].length)  )
        request.referent.add_identifier("info:oclcnum/#{oclcnum}")
      end
    
    end              
  end            
end
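
The identifier handling above can be tried on its own with a plain hash. This is a rough sketch: extract_identifiers is a hypothetical helper, the sample ISBN values are made up, and LCCN normalization (normalize_lccn) is deliberately left out.

```ruby
# Hypothetical helper: pulls ISBN-13, LCCN and OCLC number out of a
# volumeInfo hash, keeping the first of each kind. No normalization is done.
def extract_identifiers(volume_info)
  result = {}
  (volume_info["industryIdentifiers"] || []).each do |hash|
    type  = hash["type"]
    value = hash["identifier"].to_s
    if type == "ISBN_13"
      result[:isbn] ||= value
    elsif type == "OTHER" && value.start_with?("LCCN:")
      result[:lccn] ||= value[5..-1]     # strip the "LCCN:" prefix
    elsif type == "OTHER" && value.start_with?("OCLC:")
      result[:oclcnum] ||= value[5..-1]  # strip the "OCLC:" prefix
    end
  end
  result
end

# Sample data; the ISBN values are placeholders.
volume_info = {
  "industryIdentifiers" => [
    { "type" => "ISBN_10", "identifier" => "0000000000" },
    { "type" => "ISBN_13", "identifier" => "9780000000002" },
    { "type" => "OTHER",   "identifier" => "LCCN:72627172" },
    { "type" => "OTHER",   "identifier" => "OCLC:12345678" }
  ]
}
p extract_identifiers(volume_info)
```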

- (Object) find_entries(gbs_response, viewabilities)



# File 'app/service_adaptors/google_book_search.rb', line 327

def find_entries(gbs_response, viewabilities)
  unless (viewabilities.kind_of?(Array))
    viewabilities = [viewabilities]
  end

  entries = gbs_response["items"].find_all do |entry|
    viewability = entry["accessInfo"]["viewability"]
    (viewability && viewabilities.include?(viewability))           
  end

  return entries
end
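
For a feel of how the filtering behaves, here is a standalone sketch run against a made-up response. entries_with_viewability is a hypothetical variant of find_entries that also tolerates a missing "items" or "accessInfo" key, which the method above does not.

```ruby
# Hypothetical, nil-tolerant variant of find_entries.
def entries_with_viewability(gbs_response, viewabilities)
  wanted = Array(viewabilities)  # accept a single value or a list
  (gbs_response["items"] || []).select do |entry|
    wanted.include?(entry.dig("accessInfo", "viewability"))
  end
end

# Made-up response with one entry per viewability level.
sample = {
  "items" => [
    { "id" => "a", "accessInfo" => { "viewability" => "ALL_PAGES" } },
    { "id" => "b", "accessInfo" => { "viewability" => "PARTIAL" } },
    { "id" => "c", "accessInfo" => { "viewability" => "NO_PAGES" } }
  ]
}

p entries_with_viewability(sample, "ALL_PAGES").map { |e| e["id"] }
p entries_with_viewability(sample, ["ALL_PAGES", "PARTIAL"]).map { |e| e["id"] }
```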

- (Object) find_thumbnail_url(data)

Not all responses have a thumbnail_url. We look for them and return the 1st.



# File 'app/service_adaptors/google_book_search.rb', line 429

def find_thumbnail_url(data)
  entries = data["items"].collect do |entry|      
    entry["volumeInfo"]["imageLinks"]["thumbnail"] if entry["volumeInfo"] && entry["volumeInfo"]["imageLinks"]      
  end
  
  # remove nil values
  entries.compact!    
  
  # pick the first of the available thumbnails, or nil
  return entries[0]
end
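
The same lookup can be written more compactly with Hash#dig (Ruby 2.3 or later). This is just an equivalent sketch, not the adaptor's code; first_thumbnail is a hypothetical name and the sample data is made up.

```ruby
# Hypothetical compact variant: first thumbnail URL in the response, or nil.
def first_thumbnail(data)
  (data["items"] || [])
    .map { |entry| entry.dig("volumeInfo", "imageLinks", "thumbnail") }
    .compact
    .first
end

# Made-up response: the first entry has no imageLinks at all.
sample = {
  "items" => [
    { "volumeInfo" => { "title" => "No cover" } },
    { "volumeInfo" => { "imageLinks" => { "thumbnail" => "http://books.google.com/books?id=EXAMPLE&img=1" } } }
  ]
}
puts first_thumbnail(sample)
# prints http://books.google.com/books?id=EXAMPLE&img=1
```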

- (Object) get_bibkeys(rft)

Returns nil or an escaped string of bibkeys. To increase the chances of a good hit, we send all available bibkeys and later dedupe by id. FIXME: assumes we only have one of each kind of identifier.



# File 'app/service_adaptors/google_book_search.rb', line 228

def get_bibkeys(rft)
  isbn = get_identifier(:urn, "isbn", rft)
  oclcnum = get_identifier(:info, "oclcnum", rft)
  lccn = get_lccn(rft)

  # Google doesn't officially support oclc/lccn search, but does
  # index as token with prefix smashed up right with identifier
  # eg http://books.google.com/books/feeds/volumes?q=OCLC32012617
  #
  # Except turns out doing it as a phrase search is important! Or
  # google's normalization/tokenization does odd things. 
  keys = []
  keys << ('isbn:' + isbn) if isbn
  keys << ('"' + "OCLC" + oclcnum + '"') if oclcnum
  # Only use LCCN if we've got nothing else, and we're allowing it. 
  # it returns many false positives. 
  if @lookup_by_lccn && lccn && keys.length == 0
    keys << ('"' + 'LCCN' + lccn + '"')
  end
  
  return nil if keys.empty?
  keys = CGI.escape( keys.join(' OR ') )
  return keys
end
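
To see what the resulting query string looks like, the key-building logic can be pulled out into a standalone sketch. build_bibkeys is a hypothetical helper that takes the identifiers as plain strings instead of a referent; the ISBN is a placeholder, and the OCLC number is the one from the comment above.

```ruby
require 'cgi'

# Hypothetical helper mirroring get_bibkeys, with identifiers passed in directly.
def build_bibkeys(isbn: nil, oclcnum: nil, lccn: nil, lookup_by_lccn: false)
  keys = []
  keys << "isbn:#{isbn}" if isbn
  # OCLC and LCCN go in as quoted phrases with the prefix attached.
  keys << %("OCLC#{oclcnum}") if oclcnum
  # LCCN is a last resort, and only when explicitly allowed.
  keys << %("LCCN#{lccn}") if lookup_by_lccn && lccn && keys.empty?

  return nil if keys.empty?
  CGI.escape(keys.join(' OR '))
end

puts build_bibkeys(isbn: "9780000000002", oclcnum: "32012617")
# prints isbn%3A9780000000002+OR+%22OCLC32012617%22
```

With only an LCCN and lookup_by_lccn left at false, the helper returns nil, which corresponds to the case where handle returns early without querying Google.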

- (Object) handle(request)



# File 'app/service_adaptors/google_book_search.rb', line 109

def handle(request)

  bibkeys = get_bibkeys(request.referent)
  return request.dispatched(self, true) if bibkeys.nil?

  data = do_query(bibkeys, request)
  
  
  if data.blank? || data["error"]
    # fail fatal
    return request.dispatched(self, false)
  end
  
  # 0 hits, return. 
  return request.dispatched(self, true) if data["totalItems"] == 0
  
  enhance_referent(request, data) if @referent_enhance

  add_abstract(request, data) if @abstract
  
  #return full views first
  if @fulltext
    full_views_shown = create_fulltext_service_response(request, data)
  end
  
  if @search_inside
    # Add search_inside link if appropriate
    add_search_inside(request, data)
  end
  
  # only if no full view is shown, add links for partial view or noview
  unless full_views_shown
    do_web_links(request, data)
  end
  
  if @cover_image
    thumbnail_url = find_thumbnail_url(data)
    if thumbnail_url
      add_cover_image(request, thumbnail_url)    
    end
  end

  return request.dispatched(self, true)
end

- (Object) remove_query_context(url)

Google gives us a URL to the book that contains a 'dq' param with the original query, which for us is an ISBN/LCCN/OCLCnum query, which we don't actually want to leave in there.



# File 'app/service_adaptors/google_book_search.rb', line 464

def remove_query_context(url)
  url.sub(/&dq=[^&]+/, '')    
end
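
A quick standalone illustration of the effect, using a copy of the method and a made-up URL:

```ruby
# Standalone copy of the method above, for illustration.
def remove_query_context(url)
  url.sub(/&dq=[^&]+/, '')
end

# Made-up book URL carrying the original identifier query in 'dq'.
puts remove_query_context("http://books.google.com/books?id=EXAMPLE&dq=isbn:9780000000002&hl=en")
# prints http://books.google.com/books?id=EXAMPLE&hl=en
```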

- (Object) response_url(service_response, submitted_params)

Catch url_for call for search_inside, because we're going to redirect



# File 'app/service_adaptors/google_book_search.rb', line 469

def response_url(service_response, submitted_params)
  if ( ! (service_response.service_type_value.name == "search_inside" ))
    return super(service_response, submitted_params)
  else
    # search inside!
    base = service_response[:url]
    query = CGI.escape(submitted_params["query"] || "")
    # attempting to reverse engineer a bit to get 'snippet'
    # style results instead of 'onepage' style results. 
    # snippet seem more user friendly, and are what google's own
    # interface seems to give you by default. but 'onepage' is the
    # default from our deep link, but if we copy the JS hash data,
    # it looks like we can get Google to 'snippet'.       
    url = base + "&q=#{query}#v=snippet&q=#{query}&f=false"
    return url
  end
end
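
The URL assembled for a search_inside request can be sketched in isolation like this. snippet_search_url is a hypothetical helper, and the base URL and query are made up; the string layout is the one used in the method above.

```ruby
require 'cgi'

# Hypothetical helper: builds the 'snippet' style search-inside URL
# from a book URL and the user's query.
def snippet_search_url(base, query)
  q = CGI.escape(query.to_s)
  # The query appears twice: once as a normal param, once in the fragment.
  "#{base}&q=#{q}#v=snippet&q=#{q}&f=false"
end

puts snippet_search_url("http://books.google.com/books?id=EXAMPLE", "white whale")
# prints http://books.google.com/books?id=EXAMPLE&q=white+whale#v=snippet&q=white+whale&f=false
```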

- (Object) service_types_generated



# File 'app/service_adaptors/google_book_search.rb', line 57

def service_types_generated
  types= []

  if @web_links
    types.push ServiceTypeValue[:highlighted_link]
    types.push ServiceTypeValue[:excerpts]
  end
  types.push(ServiceTypeValue[:search_inside]) if @search_inside
  types.push(ServiceTypeValue[:fulltext]) if @fulltext
  types.push(ServiceTypeValue[:cover_image]) if @cover_image
  types.push(ServiceTypeValue[:referent_enhance]) if @referent_enhance
  types.push(ServiceTypeValue[:abstract]) if @abstract

  return types
end