Module: BentoSearch::SearchEngine
- Extended by: ActiveSupport::Concern
- Includes: Capabilities
- Included in: EbscoHostEngine, EdsEngine, GoogleBooksEngine, GoogleSiteSearchEngine, JournalTocsForJournal, MockEngine, PrimoEngine, ScopusEngine, SummonEngine, WorldcatSruDcEngine, XerxesEngine
- Defined in: app/models/bento_search/search_engine.rb
Overview
Module mix-in for bento_search search engines.
Using a SearchEngine
See the project README for many more examples.
You can initialize a search engine with configuration (some engines have required configuration):
engine = SomeSearchEngine.new(:config_key => 'foo')
Or, it can be convenient (and is required for some features) to store a search engine with configuration in a global registry:
BentoSearch.register_engine("some_searcher") do |config|
config.engine = "SomeSearchEngine"
config.config_key = "foo"
end
# instantiates a new engine with registered config:
engine = BentoSearch.get_engine("some_searcher")
You can then use the #search method, which returns an instance of
BentoSearch::Results:
results = engine.search("query")
See more docs under #search, as well as the project README.
Standard configuration variables
Some engines require their own engine-specific configuration for api keys and such, and offer their own engine-specific configuration for engine-specific features.
One additional semi-standard configuration variable: some engines take `:auth => true` to tell the engine to assume that all access is by authenticated local users, who should be given elevated access to results.
Additional standard configuration keys that are implemented by the bento_search framework:
[for_display.decorator]
String name of decorator class that will be applied by #bento_decorate
helper in standard view. See wiki for more info on decorators. Must be
string name, actual class object not supported (to make it easier
to serialize and transport configuration).
Implementing a SearchEngine
Implementing a new SearchEngine is relatively straightforward. You are generally only responsible for the parts specific to your search engine: receiving a query, making a call to the external search engine, and translating its result to a standard BentoSearch::Results full of BentoSearch::ResultItems.
Start out by simply including the search engine module:
class MyEngine
include BentoSearch::SearchEngine
end
Next, at a minimum, you need to implement a #search_implementation method, which takes a normalized hash of search instructions as input (see documentation at #normalized_search_arguments) and returns a BentoSearch::Results object.
The Results object should have #total_items set with total hitcount, and contain BentoSearch::ResultItem objects for each hit in the current page. See individual class documentation for more info.
That's about the extent of your responsibilities. If the search failed due to an error, you should return a Results object with its #error object set, so that it will be `failed?`. The framework will take care of this for you for certain uncaught exceptions you allow to rise out of #search_implementation (timeouts, HTTPClient timeouts, nokogiri and MultiJson parse errors).
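As a rough illustration of these responsibilities, here is a minimal sketch that runs without the bento_search gem. StubResults, StubItem, FAKE_INDEX and the :message error key are assumptions made up for this example; a real engine would include BentoSearch::SearchEngine and return BentoSearch::Results filled with BentoSearch::ResultItem objects.

```ruby
# Hypothetical stand-ins so this sketch runs without the bento_search gem.
# A real engine would use BentoSearch::Results and BentoSearch::ResultItem.
StubItem = Struct.new(:title)

class StubResults < Array
  attr_accessor :total_items, :error

  def failed?
    !error.nil?
  end
end

class MyEngine
  # Hypothetical canned data standing in for a remote search API.
  FAKE_INDEX = ["Ruby basics", "Ruby on Rails", "Python basics"].freeze

  # args is the normalized hash (:query, :start, :per_page, ...).
  def search_implementation(args)
    results = StubResults.new

    if args[:query].to_s.empty?
      # Report failure through the results object instead of raising.
      # The :message key is an assumption for this sketch.
      results.error = { message: "empty query" }
      return results
    end

    hits = FAKE_INDEX.select { |title| title.downcase.include?(args[:query].downcase) }
    results.total_items = hits.length           # total hit count, not page size
    page = hits[args[:start], args[:per_page]] || []
    page.each { |title| results << StubItem.new(title) }
    results
  end
end
```

Note that total_items counts every hit, while the results object itself only holds the items for the current page.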
A SearchEngine object can be re-used for multiple searches, possibly under concurrent multi-threading. Do not store search-specific state in the search object, but you can of course store configuration-specific state there.
We recommend using HTTPClient, if possible, for HTTP searches, especially with a class-level HTTPClient instance to re-use persistent HTTP connections across searches (this can be especially important if you need to contact an external search API via https/ssl).
If you have required configuration keys, you can register them with the class-level required_configuration_keys method.
You can also advertise a maximum per-page value by overriding max_per_page.
If you support fielded searching, you should override #search_field_definitions; if you support sorting, you should override #sort_definitions. See the BentoSearch::SearchEngine::Capabilities module for documentation.
Defined Under Namespace
Modules: Capabilities, ClassMethods
Constant Summary
- DefaultPerPage = 10
Instance Method Summary
-
#fill_in_search_metadata_for(results, normalized_arguments = {}) ⇒ Object
SOME of the elements of Results to be returned that SearchEngine implementation fills in automatically post-search.
-
#initialize(aConfiguration = Confstruct::Configuration.new) ⇒ Object
If a specific SearchEngine overrides initialize, it should call super, which handles most of the configuration loading.
-
#normalized_search_arguments(*orig_arguments) ⇒ Object
(also: #parse_search_arguments)
Take the arguments passed into #search, which can be flexibly given in several ways, and normalize to an expected single hash that will be passed to an engine’s #search_implementation.
-
#public_settable_search_args ⇒ Object
Used mainly/only by the AJAX results loading.
-
#search(*arguments) ⇒ Object
Method used to actually get results from a search engine.
Methods included from Capabilities
#max_per_page, #search_field_definitions, #search_keys, #semantic_search_keys, #semantic_search_map, #sort_definitions, #sort_keys
Instance Method Details
#fill_in_search_metadata_for(results, normalized_arguments = {}) ⇒ Object
SOME of the elements of Results to be returned that SearchEngine implementation fills in automatically post-search. Extracted into a method for DRY in error handling to try to fill these in even in errors. Also can be used as public method for de-serialized or mock results.
# File 'app/models/bento_search/search_engine.rb', line 252
def fill_in_search_metadata_for(results, normalized_arguments = {})
  results.search_args = normalized_arguments
  results.start = normalized_arguments[:start] || 0
  results.per_page = normalized_arguments[:per_page]

  results.engine_id = configuration.id
  results.display_configuration = configuration.for_display

  # We copy some configuration info over to each Item, as a convenience
  # to display logic that may have to decide what to do given only an item,
  # and may want to parameterize based on configuration.
  results.each do |item|
    item.engine_id = configuration.id
    item.decorator = configuration.lookup!("for_display.decorator")
    item.display_configuration = configuration.for_display
  end

  results
end
#initialize(aConfiguration = Confstruct::Configuration.new) ⇒ Object
If a specific SearchEngine overrides initialize, it should call super, which handles most of the configuration loading. The argument is a Confstruct::Configuration or a Hash.
# File 'app/models/bento_search/search_engine.rb', line 136
def initialize(aConfiguration = Confstruct::Configuration.new)
  # To work around weird confstruct bug, we need to change
  # a hash to a Confstruct ourselves.
  # https://github.com/mbklein/confstruct/issues/14
  unless aConfiguration.kind_of? Confstruct::Configuration
    aConfiguration = Confstruct::Configuration.new aConfiguration
  end

  # init, from copy of default, or new
  if self.class.default_configuration
    self.configuration = Confstruct::Configuration.new(self.class.default_configuration)
  else
    self.configuration = Confstruct::Configuration.new
  end
  # merge in current instance config
  self.configuration.configure(aConfiguration)

  # global defaults?
  self.configuration[:for_display] ||= {}

  # check for required keys -- have to be present, and not nil
  if self.class.required_configuration
    self.class.required_configuration.each do |required_key|
      if ["**NOT_FOUND**", nil].include? self.configuration.lookup!(required_key.to_s, "**NOT_FOUND**")
        raise ArgumentError.new("#{self.class.name} requires configuration key #{required_key}")
      end
    end
  end
end
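The required-key check at the end of the initializer can be illustrated with plain hashes. This is only a sketch under stated assumptions: missing_required_keys is a hypothetical helper, and Hash#dig stands in for Confstruct's lookup!, which the real code uses to resolve dotted keys.

```ruby
# Plain-Hash sketch of a required-key check. The real initializer uses
# Confstruct's lookup!; missing_required_keys is a hypothetical helper.
def missing_required_keys(config, required_keys)
  required_keys.select do |key|
    # "for_display.decorator" becomes [:for_display, :decorator] so that
    # nested keys can be checked as well as top-level ones.
    path = key.to_s.split(".").map(&:to_sym)
    config.dig(*path).nil?
  end
end

config = { api_key: "abc123", for_display: {} }
missing = missing_required_keys(config, [:api_key, "for_display.decorator"])
# missing now holds only "for_display.decorator"; an engine would raise
# ArgumentError at this point, as the initializer above does.
```

As in the initializer, a key that is present but nil counts as missing.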
#normalized_search_arguments(*orig_arguments) ⇒ Object Also known as: parse_search_arguments
Take the arguments passed into #search, which can be flexibly given in several ways, and normalize to an expected single hash that will be passed to an engine’s #search_implementation. The output of this method is a single hash, and is what a #search_implementation can expect to receive as an argument, with keys:
[:query]
The query.
[:per_page]
Always present, using the default per_page if none is given by the caller.
[:start, :page]
When the caller supplies either one, both are present on output, each derived from the other. Both are integers, even if strings were passed in.
[:search_field]
A search field from the engine's #search_field_definitions, as a string.
Even if the caller used :semantic_search_field, it is normalized to the actual local search_field key on output.
[:sort]
Sort key.
# File 'app/models/bento_search/search_engine.rb', line 289
def normalized_search_arguments(*orig_arguments)
  arguments = {}

  # Two-arg style to one hash, if present
  if (orig_arguments.length > 1 ||
      (orig_arguments.length == 1 && ! orig_arguments.first.kind_of?(Hash)))
    arguments[:query] = orig_arguments.delete_at(0)
  end

  arguments.merge!(orig_arguments.first) if orig_arguments.length > 0

  # allow strings for pagination (like from url query), change to
  # int please.
  [:page, :per_page, :start].each do |key|
    arguments.delete(key) if arguments[key].blank?
    arguments[key] = arguments[key].to_i if arguments[key]
  end
  arguments[:per_page] ||= DefaultPerPage

  # illegal arguments
  if (arguments[:start] && arguments[:page])
    raise ArgumentError.new("Can't supply both :page and :start")
  end
  if (arguments[:per_page] &&
      self.max_per_page &&
      arguments[:per_page] > self.max_per_page)
    raise ArgumentError.new("#{arguments[:per_page]} is more than maximum :per_page of #{self.max_per_page} for #{self.class}")
  end

  # Normalize :page to :start, and vice versa
  if arguments[:page]
    arguments[:start] = (arguments[:page] - 1) * arguments[:per_page]
  elsif arguments[:start]
    arguments[:page] = (arguments[:start] / arguments[:per_page]) + 1
  end

  # normalize :sort from possibly symbol to string
  # TODO: raise if unrecognized sort key?
  if arguments[:sort]
    arguments[:sort] = arguments[:sort].to_s
  end

  # translate semantic_search_field to search_field, or raise if
  # can't.
  if (semantic = arguments.delete(:semantic_search_field)) && ! semantic.blank?
    mapped = self.semantic_search_map[semantic.to_s]
    if config_arg(arguments, :unrecognized_search_field) == "raise" && ! mapped
      raise ArgumentError.new("#{self.class.name} does not know about :semantic_search_field #{semantic}")
    end
    arguments[:search_field] = mapped
  end
  if config_arg(arguments, :unrecognized_search_field) == "raise" && ! search_keys.include?(arguments[:search_field])
    raise ArgumentError.new("#{self.class.name} does not know about :search_field #{arguments[:search_field]}")
  end

  return arguments
end
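The :page and :start handling in the listing above can be restated as a small standalone function, which may make the arithmetic easier to follow. This is a sketch only: normalize_paging and DEFAULT_PER_PAGE are hypothetical names, and the blank check is written in plain Ruby rather than with ActiveSupport's blank?.

```ruby
# Standalone re-statement of the :page / :start arithmetic from
# #normalized_search_arguments. normalize_paging is a hypothetical name.
DEFAULT_PER_PAGE = 10

def normalize_paging(args)
  out = args.dup

  # Drop blank values and convert the rest to integers, since values
  # coming from a URL query string arrive as strings.
  [:page, :per_page, :start].each do |key|
    value = out[key]
    if value.nil? || value.to_s.strip.empty?
      out.delete(key)
    else
      out[key] = value.to_i
    end
  end
  out[:per_page] ||= DEFAULT_PER_PAGE

  raise ArgumentError, "Can't supply both :page and :start" if out[:start] && out[:page]

  if out[:page]
    out[:start] = (out[:page] - 1) * out[:per_page]   # 1-based page to 0-based offset
  elsif out[:start]
    out[:page] = (out[:start] / out[:per_page]) + 1   # integer division
  end
  out
end

by_page  = normalize_paging(page: "5", per_page: "20")  # :start becomes 80
by_start = normalize_paging(start: 20)                  # :page becomes 3
```

Because :page is derived by integer division, a :start value that does not fall on a page boundary maps to the page that contains that record.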
#public_settable_search_args ⇒ Object
Used mainly/only by the AJAX results loading. An array whitelist of attributes that can be sent as unverified request params and used to execute a search. For instance, 'auth' is NOT on the list: you can't trust a web request as to 'auth' status. Individual engines may override this method, call super, and add additional engine-specific attributes.
# File 'app/models/bento_search/search_engine.rb', line 358
def public_settable_search_args
  [:query, :search_field, :semantic_search_field, :sort, :page, :start, :per_page]
end
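To show how such a whitelist might be applied to untrusted request parameters, here is a minimal sketch. The key list is copied from the method above; safe_search_args and the sample request_params hash are hypothetical and not part of bento_search.

```ruby
# The whitelist is copied from #public_settable_search_args;
# safe_search_args is a hypothetical helper name.
PUBLIC_ARGS = [:query, :search_field, :semantic_search_field,
               :sort, :page, :start, :per_page].freeze

def safe_search_args(params, allowed = PUBLIC_ARGS)
  params.each_with_object({}) do |(key, value), safe|
    sym = key.to_sym
    # Anything not explicitly whitelisted (for example "auth") is dropped.
    safe[sym] = value if allowed.include?(sym)
  end
end

request_params = { "query" => "cats", "page" => "2", "auth" => "true" }
filtered = safe_search_args(request_params)
# filtered keeps :query and :page; the "auth" param is discarded.
```

An engine that accepts extra public arguments could pass its own extended list as the second argument.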
#search(*arguments) ⇒ Object
Method used to actually get results from a search engine.
When implementing a search engine, you do not override this #search method, but instead override #search_implementation. #search will call your specific #search_implementation, first normalizing the query arguments, and then normalizing and adding standard metadata to your return value.
Most engines support pagination, sorting, and searching in a specific
field.
# 1-based page index
engine.search("query", :per_page => 20, :page => 5)
# or use 0-based per-record index, engines that don't
# support this will round to nearest page.
engine.search("query", :start => 20)
You can ask an engine what search fields it supports with engine.search_keys
engine.search("query", :search_field => "engine_search_field_name")
There are also normalized 'semantic' names you can use across engines
(if they support them): :title, :author, :subject, maybe more.
engine.search("query", :semantic_search_field => :title)
Ask an engine what semantic field names it supports with `engine.semantic_search_keys`
Unrecognized search fields will be ignored, unless you pass in
:unrecognized_search_field => :raise (or do same in config).
Ask an engine what sort fields it supports with `engine.sort_keys`. See
list of standard sort keys in I18n file at ./config/locales/en.yml, in
`en.bento_search.sort_keys`.
engine.search("query", :sort => "some_sort_key")
Some engines support additional arguments to 'search', see individual
engine documentation. For instance, some engines support `:auth => true`
to give the user elevated search privileges when you have an authenticated
local user.
Passing the query as the first argument is just a convenience; you can also use a single hash argument.
engine.search(:query => "query", :per_page => 20, :page => 4)
# File 'app/models/bento_search/search_engine.rb', line 214
def search(*arguments)
  start_t = Time.now

  arguments = normalized_search_arguments(*arguments)

  results = search_implementation(arguments)

  fill_in_search_metadata_for(results, arguments)

  results.timing = (Time.now - start_t)

  return results
rescue *auto_rescue_exceptions => e
  # Uncaught exception, log and turn into failed Results object. We
  # only catch certain types of exceptions, or it makes dev really
  # confusing eating exceptions. This is intentionally a convenience
  # to allow search engine implementations to just raise the exception
  # and we'll turn it into a proper error.
  cleaned_backtrace = Rails.backtrace_cleaner.clean(e.backtrace)
  log_msg = "BentoSearch::SearchEngine failed results: #{e.inspect}\n #{cleaned_backtrace.join("\n ")}"
  Rails.logger.error log_msg

  failed = BentoSearch::Results.new
  failed.error ||= {}
  failed.error[:exception] = e

  failed.timing = (Time.now - start_t)

  fill_in_search_metadata_for(failed, arguments)

  return failed
end
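The pattern used by #search, a wrapper that times the call and converts a fixed list of exception classes into a failed results object, can be sketched in isolation as follows. SketchResults, SketchEngine and the AUTO_RESCUE list are assumptions for illustration; the actual list comes from the framework's auto_rescue_exceptions, and the real method also normalizes arguments, logs, and fills in metadata.

```ruby
require "timeout"

# Hypothetical stand-in for BentoSearch::Results so the sketch runs alone.
class SketchResults < Array
  attr_accessor :error, :timing

  def failed?
    !error.nil?
  end
end

class SketchEngine
  # Assumption: a short illustrative list. The real list is supplied by
  # the framework's auto_rescue_exceptions.
  AUTO_RESCUE = [Timeout::Error].freeze

  # Callers use #search; engine authors only write #search_implementation.
  def search(args)
    started = Time.now
    results = search_implementation(args)
    results.timing = Time.now - started
    results
  rescue *AUTO_RESCUE => e
    # Only the listed exception classes are converted; anything else
    # still propagates, so programming errors stay visible.
    failed = SketchResults.new
    failed.error = { exception: e }
    failed.timing = Time.now - started
    failed
  end

  def search_implementation(args)
    raise Timeout::Error, "remote API too slow" if args[:query] == "slow"
    SketchResults.new
  end
end
```

Rescuing only a known list keeps unexpected errors such as NoMethodError from being silently swallowed during development.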