Class: Picky::Search
- Includes:
- API::Search::Boost, Helpers::Measuring
- Defined in:
- lib/picky/search.rb,
lib/picky/search_facets.rb
Overview
Picky Searches
A Picky Search is an object which:
- holds one or more indexes
- offers an interface to query these indexes.
Example:
search = Picky::Search.new index1, index2
search.search 'query'
Instance Attribute Summary collapse
- #boosts ⇒ Object
  Returns the value of attribute boosts.
- #ignore_unassigned ⇒ Object (readonly)
  Returns the value of attribute ignore_unassigned.
- #indexes ⇒ Object (readonly)
  Returns the value of attribute indexes.
- #tokenizer ⇒ Object
  Returns the value of attribute tokenizer.
Instance Method Summary collapse
- #boost(boosts) ⇒ Object
  Examples: search = Search.new(books_index, dvd_index, mp3_index) do boost [:author, :title] => 3, [:title, :isbn] => 1 end.
- #execute(tokens, ids, offset, original_text = nil, unique = false) ⇒ Object
  Execute a search using Query::Tokens.
- #facets(category_identifier, options = {}) ⇒ Object
  Returns a list/hash of filtered facets.
- #facets_with_counts(counts, minimal_counts, tokenized_filter_query, last_token_text, options = {}) ⇒ Object
- #facets_without_counts(counts, minimal_counts, tokenized_filter_query, last_token_text, options = {}) ⇒ Object
- #ignore(*allocations_and_categories) ⇒ Object
  Ignore given categories and/or combinations of categories.
- #ignore_unassigned_tokens(value = true) ⇒ Object
  Ignore the given token if it cannot be matched to a category.
- #initialize(*indexes) ⇒ Search (constructor)
  Takes: * A number of indexes.
- #max_allocations(amount = nil) ⇒ Object
  Sets the max amount of allocations to calculate.
- #only(*allocations_and_categories) ⇒ Object
  Exclusively keep combinations of categories.
- #search(text, ids = 20, offset = 0, options = {}) ⇒ Object
  This is the main entry point for a query.
- #search_with(tokens, ids = 20, offset = 0, original_text = nil, unique = false) ⇒ Object
  Runs the actual search using Query::Tokens.
- #searching(options) ⇒ Object
  Defines tokenizer options or the tokenizer itself.
- #sorted_allocations(tokens, amount = nil) ⇒ Object
  Gets sorted allocations for the tokens.
- #symbol_keys ⇒ Object
- #terminate_early(extra_allocations = 0) ⇒ Object
  Tells Picky to terminate calculating ids if it has enough ids.
- #to_s ⇒ Object
  Display some nice information for the user.
- #tokenized(text, partialize_last = true) ⇒ Object
  Forwards the tokenizing to the query tokenizer.
Methods included from Helpers::Measuring
Methods included from API::Search::Boost
Constructor Details
#initialize(*indexes) ⇒ Search
Takes:
- A number of indexes
It is also possible to define the tokenizer and boosts like so. Example:
search = Search.new(index1, index2, index3) do
searching removes_characters: /[^a-z]/ # etc.
boost [:author, :title] => +3,
      [:title, :isbn]   => +1
end
# File 'lib/picky/search.rb', line 40

def initialize *indexes
  @indexes = Query::Indexes.new *indexes

  instance_eval(&Proc.new) if block_given?

  @tokenizer ||= Tokenizer.searching # THINK Not dynamic. Ok?
  @boosts    ||= Query::Boosts.new

  self
end
Instance Attribute Details
#boosts ⇒ Object
Returns the value of attribute boosts.
# File 'lib/picky/search.rb', line 23

def boosts
  @boosts
end
#ignore_unassigned ⇒ Object (readonly)
Returns the value of attribute ignore_unassigned.
# File 'lib/picky/search.rb', line 21

def ignore_unassigned
  @ignore_unassigned
end
#indexes ⇒ Object (readonly)
Returns the value of attribute indexes.
# File 'lib/picky/search.rb', line 21

def indexes
  @indexes
end
#tokenizer ⇒ Object
Returns the value of attribute tokenizer.
# File 'lib/picky/search.rb', line 23

def tokenizer
  @tokenizer
end
Instance Method Details
#boost(boosts) ⇒ Object
Examples:
search = Search.new(books_index, dvd_index, mp3_index) do
boost [:author, :title] => +3,
[:title, :isbn] => +1
end
or
# Explicitly add a random number (0...1) to the boosts.
#
my_boosts = Class.new do
# Instance only needs to implement
# boost_for combinations
# and return a number that is
# added to the score.
#
def boost_for combinations
rand
end
end.new
search = Search.new(books_index, dvd_index, mp3_index) do
boost my_boosts
end
# File 'lib/picky/search.rb', line 138

def boost boosts
  @boosts = extract_boosts boosts
end
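As the examples above show, any object that responds to boost_for(combinations) and returns a number can be passed to boost. Here is a minimal sketch of such a duck-typed booster; the class name and boosting rule are purely illustrative, not part of Picky:

```ruby
# A custom booster only needs to respond to #boost_for, returning a
# number that is added to an allocation's score.
class CategoryCountBooster
  # Hypothetical rule: favor allocations that match a single category.
  def boost_for combinations
    combinations.size == 1 ? 2 : 0
  end
end

booster = CategoryCountBooster.new
booster.boost_for [:title]         # => 2 (single-category allocation is boosted)
booster.boost_for [:title, :isbn]  # => 0 (multi-category allocation is left alone)
```

This mirrors the rand-based example above: the only contract is the boost_for method and a numeric return value.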
#execute(tokens, ids, offset, original_text = nil, unique = false) ⇒ Object
Execute a search using Query::Tokens.
Note: Internal method, use #search to search.
# File 'lib/picky/search.rb', line 242

def execute tokens, ids, offset, original_text = nil, unique = false
  Results.new original_text,
              ids,
              offset,
              sorted_allocations(tokens, @max_allocations),
              @extra_allocations,
              unique
end
#facets(category_identifier, options = {}) ⇒ Object
Returns a list/hash of filtered facets.
Params
category: The category whose facets to return.
Options
counts: Whether you want counts (returns a Hash) or not (returns an Array). (Default true)
at_least: A minimum count a facet needs to have (inclusive). (Default 1)
filter: A query to filter the facets with.
Usage:
search.facets :name, filter: 'surname:peter', at_least: 2
# File 'lib/picky/search_facets.rb', line 18

def facets category_identifier, options = {}
  # TODO Make it work. How should it work with multiple indexes?
  # raise "#{__method__} cannot be used on searches with more than 1 index yet. Sorry!" if indexes.size > 1
  index = indexes.first

  # Get index-specific facet counts.
  #
  counts = index.facets category_identifier, options

  # We're done if there is no filter.
  #
  return counts unless filter_query = options[:filter]

  # Pre-tokenize query token category.
  #
  predefined_categories = [index[category_identifier]]

  # Pre-tokenize key token – replace text below.
  # Note: The original is not important.
  #
  # TODO Don't use predefined. Perhaps do:
  #   key_token = Query::Token.new ''
  #   key_token.predefined_categories = [index[category_identifier]]
  #
  empty = @symbol_keys ? :'' : ''
  key_token = Query::Token.new empty, nil, predefined_categories

  # Pre-tokenize filter for reuse.
  #
  tokenized_filter_query = tokenized filter_query, false
  tokenized_filter_query.tokens.push key_token

  # Extract options.
  #
  no_counts = options[:counts] == false
  minimal_counts = options[:at_least] || 1 # Default needs at least one.

  # Get actual counts.
  #
  if no_counts
    facets_without_counts counts, minimal_counts, tokenized_filter_query, key_token.text, options do |last_text|
      key_token.text = last_text # TODO Why is this necessary?
    end
  else
    facets_with_counts counts, minimal_counts, tokenized_filter_query, key_token.text, options do |last_text|
      key_token.text = last_text # TODO Why is this necessary?
    end
  end
end
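Conceptually, and ignoring Picky's token machinery, facet counting boils down to grouping records by a category's value, counting the groups, and applying the at_least threshold. The following standalone Ruby sketch illustrates that idea; it is not Picky's implementation, and the method and data are hypothetical:

```ruby
# Standalone sketch of facet counting: group records by a category
# value, count occurrences, and drop facets below a minimum count.
def simple_facets records, category, at_least: 1, counts: true
  tally = records.group_by { |record| record[category] }
                 .transform_values(&:size)
                 .select { |_, count| count >= at_least }
  counts ? tally : tally.keys
end

books = [
  { author: 'peter', title: 'a' },
  { author: 'peter', title: 'b' },
  { author: 'mary',  title: 'c' }
]

simple_facets books, :author                 # => {"peter"=>2, "mary"=>1}... keys are symbols' values here: {"peter"=>2, "mary"=>1}
simple_facets books, :author, at_least: 2    # => {"peter"=>2}
simple_facets books, :author, counts: false  # => ["peter", "mary"]
```

This mirrors the options documented above: counts: false switches the return shape from Hash to Array, and at_least filters low-count facets.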
#facets_with_counts(counts, minimal_counts, tokenized_filter_query, last_token_text, options = {}) ⇒ Object
# File 'lib/picky/search_facets.rb', line 87

def facets_with_counts counts, minimal_counts, tokenized_filter_query, last_token_text, options = {}
  counts.inject({}) do |result, (key, _)|
    # Replace only the key token text because that
    # is the only information that changes in between
    # queries.
    #
    yield key

    # Calculate up to 1000 facets using unique to show correct facet counts.
    # TODO Redesign and deoptimize the whole process.
    #
    total = search_with(tokenized_filter_query, 1000, 0, nil, true).total
    next result unless total >= minimal_counts
    result[key] = total
    result
  end
end
#facets_without_counts(counts, minimal_counts, tokenized_filter_query, last_token_text, options = {}) ⇒ Object
# File 'lib/picky/search_facets.rb', line 68

def facets_without_counts counts, minimal_counts, tokenized_filter_query, last_token_text, options = {}
  counts.inject([]) do |result, (key, _)|
    # Replace only the key token text because that
    # is the only information that changes in between
    # queries.
    #
    # Note: Does not use replace anymore.
    #
    yield key

    # Calculate up to 1000 facets using unique to show correct facet counts.
    # TODO Redesign and deoptimize the whole process.
    #
    total = search_with(tokenized_filter_query, 1000, 0, nil, true).total
    next result unless total >= minimal_counts
    result << key
  end
end
#ignore(*allocations_and_categories) ⇒ Object
Ignore given categories and/or combinations of categories.
Example:
search = Search.new(people) do
ignore :name,
:first_name,
[:last_name, :street]
end
# File 'lib/picky/search.rb', line 156

def ignore *allocations_and_categories
  allocations_and_categories.each do |allocation_or_category|
    if allocation_or_category.respond_to? :to_sym
      indexes.ignore_categories allocation_or_category
    else
      indexes.ignore_allocations allocation_or_category
    end
  end
end
#ignore_unassigned_tokens(value = true) ⇒ Object
Ignore the given token if it cannot be matched to a category. The default behaviour is that if a token does not match any category, the query will not return anything (since that token cannot be matched). If you set this option to true, any token that cannot be matched to a category is simply ignored.
Use this if only a few matched words are important. For example, if from the query “Jonathan Myers 86455 Las Cucarachas” you only want to match the zipcode, so that the search engine can display zipcode-targeted advertisements on the side.
False by default.
Example:
search = Search.new(books_index, dvd_index, mp3_index) do
ignore_unassigned_tokens
end
With this set (to true), if in “Peter Flunder”, “Flunder” couldn’t be assigned to any category, it will simply be ignored. This is done for each categorization.
# File 'lib/picky/search.rb', line 202

def ignore_unassigned_tokens value = true
  @ignore_unassigned = value
end
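The effect of this flag can be sketched as a filter over the token list: with ignore_unassigned set, tokens that match no category are dropped instead of failing the query. A standalone illustration, with a hypothetical token-to-categories mapping (not Picky's internal representation):

```ruby
# Sketch of the ignore_unassigned behaviour: keep only tokens that can
# be assigned to at least one category. The categories_for hash is a
# hypothetical stand-in for the index's category matching.
def assignable_tokens tokens, categories_for, ignore_unassigned: false
  return tokens unless ignore_unassigned
  tokens.select { |token| !categories_for.fetch(token, []).empty? }
end

categories = { 'peter' => [:first_name], 'flunder' => [] }

assignable_tokens %w[peter flunder], categories
# => ["peter", "flunder"]  (default: unassigned token kept, query would fail)
assignable_tokens %w[peter flunder], categories, ignore_unassigned: true
# => ["peter"]             ("flunder" matched nothing and is dropped)
```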
#max_allocations(amount = nil) ⇒ Object
Sets the max amount of allocations to calculate.
Examples:
search = Search.new(index1, index2, index3) do
max_allocations 10
end
# File 'lib/picky/search.rb', line 78

def max_allocations amount = nil
  amount ? @max_allocations = amount : @max_allocations
end
#only(*allocations_and_categories) ⇒ Object
Exclusively keep combinations of categories.
Example:
search = Search.new(people) do
only [:last_name, :street],
[:last_name, :first_name]
end
# File 'lib/picky/search.rb', line 175

def only *allocations_and_categories
  indexes.keep_allocations *allocations_and_categories
end
#search(text, ids = 20, offset = 0, options = {}) ⇒ Object
This is the main entry point for a query. Use this in specs and also for running queries.
Parameters:
-
text: The search text.
-
ids = 20: The amount of ids to calculate (with offset).
-
offset = 0: The offset from which position to return the ids. Useful for pagination.
Options:
-
unique: Whether to return unique ids.
Note: The Rack adapter calls this method after unravelling the HTTP request.
# File 'lib/picky/search.rb', line 219

def search text, ids = 20, offset = 0, options = {}
  search_with tokenized(text), ids.to_i, offset.to_i, text, options[:unique]
end
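The ids and offset parameters behave like a page size and a starting position over the full sorted result id list. The pagination arithmetic can be sketched in plain Ruby (this is an illustration, not Picky code):

```ruby
# Pagination sketch: ids is the page size, offset is the starting
# position within the full sorted id list.
def paginate all_ids, ids = 20, offset = 0
  all_ids[offset, ids] || []
end

all_ids = (1..45).to_a
paginate(all_ids, 20, 0).size   # => 20 (page 1)
paginate(all_ids, 20, 20).size  # => 20 (page 2)
paginate(all_ids, 20, 40)       # => [41, 42, 43, 44, 45] (last, partial page)
```

So for a UI with 20 results per page, page n is requested with ids = 20 and offset = (n - 1) * 20.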
#search_with(tokens, ids = 20, offset = 0, original_text = nil, unique = false) ⇒ Object
Runs the actual search using Query::Tokens.
Note: Internal method, use #search to search.
# File 'lib/picky/search.rb', line 227

def search_with tokens, ids = 20, offset = 0, original_text = nil, unique = false
  results = nil

  duration = timed do
    results = execute tokens, ids, offset, original_text, unique
  end
  results.duration = duration.round 6

  results
end
#searching(options) ⇒ Object
Defines tokenizer options or the tokenizer itself.
Examples:
search = Search.new(index1, index2, index3) do
searching removes_characters: /[^a-z]/,
# etc.
end
search = Search.new(index1, index2, index3) do
searching MyTokenizerThatRespondsToTheMethodTokenize.new
end
# File 'lib/picky/search.rb', line 63

def searching options
  @tokenizer = if options.respond_to? :tokenize
    options
  else
    options && Tokenizer.new(options)
  end
end
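Because searching only checks respond_to? :tokenize, any duck-typed object works as the tokenizer. A minimal sketch of such an object, returning the tokens-plus-originals pair that #tokenized (below) expects; the class name and normalization rule are illustrative assumptions:

```ruby
# Minimal duck-typed tokenizer: responds to #tokenize and returns the
# processed tokens plus the original words.
class DowncasingTokenizer
  def tokenize text
    originals = text.split
    tokens    = originals.map { |word| word.downcase.gsub(/[^a-z0-9]/, '') }
    [tokens, originals]
  end
end

DowncasingTokenizer.new.tokenize 'Peter Flunder!'
# => [["peter", "flunder"], ["Peter", "Flunder!"]]
```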
#sorted_allocations(tokens, amount = nil) ⇒ Object
Gets sorted allocations for the tokens.
TODO Remove and just call prepared (and rename to sorted)?
# File 'lib/picky/search.rb', line 274

def sorted_allocations tokens, amount = nil
  indexes.prepared_allocations_for tokens, boosts, amount
end
#symbol_keys ⇒ Object
# File 'lib/picky/search.rb', line 142

def symbol_keys
  @symbol_keys = true
end
#terminate_early(extra_allocations = 0) ⇒ Object
Tells Picky to terminate calculating ids if it has enough ids. (So, early)
Important note: Do not use this for the live search! (As Picky needs to calculate the total)
Note: When using the Picky interface, do not terminate too early as this will kill off the allocation selections. A value of
terminate_early 5
is probably a good idea to show the user 5 extra beyond the needed ones.
Examples:
# Terminate if you have enough ids.
#
search = Search.new(index1, index2, index3) do
terminate_early
end
# After calculating enough ids,
# calculate 5 extra allocations for the interface.
#
search = Search.new(index1, index2, index3) do
terminate_early 5
end
# File 'lib/picky/search.rb', line 109

def terminate_early extra_allocations = 0
  @extra_allocations = extra_allocations.respond_to?(:to_hash) ? extra_allocations[:with_extra_allocations] : extra_allocations
end
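The idea behind early termination can be sketched as: stop walking allocations once enough ids have been collected, but process a given number of extra allocations first (so the interface still has alternatives to show). This is a hypothetical standalone illustration, not Picky's internal algorithm:

```ruby
# Sketch of early termination: walk allocations collecting ids; once
# enough ids are gathered, process up to `extra` more allocations,
# then stop.
def collect_ids allocations, needed, extra = 0
  ids = []
  extra_done = 0
  allocations.each do |allocation_ids|
    if ids.size >= needed
      break if extra_done >= extra
      extra_done += 1
    end
    ids.concat allocation_ids
  end
  ids
end

allocations = [[1, 2, 3], [4, 5], [6], [7]]
collect_ids allocations, 3     # => [1, 2, 3] (terminates immediately)
collect_ids allocations, 3, 2  # => [1, 2, 3, 4, 5, 6] (2 extra allocations)
```

This also shows why terminating early is unsuitable when you need the total: the remaining allocations are never counted.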
#to_s ⇒ Object
Display some nice information for the user.
# File 'lib/picky/search.rb', line 280

def to_s
  s = [
    (@indexes.indexes.map(&:name).join(', ') unless @indexes.indexes.empty?),
    ("boosts: #@boosts" if @boosts)
  ].compact
  "#{self.class}(#{s.join(', ')})"
end
#tokenized(text, partialize_last = true) ⇒ Object
Forwards the tokenizing to the query tokenizer.
Parameters:
-
text: The string to tokenize.
-
partialize_last: Whether to partialize the last token.
Note: By default, the last token is always partial.
Returns:
-
A Picky::Query::Tokens instance.
# File 'lib/picky/search.rb', line 262

def tokenized text, partialize_last = true
  tokens, originals = tokenizer.tokenize text
  tokens = Query::Tokens.processed tokens, originals || tokens, @ignore_unassigned
  tokens.symbolize if @symbol_keys # SYMBOLS.
  tokens.partialize_last if partialize_last
  tokens
end
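Partializing the last token means the final word of a live query is matched as a prefix (the user is probably still typing it), while earlier tokens match exactly. An illustrative sketch of that distinction, not Picky's matching code:

```ruby
# Sketch of partial vs. exact token matching: a partial token matches
# any word it is a prefix of; a non-partial token must match exactly.
def prefix_match? token, word, partial
  partial ? word.start_with?(token) : word == token
end

words = %w[picky pickaxe pick]

# Last token treated as partial: matches all words it prefixes.
words.select { |w| prefix_match?('pick', w, true) }   # => ["picky", "pickaxe", "pick"]
# Not partial (e.g. a facet filter query): exact matches only.
words.select { |w| prefix_match?('pick', w, false) }  # => ["pick"]
```

This is also why #facets above calls tokenized with partialize_last = false: filter queries should match exactly, not as prefixes.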