Class: Picky::Search

Inherits:
Object
Includes:
API::Search::Boost, Helpers::Measuring
Defined in:
lib/picky/search.rb,
lib/picky/search_facets.rb

Overview

Picky Searches

A Picky Search is an object which:

  • holds one or more indexes

  • offers an interface to query these indexes.

Example:

search = Picky::Search.new index1, index2
search.search 'query'

Instance Attribute Summary

Instance Method Summary

Methods included from Helpers::Measuring

#timed

Methods included from API::Search::Boost

#extract_boosts

Constructor Details

#initialize(*indexes) ⇒ Search

Takes:

  • A number of indexes

The tokenizer and boosts can also be defined in a block. Example:

search = Search.new(index1, index2, index3) do
  searching removes_characters: /[^a-z]/ # etc.
  boost [:author, :title] => +3,
        [:title, :isbn]   => +1
end


# File 'lib/picky/search.rb', line 40

def initialize *indexes
  @indexes = Query::Indexes.new *indexes

  instance_eval(&Proc.new) if block_given?

  @tokenizer ||= Tokenizer.searching # THINK Not dynamic. Ok?
  @boosts    ||= Query::Boosts.new

  self
end
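
The block form works because the constructor evaluates the block in the context of the new instance, so bare calls such as searching and boost resolve to instance methods. A minimal, self-contained sketch of that pattern follows. MiniSearch and the values it stores are hypothetical stand-ins, not Picky classes, and the sketch captures the block explicitly with &block because calling Proc.new without a block is no longer supported in Ruby 3.

```ruby
# Hypothetical stand-in for illustration; not part of Picky.
class MiniSearch
  attr_reader :indexes, :tokenizer_options, :boosts

  def initialize(*indexes, &block)
    @indexes = indexes
    # Evaluate the configuration block with self set to this instance.
    instance_eval(&block) if block
    # Fall back to defaults for anything the block did not set.
    @tokenizer_options ||= {}
    @boosts            ||= {}
  end

  def searching(options)
    @tokenizer_options = options
  end

  def boost(boosts)
    @boosts = boosts
  end
end

search = MiniSearch.new(:books, :dvds) do
  searching removes_characters: /[^a-z]/
  boost [:author, :title] => +3
end

p search.tokenizer_options
p search.boosts
```

Options that the block does not set keep their defaults, which mirrors the ||= assignments in the constructor above.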

Instance Attribute Details

#boosts ⇒ Object

Returns the value of attribute boosts.



# File 'lib/picky/search.rb', line 23

def boosts
  @boosts
end

#ignore_unassigned ⇒ Object (readonly)

Returns the value of attribute ignore_unassigned.



# File 'lib/picky/search.rb', line 21

def ignore_unassigned
  @ignore_unassigned
end

#indexes ⇒ Object (readonly)

Returns the value of attribute indexes.



# File 'lib/picky/search.rb', line 21

def indexes
  @indexes
end

#tokenizer ⇒ Object

Returns the value of attribute tokenizer.



# File 'lib/picky/search.rb', line 23

def tokenizer
  @tokenizer
end

Instance Method Details

#boost(boosts) ⇒ Object

Examples:

search = Search.new(books_index, dvd_index, mp3_index) do
  boost [:author, :title] => +3,
        [:title, :isbn]   => +1
end

or

# Explicitly add a random number (0...1) to the boosts.
#
my_boosts = Class.new do
  # Instance only needs to implement
  #   boost_for combinations
  # and return a number that is
  # added to the score.
  #
  def boost_for combinations
    rand
  end
end.new

search = Search.new(books_index, dvd_index, mp3_index) do
  boost my_boosts
end


# File 'lib/picky/search.rb', line 138

def boost boosts
  @boosts = extract_boosts boosts
end
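
A deterministic variant of the custom boost object can make the contract easier to see. The sketch below is an illustration only: LengthBoost is a hypothetical class, and treating combinations as an Array-like collection is an assumption. The only parts taken from the text above are that the object responds to boost_for and that the returned number is added to the score.

```ruby
# Hypothetical boost object: the only contract taken from the documentation
# is that it responds to boost_for(combinations) and returns a number.
class LengthBoost
  def initialize(per_combination)
    @per_combination = per_combination
  end

  # Assumption: combinations is an Array-like collection.
  def boost_for(combinations)
    combinations.size * @per_combination
  end
end

my_boosts  = LengthBoost.new(0.5)
base_score = 2.0

# The returned number is added to the score.
total = base_score + my_boosts.boost_for([:author, :title])
puts total
```

With a real search, such an object would be handed over as boost my_boosts inside the configuration block.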

#execute(tokens, ids, offset, original_text = nil, unique = false) ⇒ Object

Execute a search using Query::Tokens.

Note: Internal method, use #search to search.



# File 'lib/picky/search.rb', line 242

def execute tokens, ids, offset, original_text = nil, unique = false
  Results.new original_text,
              ids,
              offset,
              sorted_allocations(tokens, @max_allocations),
              @extra_allocations,
              unique
end

#facets(category_identifier, options = {}) ⇒ Object

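Returns the filtered facets, as a Hash (with counts) or an Array (without counts). Currently raises an error if the search holds more than one index.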

Params

category_identifier: The category whose facets to return.

Options

counts: Whether you want counts (returns a Hash) or not (returns an Array). (Default true)
at_least: A minimum count a facet needs to have (inclusive). (Default 1)
filter: A query to filter the facets with. (Default none, in which case the unfiltered facets are returned)

Usage:

search.facets :name, filter: 'surname:peter', at_least: 2


# File 'lib/picky/search_facets.rb', line 18

def facets category_identifier, options = {}
  # TODO Make it work. How should it work with multiple indexes?
  #
  raise "#{__method__} cannot be used on searches with more than 1 index yet. Sorry!" if indexes.size > 1
  index = indexes.first
  
  # Get index-specific facet counts.
  #
  counts = index.facets category_identifier, options
  
  # We're done if there is no filter.
  #
  return counts unless filter_query = options[:filter]
  
  # Pre-tokenize query token category.
  #
  predefined_categories = [index[category_identifier]]
  
  # Pre-tokenize key token – replace text below.
  # Note: The original is not important.
  #
  # TODO Don't use predefined. Perhaps do:
  # key_token = Query::Token.new ''
  # key_token.predefined_categories = [index[category_identifier]]
  #
  empty = @symbol_keys ? :'' : ''
  key_token = Query::Token.new empty, nil, predefined_categories
  
  # Pre-tokenize filter for reuse.
  #
  tokenized_filter_query = tokenized filter_query, false
  tokenized_filter_query.tokens.push key_token
  
  # Extract options.
  #
  no_counts = options[:counts] == false
  minimal_counts = options[:at_least] || 1 # Default needs at least one.
  
  # Get actual counts.
  #
  if no_counts
    facets_without_counts counts, minimal_counts, tokenized_filter_query, options do |last_text|
      key_token.text = last_text # TODO Why is this necessary?
    end
  else
    facets_with_counts counts, minimal_counts, tokenized_filter_query, key_token.text, options do |last_text|
      key_token.text = last_text # TODO Why is this necessary?
    end
  end
end
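
As a rough illustration of how the counts and at_least options shape the return value, the following stand-alone sketch applies both to a plain Hash. It is not Picky's implementation: shape_facets and the facet data are made up, and the filter query step is left out entirely.

```ruby
# Illustrative only: mimics how the counts and at_least options shape the
# return value. The facet data is made up; filtering by query is omitted.
def shape_facets(facet_counts, counts: true, at_least: 1)
  # at_least is inclusive: a facet with exactly that count is kept.
  kept = facet_counts.select { |_key, count| count >= at_least }
  # With counts a Hash is returned, without counts only the keys.
  counts ? kept : kept.keys
end

facet_counts = { 'peter' => 3, 'paul' => 1, 'mary' => 2 }

p shape_facets(facet_counts)
p shape_facets(facet_counts, at_least: 2)
p shape_facets(facet_counts, counts: false, at_least: 2)
```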

#facets_with_counts(counts, minimal_counts, tokenized_filter_query, last_token_text, options = {}) ⇒ Object



# File 'lib/picky/search_facets.rb', line 87

def facets_with_counts counts, minimal_counts, tokenized_filter_query, last_token_text, options = {}
  counts.inject({}) do |result, (key, _)|
    # Replace only the key token text because that
    # is the only information that changes in between
    # queries.
    #
    yield key
    
    # Calculate up to 1000 facets using unique to show correct facet counts.
    # TODO Redesign and deoptimize the whole process.
    #
    total = search_with(tokenized_filter_query, 1000, 0, nil, true).total
    
    next result unless total >= minimal_counts
    result[key] = total
    result
  end
end

#facets_without_counts(counts, minimal_counts, tokenized_filter_query, last_token_text, options = {}) ⇒ Object



# File 'lib/picky/search_facets.rb', line 68

def facets_without_counts counts, minimal_counts, tokenized_filter_query, last_token_text, options = {}
  counts.inject([]) do |result, (key, _)|
    # Replace only the key token text because that
    # is the only information that changes in between
    # queries.
    #
    # Note: Does not use replace anymore.
    #
    yield key
    
    # Calculate up to 1000 facets using unique to show correct facet counts.
    # TODO Redesign and deoptimize the whole process.
    #
    total = search_with(tokenized_filter_query, 1000, 0, nil, true).total
    
    next result unless total >= minimal_counts
    result << key
  end
end

#ignore(*allocations_and_categories) ⇒ Object

Ignore given categories and/or combinations of categories.

Example:

search = Search.new(people) do
  ignore :name,
         :first_name,
         [:last_name, :street]
end


# File 'lib/picky/search.rb', line 156

def ignore *allocations_and_categories
  allocations_and_categories.each do |allocation_or_category|
    if allocation_or_category.respond_to? :to_sym
      indexes.ignore_categories allocation_or_category
    else
      indexes.ignore_allocations allocation_or_category
    end
  end
end
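
The dispatch rule in the source above (anything that responds to to_sym is a category, everything else is an allocation) can be tried in isolation. The sketch below is a stand-alone illustration; split_ignored is a hypothetical helper and does not touch any Picky index.

```ruby
# Stand-alone sketch of the dispatch rule: anything that responds to to_sym
# is treated as a single category, everything else as an allocation.
def split_ignored(*allocations_and_categories)
  categories, allocations = allocations_and_categories.partition do |item|
    item.respond_to?(:to_sym)
  end
  { categories: categories, allocations: allocations }
end

result = split_ignored(:name, :first_name, [:last_name, :street])
p result
```

Symbols and Strings respond to to_sym, while an Array such as [:last_name, :street] does not, which is why it ends up among the allocations.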

#ignore_unassigned_tokens(value = true) ⇒ Object

Ignores tokens that cannot be matched to any category. By default, a query containing a token that matches no category returns nothing. With this option set to true, any such token is simply dropped from the query.

Use this if only a few of the words in a query matter. For example, from the query “Jonathan Myers 86455 Las Cucarachas” you may only want to match the zip code, so that the search engine can display advertisements for that zip code on the side.

False by default.

Example:

search = Search.new(books_index, dvd_index, mp3_index) do
  ignore_unassigned_tokens
end

With this set to true, if “Flunder” in “Peter Flunder” cannot be assigned to any category, it is simply ignored. This is done for each categorization.



# File 'lib/picky/search.rb', line 202

def ignore_unassigned_tokens value = true
  @ignore_unassigned = value
end
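
The difference between the two behaviours can be sketched without Picky. In the illustration below, CATEGORIES is a hypothetical lookup from token text to categories and assignable_tokens is a made-up helper. Neither reflects how Picky stores or matches tokens internally.

```ruby
# Hypothetical lookup from token text to categories; not Picky's data model.
CATEGORIES = {
  'peter' => [:first_name],
  '86455' => [:zipcode]
}.freeze

def assignable_tokens(tokens, ignore_unassigned:)
  assigned, unassigned = tokens.partition { |token| CATEGORIES.key?(token) }
  # Without the option, a single unmatched token empties the whole result.
  return [] if unassigned.any? && !ignore_unassigned
  assigned
end

tokens = %w[peter flunder 86455]

p assignable_tokens(tokens, ignore_unassigned: false)
p assignable_tokens(tokens, ignore_unassigned: true)
```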

#max_allocations(amount = nil) ⇒ Object

Sets the maximum number of allocations to calculate. Called without an argument, it returns the current value.

Examples:

search = Search.new(index1, index2, index3) do
  max_allocations 10
end


# File 'lib/picky/search.rb', line 78

def max_allocations amount = nil
  amount ? @max_allocations = amount : @max_allocations
end

#only(*allocations_and_categories) ⇒ Object

Exclusively keep combinations of categories.

Example:

search = Search.new(people) do
  only [:last_name, :street],
       [:last_name, :first_name]
end


# File 'lib/picky/search.rb', line 175

def only *allocations_and_categories
  indexes.keep_allocations *allocations_and_categories
end

#search(text, ids = 20, offset = 0, options = {}) ⇒ Object

This is the main entry point for a query. Use this in specs and also for running queries.

Parameters:

  • text: The search text.

  • ids = 20: The amount of ids to calculate (with offset).

  • offset = 0: The offset from which position to return the ids. Useful for pagination.

Options:

  • unique: Whether to return unique ids.

Note: The Rack adapter calls this method after unravelling the HTTP request.



# File 'lib/picky/search.rb', line 219

def search text, ids = 20, offset = 0, options = {}
  search_with tokenized(text), ids.to_i, offset.to_i, text, options[:unique]
end
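
For pagination, the ids and offset parameters are typically derived from a page number. The helper below is a hypothetical convenience, not part of Picky, and assumes 1-based page numbers.

```ruby
# Hypothetical helper: turns a 1-based page number into the ids and offset
# arguments described above.
def page_arguments(page, per_page = 20)
  raise ArgumentError, 'page must be >= 1' if page < 1
  [per_page, (page - 1) * per_page]
end

ids, offset = page_arguments(3, 20)
puts "ids=#{ids} offset=#{offset}"

# With a real Picky::Search instance this would be passed on as:
#   search.search 'query', ids, offset
```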

#search_with(tokens, ids = 20, offset = 0, original_text = nil, unique = false) ⇒ Object

Runs the actual search using Query::Tokens.

Note: Internal method, use #search to search.



# File 'lib/picky/search.rb', line 227

def search_with tokens, ids = 20, offset = 0, original_text = nil, unique = false
  results = nil

  duration = timed do
    results = execute tokens, ids, offset, original_text, unique
  end
  results.duration = duration.round 6

  results
end
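
The method relies on a timed helper that wraps the block and yields a duration. How Helpers::Measuring#timed is implemented is not shown on this page, so the sketch below is an assumption: it defines a stand-in timed that returns the elapsed seconds of its block, which is enough to show why results has to be declared before the block.

```ruby
# Assumption: a timed helper that returns the elapsed seconds of its block.
# This is a stand-in, not the Helpers::Measuring implementation.
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Declared outside the block so the value assigned inside stays visible.
results = nil
duration = timed do
  results = [1, 2, 3].map { |n| n * 2 }
end

puts "results=#{results.inspect} duration=#{duration.round(6)}s"
```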

#searching(options) ⇒ Object

Defines tokenizer options or the tokenizer itself.

Examples:

search = Search.new(index1, index2, index3) do
  searching removes_characters: /[^a-z]/ # etc.
end

search = Search.new(index1, index2, index3) do
  searching MyTokenizerThatRespondsToTheMethodTokenize.new
end


# File 'lib/picky/search.rb', line 63

def searching options
  @tokenizer = if options.respond_to? :tokenize
    options
  else
    options && Tokenizer.new(options)
  end
end
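
A custom tokenizer only has to respond to tokenize. The sketch below shows one possible shape. WhitespaceTokenizer is hypothetical, and returning a pair of arrays (tokens and originals) is an assumption based on how #tokenized destructures the tokenizer's result in the source listing further down this page.

```ruby
# Hypothetical tokenizer. The two-element return value (tokens, originals)
# is an assumption based on how #tokenized destructures the result.
class WhitespaceTokenizer
  def tokenize(text)
    originals = text.to_s.split
    # Normalize each word: lowercase it and strip non-alphanumerics.
    tokens = originals.map { |word| word.downcase.gsub(/[^a-z0-9]/, '') }
    [tokens, originals]
  end
end

tokenizer = WhitespaceTokenizer.new
tokens, originals = tokenizer.tokenize('Peter Flunder!')

p tokens
p originals
```

An instance of such a class would be passed as searching WhitespaceTokenizer.new inside the configuration block.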

#sorted_allocations(tokens, amount = nil) ⇒ Object

Gets sorted allocations for the tokens.

TODO Remove and just call prepared (and rename to sorted)?



# File 'lib/picky/search.rb', line 274

def sorted_allocations tokens, amount = nil
  indexes.prepared_allocations_for tokens, boosts, amount
end

#symbol_keys ⇒ Object



# File 'lib/picky/search.rb', line 142

def symbol_keys
  @symbol_keys = true
end

#terminate_early(extra_allocations = 0) ⇒ Object

Tells Picky to stop calculating ids as soon as it has enough of them.

Important note: Do not use this for the live search, as Picky needs to calculate the total there.

Note: When using the Picky interface, do not terminate too early, as this will cut off the allocation selections. A value of

terminate_early 5

is probably a good idea, as it shows the user 5 extra allocations beyond the needed ones.

Examples:

# Terminate if you have enough ids.
#
search = Search.new(index1, index2, index3) do
  terminate_early
end

# After calculating enough ids,
# calculate 5 extra allocations for the interface.
#
search = Search.new(index1, index2, index3) do
  terminate_early 5
end


# File 'lib/picky/search.rb', line 109

def terminate_early extra_allocations = 0
  @extra_allocations = extra_allocations.respond_to?(:to_hash) ? extra_allocations[:with_extra_allocations] : extra_allocations
end
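
As the source above shows, the argument may be either a plain number or something hash-like with a with_extra_allocations key. The stand-alone sketch below reproduces only that normalization step; extra_allocations_from is a hypothetical name used for illustration.

```ruby
# Stand-alone copy of the argument normalization: a hash-like argument is
# unwrapped via its :with_extra_allocations key, anything else is used as is.
def extra_allocations_from(argument = 0)
  argument.respond_to?(:to_hash) ? argument[:with_extra_allocations] : argument
end

p extra_allocations_from
p extra_allocations_from(5)
p extra_allocations_from(with_extra_allocations: 5)
```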

#to_s ⇒ Object

Returns a human-readable summary of the search: the names of its indexes and, if set, its boosts.



# File 'lib/picky/search.rb', line 280

def to_s
  s = [
    (@indexes.indexes.map(&:name).join(', ') unless @indexes.indexes.empty?),
    ("boosts: #@boosts" if @boosts)
  ].compact
  "#{self.class}(#{s.join(', ')})"
end

#tokenized(text, partialize_last = true) ⇒ Object

Forwards the tokenizing to the query tokenizer.

Parameters:

  • text: The string to tokenize.

  • partialize_last: Whether to partialize the last token.

Note: By default, the last token is always partial.

Returns:

  • A Picky::Query::Tokens instance.



# File 'lib/picky/search.rb', line 262

def tokenized text, partialize_last = true
  tokens, originals = tokenizer.tokenize text
  tokens = Query::Tokens.processed tokens, originals || tokens, @ignore_unassigned
  tokens.symbolize if @symbol_keys # SYMBOLS.
  tokens.partialize_last if partialize_last
  tokens
end