Class: Picky::Index
- Includes:
- Helpers::Indexing
- Defined in:
- lib/picky/index.rb,
lib/picky/index/hints.rb,
lib/picky/index_facets.rb,
lib/picky/index_indexed.rb,
lib/picky/index_indexing.rb,
lib/picky/index_realtime.rb,
lib/picky/index_convenience.rb
Defined Under Namespace
Classes: Hints
Instance Attribute Summary
-
#categories ⇒ Object
readonly
Returns the value of attribute categories.
-
#hints ⇒ Object
readonly
Returns the value of attribute hints.
-
#name ⇒ Object
readonly
Returns the value of attribute name.
Instance Method Summary
-
#<<(thing) ⇒ Object
Add at the end.
-
#add(thing, method: :unshift, force_update: false) ⇒ Object
Add to the index using unshift.
-
#after_indexing(after_indexing = nil) ⇒ Object
Define what to do after indexing.
-
#backend(backend = nil) ⇒ Object
API method.
-
#category(category_name, options = {}) ⇒ Object
API method.
-
#check_source_empty ⇒ Object
Check if the given enumerable source is empty.
-
#directory ⇒ Object
The directory used by this index.
-
#facets(category_identifier, options = {}) ⇒ Object
Return facets for a category in the form: { text => count }.
-
#geo_categories(lat_name, lng_name, radius, options = {}) ⇒ Object
Geo search, searches in a rectangle (almost square) in the lat/long coordinate system.
-
#id(name = nil, options = {}) ⇒ Object
API method.
-
#identifier ⇒ Object
Identifier used for technical output.
-
#indexing(options = {}) ⇒ Object
Define an index tokenizer on the index.
-
#initialize(name) ⇒ Index
constructor
Create a new index with a given source.
-
#key_format(key_format = nil) ⇒ Object
Define a key_format on the index.
-
#only(*qualifiers) ⇒ Object
Restrict categories to the given ones.
-
#optimize(*hints) ⇒ Object
Provide hints for Picky so it can optimise.
-
#optimize_memory(array_references = Hash.new) ⇒ Object
Explicitly trigger memory optimization.
-
#prepare(scheduler = Scheduler.new) ⇒ Object
Calling prepare on an index will call prepare on every category.
-
#prepare_in_parallel(scheduler) ⇒ Object
Indexes the categories in parallel.
-
#ranged_category(category_name, range, options = {}) ⇒ Object
Make this category range searchable with a fixed range.
-
#result_identifier(result_identifier = nil) ⇒ Object
Define how the results of this index are identified.
-
#source(some_source = nil, &block) ⇒ Object
Define a source on the index.
-
#static ⇒ Object
TODO Doc.
- #static? ⇒ Boolean
-
#symbol_keys(value = nil) ⇒ Object
API method.
- #to_s ⇒ Object
- #to_stats ⇒ Object
-
#to_tree_s(indent = 0) ⇒ Object
Displays the structure as a tree.
-
#tokenizer ⇒ Object
Returns the installed tokenizer or the default.
-
#unblock_source ⇒ Object
Get the actual source if it is wrapped in a time capsule, i.e. a block/lambda.
-
#unshift(thing) ⇒ Object
Add at the beginning (calls add).
-
#with_data_snapshot ⇒ Object
Note: Duplicated in category_indexing.rb.
Methods included from Helpers::Indexing
Methods included from Helpers::Measuring
Constructor Details
#initialize(name) ⇒ Index
Create a new index with a given source.
Parameters
-
name: A name that will be used for the index directory and in the Picky front end.
Options (all are used in the block - not passed as a Hash, see examples)
-
source: Where the data comes from, e.g. Sources::CSV.new(…). Optional, can be defined in the block using #source.
-
result_identifier: Use if you’d like a different identifier/name in the results than the name of the index.
-
after_indexing: As of this writing only used in the db source. Executes the given after_indexing as SQL after the indexing process.
-
indexing: Call and pass either a tokenizer (responds to #tokenize) or the options for a tokenizer.
-
key_format: Call and pass in a format method for the ids (default is #to_i).
Example:
my_index = Index.new(:my_index) do
  source            Sources::CSV.new(file: 'data/index.csv')
  key_format        :to_sym
  category          :bla
  result_identifier :my_special_results
end
# File 'lib/picky/index.rb', line 120

def initialize name
  @name       = name.intern
  @categories = Categories.new

  # Centralized registry.
  #
  Indexes.register self

  instance_eval(&Proc.new) if block_given?
end
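The block-based configuration works because the constructor evaluates the block with the new index as self, so bare calls such as key_format or category resolve to index methods. The following is a minimal, self-contained sketch of that pattern only. MiniIndex and its methods are hypothetical stand-ins, not Picky classes, and the sketch takes an explicit &block parameter rather than the block capture used in the source above.

```ruby
# Hypothetical stand-in for Picky::Index; illustrates the block DSL only.
class MiniIndex
  attr_reader :name, :categories

  def initialize(name, &block)
    @name       = name.to_sym
    @categories = []
    # Run the configuration block with this instance as self.
    instance_eval(&block) if block
  end

  # Acts as a getter without an argument and as a setter with one.
  def key_format(format = nil)
    format ? (@key_format = format) : @key_format
  end

  # Registers a category name (the real method builds a Category object).
  def category(category_name)
    @categories << category_name.to_sym
  end
end

index = MiniIndex.new(:my_index) do
  key_format :to_sym
  category   :bla
end
```

The combined getter/setter shape of key_format in the sketch is the same one the documented #key_format, #after_indexing and #result_identifier methods use.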
Instance Attribute Details
#categories ⇒ Object (readonly)
Returns the value of attribute categories.
# File 'lib/picky/index.rb', line 89

def categories
  @categories
end
#hints ⇒ Object (readonly)
Returns the value of attribute hints.
# File 'lib/picky/index.rb', line 89

def hints
  @hints
end
#name ⇒ Object (readonly)
Returns the value of attribute name.
# File 'lib/picky/index.rb', line 89

def name
  @name
end
Instance Method Details
#<<(thing) ⇒ Object
Add at the end.
# File 'lib/picky/index_realtime.rb', line 18

def << thing
  add thing, method: __method__
end
#add(thing, method: :unshift, force_update: false) ⇒ Object
Add to the index using unshift.
# File 'lib/picky/index_realtime.rb', line 30

def add thing, method: :unshift, force_update: false
  categories.add thing, method: method, force_update: force_update
end
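Both #<< and #unshift forward to #add and pass their own name via __method__, which #add then uses to decide where the new item goes. A small sketch of that forwarding, assuming a plain Array as the backing store; MiniRealtime is a hypothetical class, and the real method delegates to the categories instead.

```ruby
# Hypothetical stand-in showing how __method__ selects the insertion method.
class MiniRealtime
  attr_reader :items

  def initialize
    @items = []
  end

  def <<(thing)
    add thing, method: __method__ # __method__ is :<< here
  end

  def unshift(thing)
    add thing, method: __method__ # __method__ is :unshift here
  end

  # Default mirrors the documented signature: unshift unless told otherwise.
  def add(thing, method: :unshift)
    @items.public_send(method, thing)
    self
  end
end

realtime = MiniRealtime.new
realtime << 1
realtime << 2
realtime.unshift 0
realtime.add(-1)
```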
#after_indexing(after_indexing = nil) ⇒ Object
Define what to do after indexing. (Only used in the Sources::DB)
# File 'lib/picky/index_indexing.rb', line 117

def after_indexing after_indexing = nil
  after_indexing ? (@after_indexing = after_indexing) : @after_indexing
end
#backend(backend = nil) ⇒ Object
API method.
Sets/returns the backend used. Default is Backends::Memory.new.
# File 'lib/picky/index.rb', line 160

def backend backend = nil
  if backend
    @backend = backend
    reset_backend
  else
    @backend ||= Backends::Memory.new
  end
end
#category(category_name, options = {}) ⇒ Object
API method.
Defines a searchable category on the index.
Parameters
-
category_name: This identifier is used in the front end, but also to categorize query text. For example, “title:hobbit” will narrow the hobbit query on categories with the identifier :title.
Options
-
indexing: Pass in either a tokenizer or tokenizer options.
-
partial: Partial::None.new or Partial::Substring.new(from: starting_char, to: ending_char). Default is Partial::Substring.new(from: -3, to: -1).
-
similarity: Similarity::None.new or Similarity::DoubleMetaphone.new(similar_words_searched). Default is Similarity::None.new.
-
qualifiers: An array of qualifiers with which you can define which category you’d like to search, for example “title:hobbit” will search for hobbit in just title categories. Example: qualifiers: [:t, :titre, :title] (use it for example with multiple languages). Default is the name of the category.
-
qualifier: Convenience option if you just need a single qualifier, see above. Example: qualifier: :title. Default is the name of the category.
-
source: Use a different source than the index uses. If you think you need that, there might be a better solution to your problem. Please post to the mailing list first with your application.rb :)
-
from: Take the data from the data category with this name. Example: You have a source Sources::CSV.new(:title, file: 'some_file.csv') but you want the category to be called differently. Then you use from: category(:similar_title, :from => :title).
# File 'lib/picky/index.rb', line 236

def category category_name, options = {}
  new_category = Category.new category_name.intern, self, options
  categories << new_category

  new_category = yield new_category if block_given?

  new_category
end
#check_source_empty ⇒ Object
Check if the given enumerable source is empty.
Note: Checking as early as possible to tell the user as early as possible.
# File 'lib/picky/index_indexing.rb', line 42

def check_source_empty
  Picky.logger.warn %Q{\n\033[1mWarning\033[m, source for index "#{name}" is empty: #{source} (responds true to empty?).\n} if source.respond_to?(:empty?) && source.empty?
end
#directory ⇒ Object
The directory used by this index.
Note: This used to be memoized (@directory ||=), but it needs to be computed dynamically.
# File 'lib/picky/index.rb', line 183

def directory
  ::File.join(Picky.root, 'index', PICKY_ENVIRONMENT, name.to_s)
end
#facets(category_identifier, options = {}) ⇒ Object
Return facets for a category in the form:
{ text => count }
Options
counts: Whether you want counts or not. With counts: false, an array of texts is returned instead of a hash.
at_least: A minimum count a facet needs to have (inclusive).
TODO Think about having a separate index for counts to reduce the complexity of this.
# File 'lib/picky/index_facets.rb', line 14

def facets category_identifier, options = {}
  text_ids = self[category_identifier].exact.inverted
  no_counts = options[:counts] == false
  minimal_counts = options[:at_least]
  if no_counts
    text_ids.inject([]) do |result, (text, ids)|
      next result if minimal_counts && ids.size < minimal_counts
      result << text
    end
  else
    text_ids.inject({}) do |result, (text, ids)|
      size = ids.size
      next result if minimal_counts && size < minimal_counts
      result[text] = size; result
    end
  end
end
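To make the two result shapes concrete, here is a standalone sketch of the same counting logic over a plain Hash. The inverted data, facet_counts and facet_texts are hypothetical names for illustration; the real method reads the exact inverted index of the category.

```ruby
# Hypothetical inverted index: token text => ids of the objects containing it.
inverted = {
  'hobbit' => [1, 2, 3],
  'ring'   => [2],
  'shire'  => [1, 4]
}

# Counting variant: returns { text => count }, optionally filtered by a minimum.
def facet_counts(inverted, at_least: nil)
  inverted.each_with_object({}) do |(text, ids), result|
    size = ids.size
    next if at_least && size < at_least
    result[text] = size
  end
end

# Variant corresponding to counts: false: returns only the texts.
def facet_texts(inverted, at_least: nil)
  inverted.each_with_object([]) do |(text, ids), result|
    next if at_least && ids.size < at_least
    result << text
  end
end

counts = facet_counts(inverted, at_least: 2)
texts  = facet_texts(inverted)
```

The at_least threshold is inclusive, so a facet with exactly two ids survives at_least: 2.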
#geo_categories(lat_name, lng_name, radius, options = {}) ⇒ Object
Geo search, searches in a rectangle (almost square) in the lat/long coordinate system.
Note: It uses #ranged_category.
Parameters:
-
lat_name: The latitude’s name as used in #category.
-
lng_name: The longitude’s name as used in #category.
-
radius: The distance (in km) around the query point which we search for results.
Note: Picky uses a square, not a circle. That should be ok for most usages.
-----------------------------
|                           |
|                           |
|                           |
|                           |
|                           |
|              *<- radius ->|
|                           |
|                           |
|                           |
|                           |
|                           |
-----------------------------
Options
-
precision: Default is 1 (20% error margin, very fast). Values up to 5 (5% error margin, slower) make sense.
-
lat_from: The data category to take the data for the latitude from.
-
lng_from: The data category to take the data for the longitude from.
THINK Will have to write a wrapper that combines two categories that are indexed simultaneously, since lat/lng are correlated.
# File 'lib/picky/index.rb', line 355

def geo_categories lat_name, lng_name, radius, options = {}

  # Extract lat/lng specific options.
  #
  lat_from = options.delete :lat_from
  lng_from = options.delete :lng_from

  # One can be a normal ranged_category.
  #
  ranged_category lat_name, radius*0.00898312, options.merge(from: lat_from)

  # The other needs to adapt the radius depending on the one.
  #
  # Depending on the latitude, the radius of the longitude
  # needs to enlarge, the closer we get to the pole.
  #
  # In our simplified case, the radius is given as if all the
  # locations were on the 45 degree line.
  #
  # This calculates km -> longitude (degrees).
  #
  # A degree on the 45 degree line is equal to ~222.6398 km.
  # So a km on the 45 degree line is equal to 0.01796624 degrees.
  #
  ranged_category lng_name, radius*0.01796624, options.merge(from: lng_from)
end
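The conversion from a radius in km to the two ranges in degrees is plain multiplication by the constants in the source above. A short sketch of that arithmetic; geo_ranges and the constant names are hypothetical, only the two factors come from the source.

```ruby
# Conversion factors taken from the geo_categories source.
LAT_DEGREES_PER_KM = 0.00898312
LNG_DEGREES_PER_KM = 0.01796624

# Returns the ranges (in degrees) that would be passed to the two
# ranged categories for a given radius in km.
def geo_ranges(radius_km)
  {
    lat: radius_km * LAT_DEGREES_PER_KM,
    lng: radius_km * LNG_DEGREES_PER_KM
  }
end

ranges = geo_ranges(10)
```

Because the longitude factor is a fixed constant rather than a function of the actual latitude, the searched area is only approximately square, as the note above says.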
#id(name = nil, options = {}) ⇒ Object
API method.
Defines the name of the ID method to use on the indexed object.
Parameters
-
name: Method name of the ID.
# File 'lib/picky/index_indexing.rb', line 99

def id name = nil, options = {}
  key_format options[:format]
  @id_name = name || @id_name || :id
end
#identifier ⇒ Object
Identifier used for technical output.
# File 'lib/picky/index.rb', line 394

def identifier
  name
end
#indexing(options = {}) ⇒ Object
Define an index tokenizer on the index.
Parameters are exactly the same as for the indexing option of #category: either a tokenizer or tokenizer options.
# File 'lib/picky/index_indexing.rb', line 16

def indexing options = {}
  @tokenizer = Tokenizer.from options
end
#key_format(key_format = nil) ⇒ Object
Define a key_format on the index.
Parameter is a method name to use on the key (e.g. :to_i, :to_s, :strip, :split).
TODO Rename to id_format.
# File 'lib/picky/index_indexing.rb', line 110

def key_format key_format = nil
  key_format ? (@key_format = key_format) : @key_format
end
#only(*qualifiers) ⇒ Object
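Restrict categories to the given ones. Note that the current implementation only raises an error stating that the feature has been removed.
Functionally equivalent to the index not having the other categories at all.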
Note: Probably only makes sense when an index is used in multiple searches. If not, why even have the categories?
TODO Redesign.
# File 'lib/picky/index.rb', line 198

def only *qualifiers
  raise "Sorry, Picky::Search#only has been removed in version."
  # @qualifier_mapper.restrict_to *qualifiers
end
#optimize(*hints) ⇒ Object
Provide hints for Picky so it can optimise.
# File 'lib/picky/index.rb', line 133

def optimize *hints
  require_relative 'index/hints'
  @hints = Hints.new hints
end
#optimize_memory(array_references = Hash.new) ⇒ Object
Explicitly trigger memory optimization.
# File 'lib/picky/index.rb', line 140

def optimize_memory array_references = Hash.new
  dedup = Picky::Optimizers::Memory::ArrayDeduplicator.new
  dedup.deduplicate categories.map(&:exact).map(&:inverted), array_references
  dedup.deduplicate categories.map(&:partial).map(&:inverted), array_references
end
#prepare(scheduler = Scheduler.new) ⇒ Object
Calling prepare on an index will call prepare on every category.
Decides whether to use a parallel indexer or whether to forward to each category to prepare themselves.
TODO Do a critical reading of this on the blog.
# File 'lib/picky/index_indexing.rb', line 28

def prepare scheduler = Scheduler.new
  if source.respond_to?(:each)
    check_source_empty
    prepare_in_parallel scheduler
  else
    with_data_snapshot { categories.prepare scheduler }
  end
end
#prepare_in_parallel(scheduler) ⇒ Object
Indexes the categories in parallel.
Only use when the source responds to #each.
# File 'lib/picky/index_indexing.rb', line 50

def prepare_in_parallel scheduler
  indexer = Indexers::Parallel.new self
  indexer.prepare categories, scheduler
end
#ranged_category(category_name, range, options = {}) ⇒ Object
Make this category range searchable with a fixed range. If you need other ranges, define another category with a different range value.
Example: You have data values inside 1..100, and you want to have Picky return not only the results for 47 if you search for 47, but also results for 45, 46, or 47.2, 48.9, in a range of 2 around 47, so (45..49).
Then you use:
ranged_category :values_inside_1_100, 2
Optionally, you give it a precision value to reduce the error margin around 47 (Picky is a bit liberal).
Index.new :range do
  ranged_category :values_inside_1_100, 2, precision: 5
end
This limits Picky's error to at most 5% of the given range value (5% of 2 = 0.1) instead of the default 20% (20% of 2 = 0.4).
We suggest not using much more than 5, as higher precision costs more performance for less and less precision gain.
Protip 1
Create two ranged categories to make an area search:
Index.new :area do
  ranged_category :x, 1
  ranged_category :y, 1
end
Search for it using for example:
x:133, y:120
This will search this square area (* = 133, 120: The “search” point entered):
 132       134
  |         |
--|---------|-- 121
  |         |
  |    *    |
  |         |
--|---------|-- 119
  |         |
Note: The area does not need to be square, but can be rectangular.
Protip 2
Create three ranged categories to make a volume search.
Or go crazy and use 4 ranged categories for a space/time search! ;)
Parameters
-
category_name: The category_name as used in #category.
-
range: The range (in the units of your data values) around the query point where we search for results.
-----|<- range ->*------------|-----
Options
-
precision: Default is 1 (20% error margin, very fast). Values up to 5 (5% error margin, slower) make sense.
-
anchor: Where to anchor the grid.
-
… all options of #category.
# File 'lib/picky/index.rb', line 309

def ranged_category category_name, range, options = {}
  precision = options.delete(:precision) || 1
  anchor    = options.delete(:anchor)    || 0.0

  # Note: :key_format => :to_f ?
  #
  options = { partial: Partial::None.new }.merge options

  category category_name, options do |cat|
    Category::Location.install_on cat, range, precision, anchor
  end
end
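The option handling in this method separates the location settings (precision, anchor) from the options that are passed on to #category, and makes "no partial search" a default that the caller can override. A self-contained sketch of that split follows. split_ranged_options is a hypothetical helper, the symbol :none stands in for Partial::None.new, and unlike the source the sketch copies the hash so the caller's options are not modified.

```ruby
# Splits ranged-category options into location settings and category options.
def split_ranged_options(options = {})
  options   = options.dup # work on a copy; the caller's hash stays intact
  precision = options.delete(:precision) || 1
  anchor    = options.delete(:anchor)    || 0.0

  # Defaults first, so caller-supplied options win in the merge.
  category_options = { partial: :none }.merge(options)

  [precision, anchor, category_options]
end

precision, anchor, category_options =
  split_ranged_options(precision: 5, from: :value)
```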
#result_identifier(result_identifier = nil) ⇒ Object
Define how the results of this index are identified. (Shown in the client, for example)
Default is the name of the index.
# File 'lib/picky/index_indexed.rb', line 17

def result_identifier result_identifier = nil
  result_identifier ? (@result_identifier = result_identifier) : (@result_identifier || @name)
end
#source(some_source = nil, &block) ⇒ Object
Define a source on the index.
Parameter is a source: either one of the standard sources or anything that responds to #each and returns objects that respond to id and to the category names (or to the name given in a category's from option).
# File 'lib/picky/index_indexing.rb', line 81

def source some_source = nil, &block
  some_source ||= block
  some_source ? (@source = Source.from(some_source, false, name)) : unblock_source
end
#static ⇒ Object
TODO Doc.
# File 'lib/picky/index.rb', line 148

def static
  @static = true
end
#static? ⇒ Boolean
# File 'lib/picky/index.rb', line 151

def static?
  @static
end
#symbol_keys(value = nil) ⇒ Object
API method.
# File 'lib/picky/index.rb', line 171

def symbol_keys value = nil
  if value
    @symbol_keys = value
  else
    @symbol_keys
  end
end
#to_s ⇒ Object
# File 'lib/picky/index.rb', line 400

def to_s
  s = [
    name,
    "result_id: #{result_identifier}",
    ("source: #{source}" if @source),
    ("categories: #{categories}" unless categories.empty?)
  ].compact
  "#{self.class}(#{s.join(', ')})"
end
#to_stats ⇒ Object
# File 'lib/picky/index.rb', line 381

def to_stats
  stats = <<-INDEX
#{name} (#{self.class}):
#{"source: #{source}".indented_to_s}
#{"categories: #{categories.to_stats}".indented_to_s}
INDEX
  stats << "result identifier: \"#{result_identifier}\"".indented_to_s unless result_identifier.to_s == name.to_s
  stats << "\n"
  stats
end
#to_tree_s(indent = 0) ⇒ Object
Displays the structure as a tree.
# File 'lib/picky/index.rb', line 412

def to_tree_s indent = 0
  <<-TREE
#{' ' * indent}Index(#{name})
#{' ' * indent} source: #{source.to_s[0..40]}
#{' ' * indent} result identifier: "#{result_identifier}"
#{' ' * indent} categories:
#{' ' * indent}#{categories.to_tree_s(4)}
TREE
end
#tokenizer ⇒ Object
Returns the installed tokenizer or the default.
# File 'lib/picky/index_indexing.rb', line 71

def tokenizer
  @tokenizer || Indexes.tokenizer
end
#unblock_source ⇒ Object
Get the actual source if it is wrapped in a time capsule, i.e. a block/lambda.
# File 'lib/picky/index_indexing.rb', line 88

def unblock_source
  @source.respond_to?(:call) ? @source.call : @source
end
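The "time capsule" idea is that a source given as a block is stored unevaluated and only called when the source is actually read. A minimal sketch of the #source and #unblock_source pair under that reading; LazySourceHolder is a hypothetical class, and it stores the source directly instead of wrapping it as the real method does.

```ruby
# Hypothetical holder that mimics the source / unblock_source pair.
class LazySourceHolder
  # With an argument or a block: store it. Without either: return the source.
  def source(some_source = nil, &block)
    some_source ||= block
    some_source ? (@source = some_source) : unblock_source
  end

  # Call the stored source if it is callable, otherwise return it as is.
  def unblock_source
    @source.respond_to?(:call) ? @source.call : @source
  end
end

eager = LazySourceHolder.new
eager.source [1, 2, 3]

calls = 0
lazy  = LazySourceHolder.new
lazy.source { calls += 1; [:a, :b] } # stored, not evaluated yet
calls_before_read = calls
lazy_result = lazy.source            # the block runs here
```

Deferring the call this way lets the data be loaded at indexing time rather than when the index is defined.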
#unshift(thing) ⇒ Object
Add at the beginning (calls add).
# File 'lib/picky/index_realtime.rb', line 24

def unshift thing
  add thing, method: __method__
end
#with_data_snapshot ⇒ Object
Note: Duplicated in category_indexing.rb.
Take a data snapshot if the source offers it.
# File 'lib/picky/index_indexing.rb', line 59

def with_data_snapshot
  if source.respond_to? :with_snapshot
    source.with_snapshot(self) do
      yield
    end
  else
    yield
  end
end
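The snapshot hook is duck-typed: the work is wrapped in with_snapshot only if the source offers that method, and runs directly otherwise. A self-contained sketch of the pattern; SnapshotSource is a hypothetical source, and the free-standing with_data_snapshot takes the source as an argument instead of reading it from the index.

```ruby
# Hypothetical source that brackets the work in a snapshot.
class SnapshotSource
  attr_reader :log

  def initialize
    @log = []
  end

  def with_snapshot(_index)
    @log << :snapshot_taken
    yield
  ensure
    @log << :snapshot_released
  end
end

# Wraps the block in a snapshot only if the source supports one.
def with_data_snapshot(source, index = nil)
  if source.respond_to?(:with_snapshot)
    source.with_snapshot(index) { yield }
  else
    yield
  end
end

snapshot_source = SnapshotSource.new
with_data_snapshot(snapshot_source) { snapshot_source.log << :indexing }

# An Array has no with_snapshot, so the block runs unwrapped.
plain_result = with_data_snapshot([1, 2, 3]) { :indexed }
```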