Class: Picky::Query::Token

Inherits:
Object
Defined in:
lib/picky/query/token.rb

Overview

This is a query token. Together with other tokens it makes up a query.

It remembers the original form and a normalized form.

It also knows whether it needs to look for similarity (bla~), or whether it is a partial (bla*).

Constant Summary

@@no_partial_character =

If the text ends with *, partialize it. If it ends with ", do not partialize it.

The last one wins: "hello*" will not be partially searched, but "hello"* will be.

'"'
@@partial_character =
'*'
@@no_partial =
/\"\z/
@@partial =
/\*\z/
@@no_similar_character =

If the text ends with ~, similarize it. If it ends with ", do not.

The latter wins.

'"'
@@similar_character =
'~'
@@no_similar =
%r{#@@no_similar_character\z}
@@similar =
%r{#@@similar_character\z}
@@range_character =

Define a character which makes a token a range token.

Default is ‘…’.

Example:

Picky::Query::Token.range_character = "-"
try.search("year:2000-2008") # Will find results in a range.
?…
@@qualifier_text_delimiter =

Splits text into a qualifier and text.

/:/
@@qualifiers_delimiter =
/,/
@@qualifier_text_splitter =

TODO Think about making these instances.

Splitter.new @@qualifier_text_delimiter
@@qualifiers_splitter =
Splitter.new @@qualifiers_delimiter

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Constructor Details

#initialize(text, original = nil, categories = nil) ⇒ Token

Normal initializer.

Note: Use this if you do not want a normalized token.

TODO Throw away @predefined_categories?



# File 'lib/picky/query/token.rb', line 27

def initialize text, original = nil, categories = nil
  @text     = text
  @original = original
  @predefined_categories = categories
end

Instance Attribute Details

#original ⇒ Object

Returns the value of attribute original.



# File 'lib/picky/query/token.rb', line 14

def original
  @original
end

#predefined_categories(mapper = nil) ⇒ Object

Translates this token’s qualifiers into actual categories.

Note: If this is not done, there is no mapping.

Note: predefined is an Array of mapped categories.

TODO Do we really need to set the predefined categories on the token?



# File 'lib/picky/query/token.rb', line 67

def predefined_categories mapper = nil
  @predefined_categories || mapper && extract_predefined(mapper)
end
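
The method body relies on Ruby's operator precedence: `&&` binds tighter than `||`, so the expression reads as `a || (b && c)`. The following is a minimal sketch of that behaviour only. `PrecedenceToken` is a hypothetical stand-in, and a lambda takes the place of the real mapper.

```ruby
# Hypothetical stand-in for Token; only the precedence of || and && is shown.
class PrecedenceToken
  def initialize(categories = nil)
    @predefined_categories = categories
  end

  # Parsed as: @predefined_categories || (mapper && extract_predefined(mapper))
  def predefined_categories(mapper = nil)
    @predefined_categories || mapper && extract_predefined(mapper)
  end

  private

  # Assumption: the real method maps qualifiers; here it simply calls the mapper.
  def extract_predefined(mapper)
    mapper.call
  end
end

preset   = PrecedenceToken.new([:title]).predefined_categories(-> { [:other] })
mapped   = PrecedenceToken.new.predefined_categories(-> { [:author] })
unmapped = PrecedenceToken.new.predefined_categories
```

Categories passed to the constructor win over the mapper, and without either the result is nil.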

#similar=(value) ⇒ Object (writeonly)

Sets the attribute similar

Parameters:

  • value

    the value to set the attribute similar to.



# File 'lib/picky/query/token.rb', line 15

def similar=(value)
  @similar = value
end

#text ⇒ Object

Returns the value of attribute text.



# File 'lib/picky/query/token.rb', line 14

def text
  @text
end

Class Method Details

.no_partial_character=(character) ⇒ Object

Define a character which stops a token from being a partial token, even if it is the last token.

Default is '"'.

This is used in a regexp (%r{#{character}\z}) for String#=~, so escape the character.

Example:

Picky::Query::Token.no_partial_character = '\?'
try.search("tes?") # Won't find "test".


# File 'lib/picky/query/token.rb', line 155

def self.no_partial_character= character
  @@no_partial_character = character
  @@no_partial = %r{#{character}\z}
  redefine_illegals
end
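
Why the escaping matters can be shown with plain Ruby, independent of Picky. This sketch assumes nothing beyond the interpolation pattern used in the setter above: an unescaped `?` is a quantifier with nothing to repeat, so building the regexp fails.

```ruby
# An unescaped "?" is a regexp quantifier with no target,
# so interpolating it into the pattern raises at runtime.
character = '?'
error = begin
  %r{#{character}\z}
  nil
rescue RegexpError => e
  e
end

# Escaped by hand, as in the example above ...
by_hand = %r{#{'\?'}\z}

# ... or with Regexp.escape from the core library.
escaped = %r{#{Regexp.escape('?')}\z}
```

Both escaped variants produce the same pattern and match only text that ends with a literal `?`.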

.no_similar_character=(character) ⇒ Object

Define a character which stops a token from being a similar token, even if it is the last token.

Default is '"'.

This is used in a regexp (%r{#{character}\z}) for String#=~, so escape the character.

Example:

Picky::Query::Token.no_similar_character = '\?'
try.search("tost?") # Won't find "test".


# File 'lib/picky/query/token.rb', line 201

def self.no_similar_character= character
  @@no_similar_character = character
  @@no_similar = %r{#{character}\z}
  redefine_illegals
end

.partial_character=(character) ⇒ Object

Define a character which makes a token a partial token.

Default is ‘*’.

This is used in a regexp (%r{#{character}\z}) for String#=~, so escape the character.

Example:

Picky::Query::Token.partial_character = '\?'
try.search("tes?") # Will find "test".


# File 'lib/picky/query/token.rb', line 171

def self.partial_character= character
  @@partial_character = character
  @@partial = %r{#{character}\z}
  redefine_illegals
end

.processed(text, original = nil) ⇒ Object

Returns a qualified and normalized token.

Note: Use this in the search engine if you need a qualified and normalized token. I.e. one prepared for a search.



# File 'lib/picky/query/token.rb', line 39

def self.processed text, original = nil
  new(text, original).process
end

.qualifier_text_delimiter=(character) ⇒ Object

Define a regexp which separates the qualifier from the search text.

Default is /:/.

Example:

Picky::Query::Token.qualifier_text_delimiter = /\?/
try.search("text1?hello text2?world").ids.should == [1]


# File 'lib/picky/query/token.rb', line 314

def self.qualifier_text_delimiter= character
  @@qualifier_text_delimiter = character
  @@qualifier_text_splitter  = Splitter.new @@qualifier_text_delimiter
end

.qualifiers_delimiter=(character) ⇒ Object

Define a regexp which separates the qualifiers (before the search text).

Default is /,/.

Example:

Picky::Query::Token.qualifiers_delimiter = /\|/
try.search("text1|text2:hello").ids.should == [1]


# File 'lib/picky/query/token.rb', line 328

def self.qualifiers_delimiter= character
  @@qualifiers_delimiter = character
  @@qualifiers_splitter  = Splitter.new @@qualifiers_delimiter
end

.range_character=(character) ⇒ Object



# File 'lib/picky/query/token.rb', line 232

def self.range_character= character
  @@range_character = character
end

.redefine_illegals ⇒ Object



# File 'lib/picky/query/token.rb', line 255

def self.redefine_illegals
  # Note: By default, both no similar and no partial are ".
  #
  @@illegals = %r{[#@@no_similar_character#@@similar_character#@@no_partial_character#@@partial_character]}
end

.similar_character=(character) ⇒ Object

Define a character which makes a token a similar token.

Default is ‘~’.

This is used in a regexp (%r{#{character}\z}) for String#=~, so escape the character.

Example:

Picky::Query::Token.similar_character = '\?'
try.search("tost?") # Will find "test".


# File 'lib/picky/query/token.rb', line 217

def self.similar_character= character
  @@similar_character = character
  @@similar = %r{#{character}\z}
  redefine_illegals
end

Instance Method Details

#==(other) ⇒ Object

Two tokens are equal if their originals and their texts are equal.



# File 'lib/picky/query/token.rb', line 362

def == other
  self.original == other.original && self.text == other.text
end

#categorize_with(mapper, qualifiers) ⇒ Object



# File 'lib/picky/query/token.rb', line 74

def categorize_with mapper, qualifiers
  qualifiers && qualifiers.map do |qualifier|
    mapper.map qualifier
  end.compact
end
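
As a rough illustration of what this method returns, the sketch below uses a hypothetical `SketchMapper` backed by a Hash. The real mapper's behaviour for unknown qualifiers is an assumption here: it is taken to return nil, which `compact` then drops.

```ruby
# Hypothetical mapper: returns a category for known qualifiers, nil otherwise.
class SketchMapper
  def initialize(mapping)
    @mapping = mapping
  end

  def map(qualifier)
    @mapping[qualifier]
  end
end

def categorize_with(mapper, qualifiers)
  # nil qualifiers short-circuit to nil; unmapped qualifiers are removed by compact.
  qualifiers && qualifiers.map { |qualifier| mapper.map(qualifier) }.compact
end

mapper = SketchMapper.new('title' => :title_category)
known  = categorize_with(mapper, ['title', 'unknown'])
none   = categorize_with(mapper, nil)
```

Note that a token without qualifiers yields nil rather than an empty array.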

#combination_for(category) ⇒ Object

If the Token has weight for the given category, it will return a new combination for the tuple (self, category, weight).



# File 'lib/picky/query/token.rb', line 276

def combination_for category
  weight = category.weight self
  weight && Query::Combination.new(self, category, weight)
end

#extract_predefined(mapper) ⇒ Object



# File 'lib/picky/query/token.rb', line 70

def extract_predefined mapper
  user_qualified = categorize_with mapper, @qualifiers
  mapper.restrict user_qualified
end

#identifier ⇒ Object

Internal identifier.

Note: Used in many backends.



# File 'lib/picky/query/token.rb', line 356

def identifier
  "#{similar?? :similarity : :inverted}:#@text"
end
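
The identifier format can be reproduced with a small stand-in class. `IdentifierToken` is hypothetical and carries only the two instance variables the method reads.

```ruby
# Hypothetical stand-in for Token with just the state #identifier needs.
class IdentifierToken
  def initialize(text, similar = false)
    @text    = text
    @similar = similar
  end

  def similar?
    @similar
  end

  # Prefixes the text with the index type that will be consulted.
  def identifier
    "#{similar? ? :similarity : :inverted}:#{@text}"
  end
end

exact   = IdentifierToken.new('hello').identifier
similar = IdentifierToken.new('hello', true).identifier
```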

#partial=(partial) ⇒ Object

Partial is a conditional setter.

It is only settable if it hasn’t been set yet.



# File 'lib/picky/query/token.rb', line 106

def partial= partial
  @partial = partial if @partial.nil?
end
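
A short sketch of the conditional setter, using a hypothetical `OnceToken` class. The guard checks for nil rather than falsiness, so an explicit false counts as "already set".

```ruby
# Hypothetical class that only demonstrates the set-once behaviour.
class OnceToken
  attr_reader :partial

  def partial=(value)
    @partial = value if @partial.nil?
  end
end

token = OnceToken.new
token.partial = false
token.partial = true # ignored, because @partial is already false (not nil)
```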

#partial? ⇒ Boolean

A token is partial? only if it is not similar and is partial.

It can’t be similar and partial at the same time.

Note: @partial is calculated at processing time (see Token#process).

Returns:

  • (Boolean)


# File 'lib/picky/query/token.rb', line 117

def partial?
  # Was: !@similar && @partial
  @partial
end

#partialize ⇒ Object



# File 'lib/picky/query/token.rb', line 133

def partialize
  # A token is partial? only if it is not similar
  # and is partial.
  #
  # It can't be similar and partial at the same time.
  #
  self.partial = false or return if @similar
  self.partial = false or return if @text =~ @@no_partial
  self.partial = true if @text =~ @@partial
end
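
The `self.partial = false or return` lines work because the assignment evaluates to false, so the `or return` branch always runs. The sketch below reproduces the method in a hypothetical `MarkerToken` class with the default marker regexps as constants; it is an illustration, not the library's class.

```ruby
# Hypothetical stand-in that reproduces the partialize logic with default markers.
class MarkerToken
  NO_PARTIAL = /"\z/
  PARTIAL    = /\*\z/

  attr_reader :partial

  def initialize(text, similar = nil)
    @text    = text
    @similar = similar
  end

  # Set-once setter, as in the original class.
  def partial=(value)
    @partial = value if @partial.nil?
  end

  def partialize
    # `self.partial = false` evaluates to false, so `or return` always fires.
    self.partial = false or return if @similar
    self.partial = false or return if @text =~ NO_PARTIAL
    self.partial = true if @text =~ PARTIAL
  end
end

# Helper for this sketch: builds a token, partializes it, returns the flag.
def partial_after(text, similar = nil)
  token = MarkerToken.new(text, similar)
  token.partialize
  token.partial
end
```

With this, `partial_after('hello*"')` is false and `partial_after('hello"*')` is true, matching the "last one wins" rule, while a similar token is never partial.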

#possible_combinations(categories) ⇒ Object

Return all possible combinations.

This checks whether similar tokens also need to be searched, for example when the token ends with ~. If so, it combines all solutions.



# File 'lib/picky/query/token.rb', line 268

def possible_combinations categories
  similar? ? categories.similar_possible_for(self) : categories.possible_for(self)
end

#process ⇒ Object



# File 'lib/picky/query/token.rb', line 42

def process
  qualify
  similarize
  partialize
  rangify
  remove_illegals
  self
end

#qualifiers ⇒ Object

Returns the qualifiers as an array.

Example:

token.qualifiers # => ['title', 'author']
token.qualifiers # => []

Note: Internally, qualifiers are nil if there are none.



# File 'lib/picky/query/token.rb', line 341

def qualifiers
  @qualifiers || []
end

#qualify ⇒ Object



# File 'lib/picky/query/token.rb', line 299

def qualify
  @qualifiers, @text = @@qualifier_text_splitter.single @text
  if @qualifiers
    @qualifiers = @@qualifiers_splitter.multi @qualifiers
  end
end
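
The Splitter class is not documented on this page, so the following is only an approximation of the two-step split using `String#split` and the default delimiters. The `qualify` helper and its return shape are assumptions for illustration, and the real Splitter may treat edge cases differently.

```ruby
QUALIFIER_TEXT_DELIMITER = /:/
QUALIFIERS_DELIMITER     = /,/

# Stand-in for the two Splitter calls: returns [qualifiers, text].
def qualify(text)
  head, tail = text.split(QUALIFIER_TEXT_DELIMITER, 2)
  return [nil, head] if tail.nil? # no delimiter: everything is search text
  [head.split(QUALIFIERS_DELIMITER), tail]
end

qualified   = qualify('title,author:hello')
unqualified = qualify('hello')
```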

#range ⇒ Object



# File 'lib/picky/query/token.rb', line 238

def range
  @range
end

#rangify ⇒ Object



# File 'lib/picky/query/token.rb', line 235

def rangify
  @range = @text.split(@@range_character, 2) if @text.include? @@range_character
end
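
The range split can be tried in isolation. This sketch turns the method into a standalone function that takes the range character as a parameter (an assumption made for the example; the class keeps it in a class variable).

```ruby
# Standalone version of the split; returns nil when there is no range character.
def rangify(text, range_character = '…')
  text.split(range_character, 2) if text.include?(range_character)
end

default_range = rangify('2000…2008')
custom_range  = rangify('2000-2008', '-')
no_range      = rangify('2000')
limited       = rangify('a-b-c', '-')
```

The limit of 2 means only the first range character splits the text, so `limited` keeps `'b-c'` as its second element.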

#remove_illegals ⇒ Object

Normalizes this token’s text.



# File 'lib/picky/query/token.rb', line 250

def remove_illegals
  # Note: unless @text.blank? was removed.
  #
  @text.gsub! @@illegals, EMPTY_STRING unless @text == EMPTY_STRING
end
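
To see what gets stripped, the character class can be rebuilt from the four default marker characters in the same way `redefine_illegals` does. This sketch uses local variables and a non-mutating `gsub` in place of the class variables and `gsub!`.

```ruby
no_similar = '"'
similar    = '~'
no_partial = '"'
partial    = '*'

# Same construction as redefine_illegals: one character class from all four markers.
illegals = %r{[#{no_similar}#{similar}#{no_partial}#{partial}]}

# Non-mutating variant of remove_illegals for illustration.
def remove_illegals(text, illegals)
  text.gsub(illegals, '')
end

trailing = remove_illegals('hello*', illegals)
anywhere = remove_illegals('"hel~lo"*', illegals)
```

The markers are removed wherever they occur in the text, not only at the end.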

#select_bundle(exact, partial) ⇒ Object

Selects the bundle to be used.



# File 'lib/picky/query/token.rb', line 82

def select_bundle exact, partial
  @partial ? partial : exact
end

#similar? ⇒ Boolean

Is this a "similar" token?

Returns:

  • (Boolean)


# File 'lib/picky/query/token.rb', line 244

def similar?
  @similar
end

#similar_tokens_for(category) ⇒ Object

Returns all similar tokens for the token.



# File 'lib/picky/query/token.rb', line 283

def similar_tokens_for category
  similars = category.similar self
  similars.map do |similar|
    # The array describes all possible categories. There is only one here.
    #
    self.class.new similar, similar, [category]
  end
end

#similarize ⇒ Object



# File 'lib/picky/query/token.rb', line 185

def similarize
  self.similar = false or return if @text =~ @@no_similar
  self.similar = true if @text =~ @@similar
end

#stem(tokenizer) ⇒ Object

Generates a reused stem.

Caches a stem for a tokenizer.



# File 'lib/picky/query/token.rb', line 90

def stem tokenizer
  if stem?
    @stems ||= Hash.new
    @stems[tokenizer] ||= tokenizer.stem(@text)
  else
    @text
  end
end
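
The per-tokenizer cache can be observed with a tokenizer that counts its calls. Everything in this sketch is hypothetical: `CountingTokenizer` and its toy stemming rule are stand-ins, and `StemToken` copies only the two methods involved.

```ruby
# Hypothetical tokenizer that counts how often it is asked to stem.
class CountingTokenizer
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def stem(text)
    @calls += 1
    text.sub(/s\z/, '') # toy stemmer, an assumption for this sketch
  end
end

# Hypothetical stand-in for Token with the stem and stem? methods.
class StemToken
  NO_PARTIAL = /"\z/

  def initialize(text)
    @text = text
  end

  # Text ending with the no-partial character is not stemmed.
  def stem?
    @text !~ NO_PARTIAL
  end

  def stem(tokenizer)
    if stem?
      @stems ||= {}
      @stems[tokenizer] ||= tokenizer.stem(@text) # computed once per tokenizer
    else
      @text
    end
  end
end

tokenizer = CountingTokenizer.new
token     = StemToken.new('cars')
first     = token.stem(tokenizer)
second    = token.stem(tokenizer)
quoted    = StemToken.new('cars"').stem(tokenizer)
```

The second call is served from the cache, and the quoted token bypasses the tokenizer entirely.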

#stem? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/picky/query/token.rb', line 98

def stem?
  @text !~ @@no_partial
end

#symbolize! ⇒ Object

Symbolizes this token’s text.

Note: Call externally when Picky operates in Symbols mode.



# File 'lib/picky/query/token.rb', line 56

def symbolize!
  @text = @text.to_sym
end

#to_result ⇒ Object

Returns the token in the form

['original:Text', 'processedtext']


# File 'lib/picky/query/token.rb', line 348

def to_result
  [@original, @text]
end

#to_s ⇒ Object

Displays the text and the qualifiers.

e.g. a processed name:meier token is displayed as Picky::Query::Token(meier, ["name"]).



# File 'lib/picky/query/token.rb', line 370

def to_s
  "#{self.class}(#{[@text, (@qualifiers.inspect unless @qualifiers.blank?)].compact.join(', ')})"
end