Class: Picky::Splitters::Automatic

Inherits:
Object
Defined in:
lib/picky/splitters/automatic.rb

Overview

Automatic Splitter.

Use it as the splitter for the splits_text_on option of a Search. It needs an index category, whose weights it uses to decide where to split.

Example:

Picky::Search.new index do
  searching splits_text_on: Picky::Splitters::Automatic.new(index[:name])
end

It splits most queries correctly, but has the following known problems:

* "cannot" is usually split as ['can', 'not']
* "rainbow" is usually split as ['rain', 'bow']

Reference: norvig.com/ngrams/ch14.pdf.

Adapted from a script submitted by Andy Kitchen.

Constructor Details

#initialize(category, options = {}) ⇒ Automatic

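Returns a new instance of Automatic. If options[:partial] is truthy, #split uses the category's partial weights instead of its exact weights.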



# File 'lib/picky/splitters/automatic.rb', line 28

def initialize category, options = {}
  @category     = category
  @exact        = category.exact
  @partial      = category.partial
  @with_partial = options[:partial]
  
  reset_memoization
end

Instance Method Details

#reset_memoization ⇒ Object

Resets the memoization by clearing the cached exact and partial segmentations.



# File 'lib/picky/splitters/automatic.rb', line 39

def reset_memoization
  @exact_memo = Hash.new
  @partial_memo = Hash.new
end

#segment(text, use_partial = false) ⇒ Object

Segments the given text and returns a pair [segments, score]. The score from #segment_recursively includes a length bonus of size - 1 per segment; this method subtracts it again, so when the segments cover the whole text the score is the sum of the segments' raw weights. The score is nil if nothing could be matched.



# File 'lib/picky/splitters/automatic.rb', line 60

def segment text, use_partial = false
  segments, score = segment_recursively text, use_partial
  segments.collect!(&:to_s) if @category.symbol_keys?
  [segments, score && score-text.size+segments.size]
end
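
The score adjustment in #segment can be checked by hand. The following is a minimal sketch, assuming two hypothetical raw weights (RAW_WEIGHTS is made up for illustration and is not part of Picky); boosted_score and adjusted_score are illustrative helper names, not methods of this class.

```ruby
# Hypothetical raw weights, standing in for what the index category would return.
RAW_WEIGHTS = { "rain" => 1.0, "bow" => 1.0 }

# Score as accumulated during the recursion:
# raw weight plus a length bonus of (size - 1) per segment.
def boosted_score(segments)
  segments.sum { |s| RAW_WEIGHTS.fetch(s) + (s.size - 1) }
end

# The adjustment applied by #segment: score - text.size + segments.size.
def adjusted_score(text, segments)
  boosted_score(segments) - text.size + segments.size
end

p boosted_score(%w[rain bow])              # => 7.0
p adjusted_score("rainbow", %w[rain bow])  # => 2.0
```

Because the segments cover all seven characters, the bonuses sum to text.size - segments.size (here 7 - 2 = 5), and removing them leaves the raw weight sum of 2.0.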

#segment_recursively(text, use_partial = false) ⇒ Object

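Segments the given text recursively. For every split into a head and a tail, the tail is looked up in the index and the head is segmented recursively; the combination with the highest total weight wins. Results are memoized per text.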



# File 'lib/picky/splitters/automatic.rb', line 68

def segment_recursively text, use_partial = false
  text = text.to_sym if @category.symbol_keys?
  (use_partial ? @partial_memo : @exact_memo)[text] ||= splits(text).inject([[], nil]) do |(current, heaviest), (head, tail)|
    tail = tail.to_sym if @category.symbol_keys?
    tail_weight = use_partial ? @partial.weight(tail) : @exact.weight(tail)
    tail_weight && tail_weight += (tail.size-1)
    
    segments, head_weight = segment_recursively head, use_partial
    
    weight = (head_weight && tail_weight &&
             (head_weight + tail_weight) ||
             tail_weight || head_weight)
             
    if (weight || -1) >= (heaviest || 0)
      [tail_weight ? segments + [tail] : segments, weight]
    else
      [current, heaviest]
    end
  end
end
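
To see why a word such as "rainbow" can end up split, the recursion can be tried outside Picky. This is a simplified sketch, not the class itself: DEMO_WEIGHTS is a hypothetical Hash standing in for the category's weight lookup, the weights are invented, and symbol keys and the exact/partial distinction are left out.

```ruby
# Hypothetical token weights; a real category would supply these from the index.
DEMO_WEIGHTS = { "rain" => 1.0, "bow" => 1.0, "rainbow" => 0.5 }

# All splits into a (possibly empty) head and a non-empty tail.
def splits(text)
  l = text.length
  (0..l - 1).map { |x| [text.slice(0, x), text.slice(x, l)] }
end

def segment_recursively(text, memo = {})
  memo[text] ||= splits(text).inject([[], nil]) do |(current, heaviest), (head, tail)|
    tail_weight = DEMO_WEIGHTS[tail]
    tail_weight += tail.size - 1 if tail_weight # length bonus for found tails

    segments, head_weight = segment_recursively(head, memo)

    weight = if head_weight && tail_weight
               head_weight + tail_weight
             else
               tail_weight || head_weight
             end

    if (weight || -1) >= (heaviest || 0)
      [tail_weight ? segments + [tail] : segments, weight]
    else
      [current, heaviest]
    end
  end
end

p segment_recursively("rainbow") # => [["rain", "bow"], 7.0]
```

With these assumed weights the unsplit word scores 0.5 + 6 = 6.5, while "rain" and "bow" together score (1.0 + 3) + (1.0 + 2) = 7.0, so the split wins. Note that the comparison uses >=, so on a tie the later split replaces the earlier one.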

#split(text) ⇒ Object

Split the given text into its most likely constituents.



# File 'lib/picky/splitters/automatic.rb', line 47

def split text
  segment(text, @with_partial).first
end

#splits(text) ⇒ Object

Returns all splits of the given text into a head and a non-empty tail. The head may be empty; an empty text yields no splits.



# File 'lib/picky/splitters/automatic.rb', line 53

def splits text
  l = text.length
  (0..l-1).map do |x|
    [text.slice(0,x), text.slice(x,l)]
  end
end
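
As an illustration of the shape of the result, here is a standalone copy of the method above, run on a short string. It is defined at the top level only for the sake of the example.

```ruby
# Standalone copy of the splitting logic, for illustration only.
def splits(text)
  l = text.length
  (0..l - 1).map do |x|
    [text.slice(0, x), text.slice(x, l)]
  end
end

p splits("abc") # => [["", "abc"], ["a", "bc"], ["ab", "c"]]
p splits("")    # => []
```

The first pair always has an empty head and the whole text as tail, which is how the recursion gets to consider the unsplit text as a candidate.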