Class: Picky::Splitters::Automatic
Defined in: lib/picky/splitters/automatic.rb
Overview
Automatic Splitter.
Use as a splitter for the splits_text_on option for Searches. You need to give it an index category to use for the splitting.
Example:
Picky::Search.new index do
  searching splits_text_on: Picky::Splitters::Automatic.new(index[:name])
end
It splits most queries correctly, but has the following known problems:
* "cannot" is usually split as ['can', 'not']
* "rainbow" is usually split as ['rain', 'bow']
Reference: norvig.com/ngrams/ch14.pdf.
Adapted from a script submitted by Andy Kitchen.
Instance Method Summary
- #initialize(category, options = {}) ⇒ Automatic (constructor)
  A new instance of Automatic.
- #reset_memoization ⇒ Object
  Reset the memoization.
- #segment(text, use_partial = false) ⇒ Object
- #segment_recursively(text, use_partial = false) ⇒ Object
  Segments the given text recursively.
- #split(text) ⇒ Object
  Split the given text into its most likely constituents.
- #splits(text) ⇒ Object
  Return all splits of a given string.
Constructor Details
#initialize(category, options = {}) ⇒ Automatic
Returns a new instance of Automatic.
# File 'lib/picky/splitters/automatic.rb', line 28
def initialize category, options = {}
  @category = category
  @exact = category.exact
  @partial = category.partial
  @with_partial = options[:partial]

  reset_memoization
end
Instance Method Details
#reset_memoization ⇒ Object
Reset the memoization.
# File 'lib/picky/splitters/automatic.rb', line 39
def reset_memoization
  @exact_memo = Hash.new
  @partial_memo = Hash.new
end
#segment(text, use_partial = false) ⇒ Object
# File 'lib/picky/splitters/automatic.rb', line 60
def segment text, use_partial = false
  segments, score = segment_recursively text, use_partial
  segments.collect!(&:to_s) if @category.symbol_keys?
  [segments, score && score-text.size+segments.size]
end
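The score adjustment on the last line of #segment can be checked by hand. The following is a minimal sketch, not part of Picky. It assumes the segments cover the whole text and that each token's weight received a bonus of (size - 1) during recursion, as the listing for #segment_recursively suggests. The weights and variable names are made up for illustration.

```ruby
# Hypothetical raw index weights for two tokens (made-up values).
raw_weights = { "rain" => 1.0, "bow" => 1.0 }
text = "rainbow"
segments = ["rain", "bow"]

# Score as accumulated during recursion: each weight plus a (size - 1) bonus.
boosted = segments.sum { |s| raw_weights[s] + (s.size - 1) }

# The adjustment applied on the last line of #segment.
score = boosted - text.size + segments.size

puts boosted  # 7.0
puts score    # 2.0
```

Under these assumptions the per-token bonuses add up to text.size - segments.size, so the adjusted score equals the sum of the raw weights.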
#segment_recursively(text, use_partial = false) ⇒ Object
Segments the given text recursively.
# File 'lib/picky/splitters/automatic.rb', line 68
def segment_recursively text, use_partial = false
  text = text.to_sym if @category.symbol_keys?
  (use_partial ? @partial_memo : @exact_memo)[text] ||= splits(text).inject([[], nil]) do |(current, heaviest), (head, tail)|
    tail = tail.to_sym if @category.symbol_keys?
    tail_weight = use_partial ? @partial.weight(tail) : @exact.weight(tail)

    tail_weight && tail_weight += (tail.size-1)

    segments, head_weight = segment_recursively head, use_partial

    weight = (head_weight && tail_weight &&
             (head_weight + tail_weight) ||
             tail_weight || head_weight)
    if (weight || -1) >= (heaviest || 0)
      [tail_weight ? segments + [tail] : segments, weight]
    else
      [current, heaviest]
    end
  end
end
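To see how the recursion picks a segmentation, here is a self-contained sketch that mirrors the logic above without Picky. It is an approximation under stated assumptions: SKETCH_WEIGHTS is a hypothetical stand-in for the category's weight lookup (returning nil for unknown tokens), the weights are made up, and sketch_splits and sketch_segment are illustrative names. Symbol keys and the exact/partial distinction are left out.

```ruby
# Hypothetical token weights; unknown tokens yield nil (an assumption).
SKETCH_WEIGHTS = { "rain" => 1.0, "bow" => 1.0, "rainbow" => 0.5 }

# All [head, tail] pairs, the first one with an empty head.
def sketch_splits(text)
  (0...text.length).map { |x| [text[0, x], text[x..-1]] }
end

# Returns [segments, weight] for the heaviest segmentation found.
def sketch_segment(text, memo = {})
  return memo[text] if memo.key?(text)

  best = sketch_splits(text).inject([[], nil]) do |(current, heaviest), (head, tail)|
    tail_weight = SKETCH_WEIGHTS[tail]
    tail_weight += tail.size - 1 if tail_weight  # longer known tails get a bonus

    segments, head_weight = sketch_segment(head, memo)

    weight = if head_weight && tail_weight
               head_weight + tail_weight
             else
               tail_weight || head_weight
             end

    if (weight || -1) >= (heaviest || 0)
      [tail_weight ? segments + [tail] : segments, weight]
    else
      [current, heaviest]
    end
  end

  memo[text] = best
end

p sketch_segment("rainbow")  # [["rain", "bow"], 7.0]
```

With these made-up weights the two-token split scores 4.0 + 3.0 = 7.0, while "rainbow" as a single token scores 0.5 + 6 = 6.5, which illustrates how a word like "rainbow" can end up split as ['rain', 'bow'].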
#split(text) ⇒ Object
Split the given text into its most likely constituents.
# File 'lib/picky/splitters/automatic.rb', line 47
def split text
  segment(text, @with_partial).first
end
#splits(text) ⇒ Object
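Return all splits of a given string as [head, tail] pairs, cutting at positions 0 through length - 1. The head may be empty, but the tail never is.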
# File 'lib/picky/splitters/automatic.rb', line 53
def splits text
  l = text.length
  (0..l-1).map do |x|
    [text.slice(0,x), text.slice(x,l)]
  end
end
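As a quick illustration of what #splits produces, the following standalone copy of the same logic can be run without Picky. The method name all_splits is hypothetical and used only for this sketch.

```ruby
# Standalone copy of the splitting logic, for illustration only.
def all_splits(text)
  l = text.length
  (0..l-1).map { |x| [text.slice(0, x), text.slice(x, l)] }
end

p all_splits("bow")
# [["", "bow"], ["b", "ow"], ["bo", "w"]]
```

The first pair has an empty head, so the whole text is always tried as a single token. An empty string yields no pairs at all.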