Class: RMMSeg::SimpleAlgorithm

Inherits:
Object
Includes:
Algorithm
Defined in:
lib/rmmseg/simple_algorithm.rb

Constant Summary

Constants included from Algorithm

Algorithm::NONWORD_CHAR_RE

Instance Method Summary

Methods included from Algorithm

#basic_latin?, #find_match_words, #get_basic_latin_word, #next_token, #nonword_char?, #segment

Constructor Details

#initialize(text, token = Token) ⇒ SimpleAlgorithm

Create a new SimpleAlgorithm. The only rule used by this algorithm is MMRule.



# File 'lib/rmmseg/simple_algorithm.rb', line 10

def initialize(text, token=Token)
  super
end

Instance Method Details

#get_cjk_word ⇒ Object

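Get the most appropriate CJK word: the longest dictionary word that starts at the current position, or a single character if no longer match is found.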



# File 'lib/rmmseg/simple_algorithm.rb', line 15

def get_cjk_word
  dic = Dictionary.instance
  i = Config.max_word_length
  if i + @index > @chars.length
    i = @chars.length - @index
  end
  chars = @chars[@index, i]
  word = chars.join

  while i > 1 && !dic.has_word?(word)
    i -= 1
    word.slice!(-chars[i].size,chars[i].size) # truncate last char
  end

  token = @token.new(word, @byte_index, @byte_index+word.size)

  @index += i
  @byte_index += word.size

  return token
end
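
The loop above can be read as forward maximum matching: take the longest candidate allowed by the maximum word length, then drop one trailing character at a time until the dictionary recognizes the word or only one character is left. The following is a minimal standalone sketch of that idea, not the library's implementation. WORDS, MAX_WORD_LENGTH, longest_match and simple_segment are hypothetical names that stand in for Dictionary.instance, Config.max_word_length and the token machinery, and the sketch leaves out the byte offsets that the original stores on each token.

```ruby
require 'set'

# Hypothetical stand-ins for Dictionary.instance and Config.max_word_length.
WORDS = Set.new(%w[研究 研究生 生命 起源]).freeze
MAX_WORD_LENGTH = 4

# Return the longest dictionary word that starts at `index`.
# Falls back to a single character when no longer candidate matches.
def longest_match(chars, index, words = WORDS, max_len = MAX_WORD_LENGTH)
  # Never read past the end of the text.
  len = [max_len, chars.length - index].min
  # Shrink the candidate by one trailing character until it matches.
  len -= 1 while len > 1 && !words.include?(chars[index, len].join)
  chars[index, len].join
end

# Segment a whole string by repeatedly taking the longest match.
def simple_segment(text, words = WORDS, max_len = MAX_WORD_LENGTH)
  chars = text.chars
  index = 0
  tokens = []
  while index < chars.length
    word = longest_match(chars, index, words, max_len)
    tokens << word
    index += word.length
  end
  tokens
end
```

With this toy dictionary, simple_segment("研究生命起源") returns ["研究生", "命", "起源"]. The greedy choice of 研究生 leaves 命 unmatched, which shows the kind of ambiguity a single maximum-matching rule cannot resolve.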