Module: RMMSeg::Chunk

Defined in:
lib/rmmseg/chunk.rb

Overview

A Chunk holds one or more successive Words.

Class Method Details

.average_length(words) ⇒ Object

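The average length of the words, returned as a Float. For an empty list the result is NaN, because the total length (0) is converted to a Float before being divided by a size of 0.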



# File 'lib/rmmseg/chunk.rb', line 15

def self.average_length(words)
  total_length(words).to_f/words.size
end
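
A minimal usage sketch follows. It does not load the rmmseg gem: `DemoWord` and `DemoChunk` are hypothetical stand-ins that copy only the behavior shown in the listings on this page, and the word lengths are made up for illustration.

```ruby
# Hypothetical stand-in for RMMSeg::Word; only #length is needed here.
class DemoWord
  attr_reader :length

  def initialize(length)
    @length = length
  end
end

# Demo copy of the two class methods, so the snippet runs on its own.
module DemoChunk
  def self.total_length(words)
    words.sum(&:length)
  end

  def self.average_length(words)
    total_length(words).to_f / words.size
  end
end

words = [DemoWord.new(2), DemoWord.new(1), DemoWord.new(3)]

avg = DemoChunk.average_length(words)     # (2 + 1 + 3) / 3.0 => 2.0
empty_avg = DemoChunk.average_length([])  # 0.0 / 0 => NaN, no exception raised
```

Callers that may pass an empty list would need to check `words.empty?` first, or test the result with `Float#nan?`.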

.degree_of_morphemic_freedom(words) ⇒ Object

The sum of the frequencies of all one-character CJK words. Longer words and words of any other type contribute nothing.



# File 'lib/rmmseg/chunk.rb', line 31

def self.degree_of_morphemic_freedom(words)
  sum = 0
  for word in words
    if word.length == 1 && word.type == Word::TYPES[:cjk_word]
      sum += word.frequency
    end
  end
  sum
end
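
The filter in the listing above can be illustrated with a small self-contained sketch. `SketchWord`, its `TYPES` values, and the frequencies are all assumptions made up for this example; the real RMMSeg::Word class and its type constants may look different.

```ruby
# Hypothetical stand-in for RMMSeg::Word (assumed shape, not the real class).
class SketchWord
  # Assumed type table; the real Word::TYPES values may differ.
  TYPES = { cjk_word: :cjk_word, latin_word: :latin_word }.freeze

  attr_reader :text, :type, :frequency

  def initialize(text, type, frequency)
    @text = text
    @type = type
    @frequency = frequency
  end

  def length
    text.length
  end
end

# Same rule as the listing: only one-character CJK words are counted.
def sketch_degree_of_morphemic_freedom(words)
  words.sum do |word|
    if word.length == 1 && word.type == SketchWord::TYPES[:cjk_word]
      word.frequency
    else
      0
    end
  end
end

words = [
  SketchWord.new("的", SketchWord::TYPES[:cjk_word], 500),   # counted
  SketchWord.new("研究", SketchWord::TYPES[:cjk_word], 120), # two characters: skipped
  SketchWord.new("a", SketchWord::TYPES[:latin_word], 7),    # not a CJK word: skipped
  SketchWord.new("了", SketchWord::TYPES[:cjk_word], 300)    # counted
]

degree = sketch_degree_of_morphemic_freedom(words)  # 500 + 300 => 800
```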

.total_length(words) ⇒ Object

The sum of the lengths of all words.



# File 'lib/rmmseg/chunk.rb', line 6

def self.total_length(words)
  len = 0
  for word in words
    len += word.length
  end
  len
end

.variance(words) ⇒ Object

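A measure of how much the word lengths spread around their average. Despite the method name, the code returns the square root of the sum of squared deviations from the average length; it does not divide by the number of words, so the result is neither the variance nor the standard deviation.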



# File 'lib/rmmseg/chunk.rb', line 20

def self.variance(words)
  avglen = average_length(words)
  sqr_sum = 0.0
  for word in words
    tmp = word.length - avglen
    sqr_sum += tmp*tmp
  end
  Math.sqrt(sqr_sum)
end