Class: CoCoGe::NgramsSymbolGenerator

Inherits:
Object
Defined in:
lib/co_co_ge/ngrams_symbol_generator.rb

Instance Method Summary

Constructor Details

#initialize(corpus:, length:, filter: -> word { true }, post_process: -> word { word.upcase }) ⇒ NgramsSymbolGenerator

Returns a new instance of NgramsSymbolGenerator.



# File 'lib/co_co_ge/ngrams_symbol_generator.rb', line 2

def initialize(corpus:, length:, filter: -> word { true }, post_process: -> word { word.upcase })
  @corpus       = corpus
  @filter       = filter
  @length       = length
  @post_process = post_process
end
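The `filter` and `post_process` arguments are plain callables that the constructor only stores; the `#compute` listing below applies them to each whitespace-separated word of the corpus. The following is a minimal sketch of that behaviour. `preprocess` is a hypothetical helper written for this illustration and is not part of CoCoGe, and the custom lambdas are assumptions chosen only to show the effect.

```ruby
# Hypothetical helper (not part of CoCoGe): applies the two callables the same
# way the first line of #compute does.
def preprocess(corpus, filter: -> word { true }, post_process: -> word { word.upcase })
  corpus.split(/\s+/).select(&filter).map(&post_process)
end

# Defaults: every word is kept and upcased.
p preprocess("The quick, brown fox")
# => ["THE", "QUICK,", "BROWN", "FOX"]

# Custom callables (assumed for illustration): keep words of four or more
# characters, then lowercase them and drop everything outside a-z.
p preprocess(
  "The quick, brown fox",
  filter:       -> word { word.size >= 4 },
  post_process: -> word { word.downcase.delete("^a-z") }
)
# => ["quick", "brown"]
```

Note that the filter sees the raw word, before `post_process` runs, so a length check like the one above counts trailing punctuation.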

Instance Method Details

#compute ⇒ Object



# File 'lib/co_co_ge/ngrams_symbol_generator.rb', line 9

def compute
  words = @corpus.split(/\s+/).select(&@filter).map(&@post_process)
  count = Hash.new(0)
  words.each do |word|
    word.each_char.each_cons(@length) do |part|
      part.size < @length and break
      count[part.join] += 1
    end
  end
  count.sort_by { |c| -c.last }.first(255).map(&:first)
end
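As a worked example of what `#compute` returns, the sketch below runs the class on a tiny corpus. The class body is repeated from the listings above only so that the snippet runs without the library installed; the corpus string is an assumption made up for this illustration.

```ruby
# Copy of the class from the listings above, repeated so the snippet is
# self-contained.
module CoCoGe
  class NgramsSymbolGenerator
    def initialize(corpus:, length:, filter: -> word { true }, post_process: -> word { word.upcase })
      @corpus       = corpus
      @filter       = filter
      @length       = length
      @post_process = post_process
    end

    def compute
      words = @corpus.split(/\s+/).select(&@filter).map(&@post_process)
      count = Hash.new(0)
      words.each do |word|
        word.each_char.each_cons(@length) do |part|
          part.size < @length and break
          count[part.join] += 1
        end
      end
      count.sort_by { |c| -c.last }.first(255).map(&:first)
    end
  end
end

# Illustrative corpus (an assumption, not taken from the project).
generator = CoCoGe::NgramsSymbolGenerator.new(corpus: "a banana bandana", length: 2)
ngrams = generator.compute

# Bigram counts: AN=4, NA=3, BA=2, ND=1, DA=1. The word "A" is shorter than
# the requested length, so it contributes nothing.
p ngrams.first(3)  # => ["AN", "NA", "BA"]
p ngrams.size      # => 5
```

The result is ordered from most to least frequent and capped at 255 entries. `sort_by` is not guaranteed to be stable, so the relative order of n-grams with equal counts (here `ND` and `DA`) should not be relied on. Because `each_cons` only yields windows of exactly the requested size, words shorter than `length` are skipped without the size guard ever triggering.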