Class: CoCoGe::NgramsSymbolGenerator
- Inherits: Object
  - Object
    - CoCoGe::NgramsSymbolGenerator
- Defined in: lib/co_co_ge/ngrams_symbol_generator.rb
Instance Method Summary
- #compute ⇒ Object
- #initialize(corpus:, length:, filter: -> word { true }, post_process: -> word { word.upcase }) ⇒ NgramsSymbolGenerator (constructor): a new instance of NgramsSymbolGenerator.
Constructor Details
#initialize(corpus:, length:, filter: -> word { true }, post_process: -> word { word.upcase }) ⇒ NgramsSymbolGenerator
Returns a new instance of NgramsSymbolGenerator.
# File 'lib/co_co_ge/ngrams_symbol_generator.rb', line 2
def initialize(corpus:, length:, filter: -> word { true }, post_process: -> word { word.upcase })
  @corpus       = corpus
  @filter       = filter
  @length       = length
  @post_process = post_process
end
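The filter: and post_process: keywords each take a callable that receives a single word. As a minimal sketch of what such callables do to a word list when applied with select and map: the two defaults below are copied from the signature, while long_words_only, downcase, and the sample sentence are hypothetical names and data chosen only for illustration.

```ruby
# Defaults, copied from the constructor signature.
default_filter       = -> word { true }
default_post_process = -> word { word.upcase }

# Hypothetical custom callables (not part of the library).
long_words_only = -> word { word.size > 3 }
downcase        = -> word { word.downcase }

words = "The quick brown Fox".split(/\s+/)

# Keep every word and upcase it.
with_defaults = words.select(&default_filter).map(&default_post_process)
# => ["THE", "QUICK", "BROWN", "FOX"]

# Keep only words longer than three characters and downcase them.
with_custom = words.select(&long_words_only).map(&downcase)
# => ["quick", "brown"]
```

Any object that converts to a block with & (a lambda, a proc, or a Method object) should work in the same positions.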
Instance Method Details
#compute ⇒ Object
# File 'lib/co_co_ge/ngrams_symbol_generator.rb', line 9
def compute
  words = @corpus.split(/\s+/).select(&@filter).map(&@post_process)
  count = Hash.new(0)
  words.each do |word|
    word.each_char.each_cons(@length) do |part|
      part.size < @length and break
      count[part.join] += 1
    end
  end
  count.sort_by { |c| -c.last }.first(255).map(&:first)
end
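A rough usage sketch of #compute follows. The class body is copied from the source listings on this page so that the snippet runs standalone; in a real project the library would presumably be loaded instead, but its require path is not shown here. The corpus string and the variable names are made up for illustration.

```ruby
module CoCoGe
  class NgramsSymbolGenerator
    def initialize(corpus:, length:, filter: -> word { true }, post_process: -> word { word.upcase })
      @corpus       = corpus
      @filter       = filter
      @length       = length
      @post_process = post_process
    end

    def compute
      # Split on whitespace, drop rejected words, normalise the rest.
      words = @corpus.split(/\s+/).select(&@filter).map(&@post_process)
      count = Hash.new(0)
      words.each do |word|
        # Slide a window of @length characters over each word.
        word.each_char.each_cons(@length) do |part|
          part.size < @length and break
          count[part.join] += 1
        end
      end
      # Most frequent n-grams first, capped at 255 entries.
      count.sort_by { |c| -c.last }.first(255).map(&:first)
    end
  end
end

# Hypothetical two-word corpus; bigrams are counted per word, never across words.
generator = CoCoGe::NgramsSymbolGenerator.new(corpus: "banana bandana", length: 2)
symbols   = generator.compute

top_three = symbols.first(3)
# => ["AN", "NA", "BA"]   (counts 4, 3, 2)
```

The remaining two bigrams, "ND" and "DA", each occur once. Ruby's sort_by does not guarantee a stable order, so the relative order of tied n-grams should not be relied on.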