Class: Ebooks::SuffixGenerator

Inherits:
Object
Defined in:
lib/twitter_ebooks/suffix.rb

Overview

This generator uses data similar to a Markov model, but instead of making a chain by looking up bigrams, it uses the stored positions to randomly replace token-array suffixes in one sentence with matching suffixes from another.
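
A minimal usage sketch, assuming the gem is installed and that sentences is an Array<Array<Integer>> of tikified sentences prepared elsewhere (the integer values below are arbitrary placeholders):

require 'twitter_ebooks'

sentences = [[0, 1, 2, 3], [4, 1, 2, 5], [0, 6, 2, 3]] # placeholder tikis
generator = Ebooks::SuffixGenerator.build(sentences)
tikis = generator.generate(5, :unigrams)
# => an Array<Integer> of recombined token indexes, to be mapped back to
#    words by whatever produced the original token list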

Class Method Summary

  • .build(sentences) ⇒ SuffixGenerator

Instance Method Summary

  • #initialize(sentences) ⇒ SuffixGenerator (constructor)
  • #generate(passes = 5, n = :unigrams) ⇒ Array<Integer>

Constructor Details

#initialize(sentences) ⇒ SuffixGenerator

Returns a new instance of SuffixGenerator.



# File 'lib/twitter_ebooks/suffix.rb', line 19

def initialize(sentences)
  @sentences = sentences.reject { |s| s.empty? }
  @unigrams = {}
  @bigrams = {}

  @sentences.each_with_index do |tikis, i|
    if i % 10000 == 0
      log("Building: sentence #{i} of #{sentences.length}")
    end
    last_tiki = INTERIM
    tikis.each_with_index do |tiki, j|
      @unigrams[last_tiki] ||= []
      @unigrams[last_tiki] << [i, j]

      @bigrams[last_tiki] ||= {}
      @bigrams[last_tiki][tiki] ||= []

      if j == tikis.length-1 # Mark sentence endings
        @unigrams[tiki] ||= []
        @unigrams[tiki] << [i, INTERIM]
        @bigrams[last_tiki][tiki] << [i, INTERIM]
      else
        @bigrams[last_tiki][tiki] << [i, j+1]
      end

      last_tiki = tiki
    end
  end

  self
end
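
To make the index layout concrete, here is roughly what the constructor builds for a one-sentence toy corpus (token values are arbitrary; INTERIM is the sentinel the class uses to mark sentence boundaries):

SuffixGenerator.new([[7, 8, 9]])
# @unigrams -- tiki => [sentence index, position of a token that follows it,
#                       or INTERIM when the sentence ends there]
# { INTERIM => [[0, 0]], 7 => [[0, 1]], 8 => [[0, 2]], 9 => [[0, INTERIM]] }
#
# @bigrams  -- first tiki => { second tiki => [sentence index, position just
#                              after the pair, or INTERIM at a sentence end] }
# { INTERIM => { 7 => [[0, 1]] }, 7 => { 8 => [[0, 2]] }, 8 => { 9 => [[0, INTERIM]] } }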

Class Method Details

.build(sentences) ⇒ SuffixGenerator

Build a generator from a corpus of tikified sentences. “Tikis” are token indexes: a way of representing words and punctuation as their integer positions in a big array of such tokens.

Parameters:

  • sentences (Array<Array<Integer>>)

Returns:

  • (SuffixGenerator)

# File 'lib/twitter_ebooks/suffix.rb', line 15

def self.build(sentences)
  SuffixGenerator.new(sentences)
end
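
For illustration only, a corpus of tikified sentences can be produced from plain word arrays with a small lookup table; tikify_corpus below is a hypothetical helper, not part of this class or the gem's public API:

def tikify_corpus(sentences_of_words)
  tokens = []
  lookup = Hash.new do |hash, word|
    tokens << word
    hash[word] = tokens.length - 1
  end
  tikis = sentences_of_words.map { |words| words.map { |w| lookup[w] } }
  [tikis, tokens]
end

tikis, tokens = tikify_corpus([%w[the cat sat], %w[the dog sat]])
# tikis  => [[0, 1, 2], [0, 3, 2]]
# tokens => ["the", "cat", "sat", "dog"]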

Instance Method Details

#generate(passes = 5, n = :unigrams) ⇒ Array<Integer>

Generate a recombined sequence of tikis

Parameters:

  • passes (Integer) (defaults to: 5)

    number of times to recombine

  • n (Symbol) (defaults to: :unigrams)

    :unigrams or :bigrams (affects how conservative the model is)

Returns:

  • (Array<Integer>)


# File 'lib/twitter_ebooks/suffix.rb', line 55

def generate(passes=5, n=:unigrams)
  index = rand(@sentences.length)
  tikis = @sentences[index]
  used = [index] # Sentences we've already used
  verbatim = [tikis] # Verbatim sentences to avoid reproducing

  0.upto(passes-1) do
    varsites = {} # Map bigram start site => next tiki alternatives

    tikis.each_with_index do |tiki, i|
      next_tiki = tikis[i+1]
      break if next_tiki.nil?

      alternatives = (n == :unigrams) ? @unigrams[next_tiki] : @bigrams[tiki][next_tiki]
      # Filter out suffixes from previous sentences
      alternatives.reject! { |a| a[1] == INTERIM || used.include?(a[0]) }
      varsites[i] = alternatives unless alternatives.empty?
    end

    variant = nil
    varsites.to_a.shuffle.each do |site|
      start = site[0]

      site[1].shuffle.each do |alt|
        verbatim << @sentences[alt[0]]
        suffix = @sentences[alt[0]][alt[1]..-1]
        potential = tikis[0..start+1] + suffix

        # Ensure we're not just rebuilding some segment of another sentence
        unless verbatim.find { |v| NLP.subseq?(v, potential) || NLP.subseq?(potential, v) }
          used << alt[0]
          variant = potential
          break
        end
      end

      break if variant
    end

    # If we failed to produce a variation from any alternative, there
    # is no use running additional passes-- they'll have the same result.
    break if variant.nil?

    tikis = variant
  end

  tikis
end
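
As a rough illustration of what a single pass can do (token values arbitrary), two sentences that share a bigram may exchange suffixes at that overlap:

require 'twitter_ebooks'

generator = Ebooks::SuffixGenerator.build([[0, 1, 2, 3], [4, 1, 2, 5]])
generator.generate(1, :bigrams)
# The output depends on the random choices; one possible result is
# [0, 1, 2, 5] -- the prefix [0, 1, 2] of the first sentence spliced, at
# the shared bigram (1, 2), onto the suffix [5] of the second.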