Class: SplitSentence
- Inherits: Object
  - Object
    - SplitSentence
- Defined in: lib/markovite/splitter.rb
Overview
A class that takes a corpus and breaks it down into arrays, where each array holds one sentence.
Constant Summary
- ENDERS =
['?', '.', '!']
- ABBREVIATIONS =
[
  'ave.','blvd.','ln','rd.','st.', #directional
  'tsp.','t.', 'tbs.', 'tbsp.','gal.','lb.','pt.','qt.', #cooking
  "ak.", "al.", "ar.", "az.", "ca.", "co.", "ct.", "dc.", "de.", "fl.",
  "ga.", "gu.", "hi.", "ia.", "id.", "il.", "in.", "ks.", "ky.", "la.",
  "ma.", "md.", "me.", "mh.", "mi.", "mn.", "mo.", "ms.", "mt.", "nc.",
  "nd.", "ne.", "nh.", "nj.", "nm.", "nv.", "ny.", "oh.", "ok.", "or.",
  "pa.", "pr.", "pw.", "ri.", "sc.", "sd.", "tn.", "tx.", "ut.", "va.",
  "vi.", "vt.", "wa.", "wi.", "wv.", "wy.", "u.s.", "u.s.a,", #us locations
  "dr.", "esq.", "jr.", "mr.", "mrs.", "ms.", "mx.", "prof.", "rev.",
  "rt. hon.", "sr.", "st." #personal
]
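As a rough illustration of how the two constants could work together, the sketch below treats a word as a sentence end only when it finishes with one of the ENDERS characters and is not a known abbreviation. The module name `SentenceEndSketch`, the helper `sentence_end?`, and the shortened abbreviation list are assumptions made for this example; the library's own private helper (`is_end_of_sentence?`) is not shown on this page and may behave differently.

```ruby
# Hypothetical sketch, not the library's implementation.
module SentenceEndSketch
  ENDERS = ['?', '.', '!'].freeze
  # Shortened, illustrative subset of the abbreviation list.
  SAMPLE_ABBREVIATIONS = ['dr.', 'mr.', 'mrs.', 'st.', 'tsp.'].freeze

  module_function

  # True when the word ends with sentence punctuation and is not an abbreviation.
  def sentence_end?(word)
    return false if word.nil? || word.empty?
    return false if SAMPLE_ABBREVIATIONS.include?(word.downcase)

    ENDERS.include?(word[-1])
  end
end

p SentenceEndSketch.sentence_end?('Dr.')    # => false (abbreviation)
p SentenceEndSketch.sentence_end?('home.')  # => true
p SentenceEndSketch.sentence_end?('hello')  # => false
```

Comparing the downcased word against the list is one possible reason the abbreviations are stored in lowercase, though that is a guess from the data alone.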
Instance Attribute Summary
- #corpus ⇒ Object
  Returns the value of attribute corpus.
Instance Method Summary
- #expand_corpus(text) ⇒ Object
- #initialize(corpus = nil) ⇒ SplitSentence (constructor)
  A new instance of SplitSentence.
- #split_text(new_text = nil) ⇒ Object
  Might be cool to count punctuation separately.
Constructor Details
#initialize(corpus = nil) ⇒ SplitSentence
Returns a new instance of SplitSentence.
# File 'lib/markovite/splitter.rb', line 22
def initialize(corpus = nil)
  self.corpus = corpus || ""
end
Instance Attribute Details
#corpus ⇒ Object
Returns the value of attribute corpus.
# File 'lib/markovite/splitter.rb', line 20
def corpus
  @corpus
end
Instance Method Details
#expand_corpus(text) ⇒ Object
# File 'lib/markovite/splitter.rb', line 61
def expand_corpus(text)
  self.corpus += " #{text}"
end
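To make the accumulation behaviour concrete, here is a minimal stand-in that copies only the constructor and #expand_corpus bodies listed on this page. The class name `CorpusSketch` is hypothetical, and the use of `attr_accessor` is an assumption inferred from the `self.corpus =` calls; the real class may define its writer differently.

```ruby
# Hypothetical stand-in mirroring only the methods listed on this page.
class CorpusSketch
  attr_accessor :corpus # assumption: the writer is inferred from `self.corpus =`

  def initialize(corpus = nil)
    self.corpus = corpus || ""
  end

  # Appends the new text, always preceded by a single space.
  def expand_corpus(text)
    self.corpus += " #{text}"
  end
end

sketch = CorpusSketch.new
sketch.expand_corpus("Hello there.")
sketch.expand_corpus("How are you?")
p sketch.corpus # => " Hello there. How are you?"
```

Because a space is always prepended, expanding an empty corpus leaves a leading space in the stored string.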
#split_text(new_text = nil) ⇒ Object
It might be cool to count punctuation separately: we can point to the punctuation as a way to indicate the end of a sentence. If the sentences are delimited by \n, we can have nil be the value it points to instead. This way, we can impose grammatical rules by capitalizing the first word of each sentence and ending each sentence with some sort of punctuation.
# File 'lib/markovite/splitter.rb', line 39
def split_text(new_text = nil)
  current_sentence = []
  sentences = []
  new_text = new_text || corpus
  all_words = split_words(new_text)
  all_words.each do |word|
    if is_end_of_sentence?(word)
      sentences << add_sentence(current_sentence, word)
      current_sentence.clear
    elsif has_newline?(word)
      newline_words = split_newline(word)
      sentences << add_sentence(current_sentence, newline_words[0])
      current_sentence.clear
      current_sentence << newline_words[1]
    else
      current_sentence << word
    end
  end
  sentences << add_sentence(current_sentence, nil) if !current_sentence.empty?
  sentences
end
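The idea in the note above, pointing each sentence at its closing punctuation or at nil when a newline ended it, could look roughly like the following. This is a simplified sketch under stated assumptions, not the library's code: `TerminatorSketch` and `split_with_terminators` are hypothetical names, abbreviations are ignored, and the private helpers used by #split_text (`split_words`, `add_sentence`, and so on) are not reproduced because they are not shown on this page.

```ruby
# Hypothetical sketch: pair each sentence's words with its terminator.
module TerminatorSketch
  ENDERS = ['?', '.', '!'].freeze

  module_function

  # Returns an array of [words, terminator] pairs. The terminator is nil when
  # a line break (or the end of the text) closed the sentence.
  def split_with_terminators(text)
    pairs = []
    text.each_line do |line|
      current = []
      line.split.each do |word|
        if ENDERS.include?(word[-1])
          # Keep the word without its punctuation; skip bare punctuation marks.
          current << word[0...-1] unless word.length == 1
          pairs << [current, word[-1]]
          current = []
        else
          current << word
        end
      end
      pairs << [current, nil] unless current.empty?
    end
    pairs
  end
end

text = "Hello there. How are you?\nno punctuation here"
TerminatorSketch.split_with_terminators(text).each { |pair| p pair }
# [["Hello", "there"], "."]
# [["How", "are", "you"], "?"]
# [["no", "punctuation", "here"], nil]
```

Keeping the terminator separate from the words is what would let a generator capitalize the first word and reattach suitable punctuation when it rebuilds a sentence.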