Class: String
- Defined in: lib/classifier/lsi/summary.rb, lib/classifier/extensions/word_hash.rb
Overview
These are extensions to the String class that provide convenience methods for the Classifier package.
Instance Method Summary
- #clean_word_hash ⇒ Object
  Return a word hash without extra punctuation or short symbols, just stemmed words.
- #paragraph_summary(count = 1, separator = " [...] ") ⇒ Object
- #split_paragraphs ⇒ Object
- #split_sentences ⇒ Object
- #summary(count = 10, separator = " [...] ") ⇒ Object
- #without_punctuation ⇒ Object
  Removes common punctuation symbols, returning a new string.
- #word_hash ⇒ Object
  Return a Hash mapping interned words (Symbols) to Integer counts.
Instance Method Details
#clean_word_hash ⇒ Object
Return a word hash without extra punctuation or short symbols, just stemmed words.
# File 'lib/classifier/extensions/word_hash.rb', line 28
def clean_word_hash
  word_hash_for_words(gsub(/[^\w\s]/, "").split)
end
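To see what the cleanup step does before any stemming happens, here is a stdlib-only sketch. `clean_word_counts` is a hypothetical name: it reuses the `gsub(/[^\w\s]/, "").split` expression from `clean_word_hash`, but replaces the `word_hash_for_words` helper (not shown on this page) with a plain lowercase count, so there is no stemming and no filtering of short or common words.

```ruby
# Hypothetical illustration, not part of the Classifier package.
# Same cleanup as clean_word_hash: delete every character that is neither a
# word character nor whitespace, then split on whitespace. Unlike the real
# method, words are only lowercased and counted (no stemming, no filtering).
def clean_word_counts(text)
  words = text.gsub(/[^\w\s]/, "").split
  counts = Hash.new(0)
  words.each { |word| counts[word.downcase.to_sym] += 1 }
  counts
end

counts = clean_word_counts("The cat's hat, the CAT!")
p counts
```

Because the apostrophe is deleted rather than replaced with a space, "cat's" survives as the single token `cats`, which is what the real method would then pass on for stemming.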
#paragraph_summary(count = 1, separator = " [...] ") ⇒ Object
# File 'lib/classifier/lsi/summary.rb', line 10
def paragraph_summary( count=1, separator=" [...] " )
  perform_lsi split_paragraphs, count, separator
end
#split_paragraphs ⇒ Object
# File 'lib/classifier/lsi/summary.rb', line 18
def split_paragraphs
  split(/(\n\n|\r\r|\r\n\r\n)/) # TODO: make this less primitive
end
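Because the pattern in `split_paragraphs` is wrapped in a capture group, Ruby's `String#split` returns the matched separators as array elements too. The sketch below runs the same regex on a sample string; the constant `PARAGRAPH_BREAK` and the blank-entry cleanup are illustrative additions, not part of the package.

```ruby
# Same pattern as split_paragraphs. The surrounding parentheses form a
# capture group, so String#split keeps each matched separator in the result.
PARAGRAPH_BREAK = /(\n\n|\r\r|\r\n\r\n)/

text  = "First paragraph.\n\nSecond paragraph.\r\n\r\nThird."
parts = text.split(PARAGRAPH_BREAK)
p parts

# The separators are pure whitespace, so dropping blank entries leaves
# only the paragraphs themselves.
paragraphs = parts.reject { |part| part.strip.empty? }
p paragraphs

# Three newlines: the split happens after the first two, so the third
# newline stays attached to the next paragraph.
p "a\n\n\nb".split(PARAGRAPH_BREAK)
```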
#split_sentences ⇒ Object
# File 'lib/classifier/lsi/summary.rb', line 14
def split_sentences
  split(/(\.|\!|\?)/) # TODO: make this less primitive
end
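The same capture-group behavior applies to `split_sentences`: each terminating `.`, `!`, or `?` comes back as its own element. In this sketch the helper `sentences` is hypothetical; it re-attaches each terminator to the text before it, and the last call shows one way the regex is primitive: a decimal point is treated as a sentence end.

```ruby
# Same pattern as split_sentences.
SENTENCE_END = /(\.|\!|\?)/

# Hypothetical helper: split, then glue each terminator back onto the
# preceding fragment. each_slice(2) yields [text, terminator] pairs; a
# trailing fragment without a terminator becomes a one-element slice.
def sentences(text)
  text.split(SENTENCE_END).each_slice(2).map { |pair| pair.join.strip }
end

p "Hi. Bye!".split(SENTENCE_END)
p sentences("Hi. Bye!")
p sentences("Pi is 3.14. Nice!")
```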
#summary(count = 10, separator = " [...] ") ⇒ Object
# File 'lib/classifier/lsi/summary.rb', line 6
def summary( count=10, separator=" [...] " )
  perform_lsi split_sentences, count, separator
end
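Both `summary` and `paragraph_summary` hand their chunks to a `perform_lsi` helper that does not appear in this page's method list, so its ranking logic is not documented here. The sketch below only illustrates the contract visible in the two calls (chunks in, up to `count` chunks out, joined by `separator`). `naive_summarize` is hypothetical and deliberately swaps LSI for a much cruder heuristic: each chunk is scored by the average corpus-wide frequency of its words.

```ruby
# Hypothetical stand-in for perform_lsi, which is not shown on this page.
# This is NOT latent semantic indexing: each chunk is scored by the average
# corpus-wide frequency of its words.
def naive_summarize(chunks, count, separator)
  # Drop chunks with fewer than two words; this also discards the bare
  # punctuation elements that split returns for a capturing regex.
  candidates = chunks.map(&:strip).reject { |c| c.scan(/\w+/).size < 2 }
  return "" if candidates.empty?

  # Word frequencies over all candidate chunks (lowercased).
  freq = Hash.new(0)
  candidates.each { |c| c.downcase.scan(/\w+/).each { |w| freq[w] += 1 } }

  # Score = mean frequency of the chunk's words; remember each chunk's index.
  scored = candidates.each_with_index.map do |c, i|
    words = c.downcase.scan(/\w+/)
    [words.sum { |w| freq[w] }.fdiv(words.size), i]
  end

  # Highest score first, ties broken by position; then restore text order.
  top = scored.sort_by { |score, i| [-score, i] }.first(count).map(&:last)
  top.sort.map { |i| candidates[i] }.join(separator)
end

text   = "Ruby is fun. Ruby is fast. Cats sleep a lot."
chunks = text.split(/(\.|\!|\?)/) # same pattern as split_sentences
summary_text = naive_summarize(chunks, 2, " [...] ")
puts summary_text
```

Keeping the chosen chunks in their original order and discarding chunks with fewer than two words are choices made for this sketch, not documented behavior of the package.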
#without_punctuation ⇒ Object
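Removes common punctuation symbols, returning a new string. Most punctuation characters are replaced by a space, while apostrophes and hyphens are deleted outright, so runs of spaces can remain. E.g.,
"Hello (greeting's), with {braces} < >...?".without_punctuation
=> "Hello  greetings   with  braces         "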
# File 'lib/classifier/extensions/word_hash.rb', line 15
def without_punctuation
  tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " ").tr("'\-", "")
end
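A standalone copy of the method body makes the example above easy to check. `strip_punctuation` is a hypothetical name (a plain method rather than a String extension); its two `tr` calls are copied from the listing. Chaining `squeeze(" ").strip` is one way to collapse the leftover spaces when a tidy string is needed.

```ruby
# Hypothetical standalone copy of without_punctuation. The first tr maps each
# listed punctuation character to a space; the second has an empty
# replacement string, so tr deletes apostrophes and hyphens instead.
def strip_punctuation(str)
  str.tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " ").tr("'\-", "")
end

result = strip_punctuation("Hello (greeting's), with {braces} < >...?")
p result

# Optional tidy-up: collapse runs of spaces and trim the ends.
p result.squeeze(" ").strip
```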
#word_hash ⇒ Object
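Return a Hash mapping interned words (Symbols) to Integer counts. Each word in the string is stemmed, interned, and mapped to its frequency in the document; runs of non-word characters (punctuation) are counted as their own entries and merged in.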
# File 'lib/classifier/extensions/word_hash.rb', line 21
def word_hash
  word_hash = clean_word_hash()
  symbol_hash = word_hash_for_symbols(gsub(/[\w]/," ").split)
  return word_hash.merge(symbol_hash)
end
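`word_hash` merges two counts: the words from `clean_word_hash`, and the punctuation runs left over after every word character is replaced by a space. The stdlib-only sketch below mirrors that structure; `count_tokens` and `sketch_word_hash` are hypothetical, and the word half only lowercases instead of stemming and filtering through the package's own helpers.

```ruby
# Hypothetical helper: count tokens as interned keys (Symbols).
def count_tokens(tokens)
  tokens.each_with_object(Hash.new(0)) { |token, h| h[token.to_sym] += 1 }
end

# Hypothetical sketch mirroring word_hash's structure. The word half uses
# the clean_word_hash cleanup but only lowercases (no stemming or filtering).
def sketch_word_hash(text)
  words = count_tokens(text.gsub(/[^\w\s]/, "").split.map(&:downcase))
  # Turn every word character into a space; what survives the split is
  # runs of punctuation such as "!!" or ":)".
  symbols = count_tokens(text.gsub(/[\w]/, " ").split)
  words.merge(symbols)
end

p sketch_word_hash("Wow!! Great :) great?")
```

Since the word half keeps only `\w` characters and the punctuation half keeps none, the two key sets cannot overlap, so `merge` never overwrites a count.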