Class: String

Inherits:
Object
Defined in:
lib/classifier/lsi/summary.rb,
lib/classifier/extensions/word_hash.rb

Overview

These are extensions to the String class to provide convenience methods for the Classifier package.

Instance Method Summary

Instance Method Details

#clean_word_hash ⇒ Object

Returns a word hash that omits extra punctuation and short symbols and contains only stemmed words.



# File 'lib/classifier/extensions/word_hash.rb', line 28

def clean_word_hash
	word_hash_for_words gsub(/[^\w\s]/,"").split
end
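
The tokenising step in the method above can be tried on its own with plain Ruby. This is a minimal sketch of that one step only: the stemming and counting done by `word_hash_for_words` are not shown in this page and are not reproduced here, and the sample text is made up for illustration.

```ruby
# Sketch of the tokenising step only; stemming and counting are NOT reproduced.
# The sample text is an assumption chosen for illustration.
text = "Hello, world! It's a snake_case test."

# Drop every character that is neither a word character nor whitespace,
# then split on whitespace. Underscores survive because \w matches them.
tokens = text.gsub(/[^\w\s]/, "").split

p tokens
```

Note that the apostrophe in "It's" is deleted rather than replaced by a space, so the two halves are joined into a single token.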

#paragraph_summary(count = 1, separator = " [...] ") ⇒ Object



# File 'lib/classifier/lsi/summary.rb', line 10

def paragraph_summary( count=1, separator=" [...] " )
   perform_lsi split_paragraphs, count, separator
end

#split_paragraphs ⇒ Object



# File 'lib/classifier/lsi/summary.rb', line 18

def split_paragraphs
   split(/(\n\n|\r\r|\r\n\r\n)/) # TODO: make this less primitive
end
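
Because the pattern wraps its alternatives in a capture group, `String#split` returns the separators alongside the paragraphs. The sketch below applies the same regular expression to a made-up sample string and contrasts it with a non-capturing variant. The variant is shown only for comparison and is not part of the library.

```ruby
# Sample text is an assumption chosen for illustration.
text = "First paragraph.\n\nSecond paragraph."

# Same pattern as split_paragraphs: the capture group makes split
# keep each matched separator as its own array element.
with_separators = text.split(/(\n\n|\r\r|\r\n\r\n)/)

# Comparison only (not the library's code): without the capture group
# the separators are discarded.
without_separators = text.split(/\n\n|\r\r|\r\n\r\n/)

p with_separators
p without_separators
```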

#split_sentences ⇒ Object



# File 'lib/classifier/lsi/summary.rb', line 14

def split_sentences
   split(/(\.|\!|\?)/) # TODO: make this less primitive
end
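
The TODO in the method above points at a real limitation: every period, exclamation mark, or question mark ends a chunk, including periods inside numbers. A small sketch on a made-up sample string, using the same pattern with plain `String#split`:

```ruby
# Sample text is an assumption chosen for illustration.
text = "Pi is 3.14. Done!"

# Same pattern as split_sentences. The decimal point is treated as a
# sentence boundary, and the captured delimiters appear as elements.
chunks = text.split(/(\.|\!|\?)/)

p chunks
```

Callers that need only the sentence text would have to filter out the delimiter elements and strip leading whitespace themselves.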

#summary(count = 10, separator = " [...] ") ⇒ Object



# File 'lib/classifier/lsi/summary.rb', line 6

def summary( count=10, separator=" [...] " )
   perform_lsi split_sentences, count, separator
end

#without_punctuation ⇒ Object

Replaces common punctuation symbols with spaces and deletes apostrophes and hyphens, returning a new string. E.g.,

"Hello (greeting's), with {braces} < >...?".without_punctuation
=> "Hello  greetings   with  braces         "


# File 'lib/classifier/extensions/word_hash.rb', line 15

def without_punctuation
  tr( ',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " " ).tr( "'\-", "" )
end
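
The two `tr` calls can be exercised without loading the library. The sketch below wraps the same calls in a standalone helper; the name `strip_punctuation` is hypothetical and is used here only to avoid reopening `String`.

```ruby
# Hypothetical standalone helper (not part of the library) that applies
# the same two tr calls as String#without_punctuation.
def strip_punctuation(text)
  # First pass: replace each listed punctuation character with a space.
  spaced = text.tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " ")
  # Second pass: an empty replacement makes tr delete apostrophes and hyphens.
  spaced.tr("'\-", "")
end

result = strip_punctuation("Hello (greeting's), with {braces} < >...?")
p result
```

Since punctuation becomes spaces rather than disappearing, the result keeps the original spacing plus one extra space per replaced character, which is why the documented output has runs of spaces.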

#word_hash ⇒ Object

Returns a Hash of strings => ints. Each word in the string is stemmed and interned, and maps to its frequency in the document.



# File 'lib/classifier/extensions/word_hash.rb', line 21

def word_hash
	word_hash = clean_word_hash()
	symbol_hash = word_hash_for_symbols(gsub(/[\w]/," ").split)
	return word_hash.merge(symbol_hash)
end
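
To make the word/symbol split and the final `merge` concrete, here is a rough sketch in plain Ruby. It assumes a simple occurrence count: `count_tokens` is a hypothetical stand-in for `word_hash_for_words` and `word_hash_for_symbols`, whose bodies are not shown in this page, and it performs no stemming or interning.

```ruby
# Hypothetical stand-in for the library's counting helpers:
# counts occurrences of each token. No stemming or interning is done.
def count_tokens(tokens)
  tokens.each_with_object(Hash.new(0)) { |token, counts| counts[token] += 1 }
end

# Sample text is an assumption chosen for illustration.
text = "Wait... what?! Wait."

# Word tokens: same expression as clean_word_hash.
word_tokens = text.gsub(/[^\w\s]/, "").split

# Symbol tokens: same expression as word_hash; blanking out every word
# character leaves only runs of punctuation.
symbol_tokens = text.gsub(/[\w]/, " ").split

combined = count_tokens(word_tokens).merge(count_tokens(symbol_tokens))
p combined
```

In this sketch the two key sets never collide, because one hash is built only from word characters and the other only from non-word characters, so `merge` acts as a plain union.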