Class: String

Inherits:
Object
Defined in:
lib/classifier/lsi/summary.rb,
lib/classifier/extensions/word_hash.rb

Overview

Author

Lucas Carlson

Copyright

Copyright © 2005 Lucas Carlson

License

LGPL

Instance Method Details

#clean_word_hash ⇒ Object

Returns a word hash containing only stemmed words, without extra punctuation or short symbols.



# File 'lib/classifier/extensions/word_hash.rb', line 26

def clean_word_hash
	word_hash_for_words gsub(/[^\w\s]/,"").split
end
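
To make the tokenization step concrete, the sketch below applies the same gsub(/[^\w\s]/,"").split expression to a plain string and counts the resulting tokens. The helper naive_word_counts is hypothetical and is not the library's word_hash_for_words: it assumes plain counting only, with no stemming and no filtering of short words.

```ruby
# Hypothetical helper (not part of the library): strips every character
# that is neither a word character nor whitespace, splits on whitespace,
# and counts each remaining token. No stemming is performed.
def naive_word_counts(text)
  tokens = text.gsub(/[^\w\s]/, "").split
  tokens.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

p naive_word_counts("the cat saw the dog's bone!")
```

Note that the apostrophe is deleted rather than replaced by a space, so "dog's" becomes the single token "dogs".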

#paragraph_summary(count = 1, separator = " [...] ") ⇒ Object



# File 'lib/classifier/lsi/summary.rb', line 10

def paragraph_summary( count=1, separator=" [...] " )
   perform_lsi split_paragraphs, count, separator
end

#split_paragraphs ⇒ Object



# File 'lib/classifier/lsi/summary.rb', line 18

def split_paragraphs
   split /(\n\n|\r\r|\r\n\r\n)/ # TODO: make this less primitive
end
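
One consequence of the pattern above is worth illustrating: Ruby's String#split includes the text matched by a capturing group in its result, so the blank-line separators come back as array elements alongside the paragraphs. The sketch below uses two hypothetical helper names on a plain string; the non-capturing variant is shown only for comparison and is not what the library does.

```ruby
PARAGRAPH_BREAK = /(\n\n|\r\r|\r\n\r\n)/          # same pattern as above
PARAGRAPH_BREAK_NC = /(?:\n\n|\r\r|\r\n\r\n)/     # non-capturing variant

# Hypothetical helper: mirrors the documented split, separators included.
def split_keeping_separators(text)
  text.split(PARAGRAPH_BREAK)
end

# Hypothetical helper: same boundaries, separators dropped.
def split_dropping_separators(text)
  text.split(PARAGRAPH_BREAK_NC)
end

sample = "First paragraph.\n\nSecond paragraph.\r\n\r\nThird."
p split_keeping_separators(sample)
p split_dropping_separators(sample)
```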

#split_sentences ⇒ Object



# File 'lib/classifier/lsi/summary.rb', line 14

def split_sentences
   split /(\.|\!|\?)/ # TODO: make this less primitive
end
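
The TODO in the source hints at the limits of this pattern. As a rough illustration on a plain string (naive_sentence_split is a hypothetical name, not a library method), every ".", "!" or "?" is treated as a boundary, including the decimal point in a number, and each matched delimiter is returned as its own element because the pattern uses a capturing group.

```ruby
# Hypothetical helper: applies the same pattern as split_sentences.
def naive_sentence_split(text)
  text.split(/(\.|\!|\?)/)
end

p naive_sentence_split("Pi is 3.14. Really?")
# The decimal point splits "3.14" into "Pi is 3" and "14".
```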

#summary(count = 10, separator = " [...] ") ⇒ Object



# File 'lib/classifier/lsi/summary.rb', line 6

def summary( count=10, separator=" [...] " )
   perform_lsi split_sentences, count, separator
end

#without_punctuation ⇒ Object

Removes common punctuation symbols, returning a new string. E.g.,

"Hello (greeting's), with {braces} < >...?".without_punctuation
=> "Hello  greetings   with  braces         "


# File 'lib/classifier/extensions/word_hash.rb', line 15

def without_punctuation
  tr( ',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " " ) .tr( "'\-", "")
end
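
The two tr calls behave differently, which explains the spacing in the example output: the first replaces each listed symbol with a space, while the second, given an empty replacement string, deletes apostrophes and hyphens outright. The sketch below reproduces this on a plain string without patching String; strip_punctuation is a hypothetical standalone name.

```ruby
# Hypothetical standalone version of the method body above.
def strip_punctuation(text)
  text
    .tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " ") # each symbol becomes one space
    .tr("'\-", "")                              # apostrophes and hyphens are deleted
end

p strip_punctuation("Hello (greeting's), with {braces} < >...?")
p strip_punctuation("well-known")
```

Because hyphens are deleted rather than replaced, a compound such as "well-known" collapses into a single token.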

#word_hash(stemmer) ⇒ Object

Returns a Hash of strings => ints. Each word in the string is stemmed and interned, and maps to its frequency in the document.



# File 'lib/classifier/extensions/word_hash.rb', line 21

def word_hash(stemmer)
	word_hash_for_words(gsub(/[^\w\s]/,"").split + gsub(/[\w]/," ").split, stemmer)
end
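
Unlike clean_word_hash, this method passes two token lists to word_hash_for_words: the words with punctuation stripped, followed by the runs of punctuation left over once every word character is blanked out. The sketch below shows only that token-building expression on a plain string; word_and_symbol_tokens is a hypothetical name, and how word_hash_for_words then stems or filters the tokens is not shown in this section.

```ruby
# Hypothetical helper: builds the same token list that word_hash
# hands to word_hash_for_words, without any stemming or counting.
def word_and_symbol_tokens(text)
  words   = text.gsub(/[^\w\s]/, "").split # punctuation removed
  symbols = text.gsub(/[\w]/, " ").split   # word characters blanked out
  words + symbols
end

p word_and_symbol_tokens("Wait... what?!")
```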