Class: String

Inherits:
Object
Defined in:
lib/simple_classifier/extensions/word_hash.rb

Overview

These are extensions to the String class to provide convenience methods for the Classifier package.

Instance Method Summary

Instance Method Details

#clean_word_hash ⇒ Object

Return a word hash like #word_hash, but without punctuation tokens or short symbols: just stemmed words.

# File 'lib/simple_classifier/extensions/word_hash.rb', line 24

def clean_word_hash
  word_hash_for_words(gsub(/[^\w\s]/, "").split)
end
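word_hash_for_words is not shown on this page, so the sketch below only illustrates the shape of the pipeline. It assumes the helper lowercases each token, drops words of two characters or fewer, and counts the rest under Symbol keys; `count_words` and `clean_tokens` are hypothetical names, and the stemming step the library performs is deliberately left out so the example runs without extra gems.

```ruby
# Hypothetical stand-in for word_hash_for_words (an assumption, not the
# library's code): lowercase each token, skip tokens of 2 chars or fewer,
# and count the rest as Symbols. The real helper also stems each word.
def count_words(words)
  words.each_with_object(Hash.new(0)) do |word, counts|
    w = word.downcase
    counts[w.to_sym] += 1 if w.length > 2
  end
end

# Same tokenizing step as String#clean_word_hash: delete every character
# that is neither a word character nor whitespace, then split on whitespace.
def clean_tokens(text)
  text.gsub(/[^\w\s]/, "").split
end

text = "The cat's toy, the cat's bed!"
p clean_tokens(text) # => ["The", "cats", "toy", "the", "cats", "bed"]
p count_words(clean_tokens(text))
```

Because the apostrophe is deleted rather than replaced, "cat's" survives as the single token "cats" instead of splitting into "cat" and "s".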

#without_punctuation ⇒ Object

Replaces common punctuation symbols with spaces and deletes apostrophes and hyphens, returning a new string. For example:

"Hello (greeting's), with {braces} < >...?".without_punctuation
=> "Hello  greetings   with  braces         "

# File 'lib/simple_classifier/extensions/word_hash.rb', line 13

def without_punctuation
  tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " ").tr("'\-", "")
end
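The two `tr` calls behave differently. The first maps every listed symbol to a space, because `tr` pads a shorter replacement set with its last character. The second has an empty replacement, which makes `tr` delete apostrophes and hyphens so contractions and hyphenated words stay joined. The sketch below copies the method into a plain function (the name `without_punctuation_sketch` is hypothetical, used only to avoid reopening String) and checks it against the documented example.

```ruby
# Plain-function copy of String#without_punctuation, for illustration only.
def without_punctuation_sketch(str)
  # Step 1: each listed symbol becomes a space; tr repeats the last
  # replacement character (" ") to cover the whole source set.
  spaced = str.tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " ")
  # Step 2: an empty replacement set makes tr delete ' and -.
  spaced.tr("'-", "")
end

p without_punctuation_sketch("Hello (greeting's), with {braces} < >...?")
p without_punctuation_sketch("state-of-the-art") # => "stateoftheart"
```

Note that the output keeps one space per removed symbol, so callers that care about spacing still need to `split` the result, as #clean_word_hash and #word_hash do.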

#word_hash ⇒ Object

Return a Hash of word => count. Each word in the string is stemmed, interned, and mapped to its frequency in the document. Unlike #clean_word_hash, runs of punctuation are also passed along as extra tokens.

# File 'lib/simple_classifier/extensions/word_hash.rb', line 19

def word_hash
  word_hash_for_words(gsub(/[^\w\s]/,"").split + gsub(/[\w]/," ").split)
end
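The only difference from #clean_word_hash is the second term: `gsub(/[\w]/, " ")` blanks out every word character, so what remains splits into runs of punctuation, which are appended to the word tokens. The sketch below prints both token lists as they would be handed to word_hash_for_words; that helper's own filtering is not shown on this page, so some of these tokens may still be dropped there. `word_hash_tokens` and `clean_word_hash_tokens` are hypothetical names.

```ruby
# Token list String#word_hash passes to word_hash_for_words:
# words with punctuation removed, followed by the punctuation runs themselves.
def word_hash_tokens(text)
  text.gsub(/[^\w\s]/, "").split + text.gsub(/[\w]/, " ").split
end

# Token list String#clean_word_hash passes: words only.
def clean_word_hash_tokens(text)
  text.gsub(/[^\w\s]/, "").split
end

text = "Wait... really?!"
p clean_word_hash_tokens(text) # => ["Wait", "really"]
p word_hash_tokens(text)       # => ["Wait", "really", "...", "?!"]
```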