Class: Basset::UriDocument
Overview
A subclass of Document for classifying URIs. The constructor tokenizes the URI with #uri_tokens.
Instance Attribute Summary
Attributes inherited from Document
Instance Method Summary
- #initialize(uri, classification = nil) ⇒ UriDocument (constructor)
  A new instance of UriDocument.
- #uri_tokens ⇒ Object
- #vector_of_features ⇒ Object
Methods inherited from Document
Constructor Details
#initialize(uri, classification = nil) ⇒ UriDocument
Returns a new instance of UriDocument.
# File 'lib/basset/document.rb', line 69
def initialize(uri, classification=nil)
  @text, @classification = uri, classification
  @tokens = uri_tokens
end
Instance Method Details
#uri_tokens ⇒ Object
# File 'lib/basset/document.rb', line 78
def uri_tokens
  URI.decode(@text).gsub(/(\&|\?|\\\\|\\|\/\/|\/|\=|\[|\]|\.\.|\.)/) { |char| " " + char + " " }.split
end
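To make the tokenization concrete, here is a minimal standalone sketch of the same idea. It decodes percent-escapes, surrounds each separator with spaces, then splits on whitespace. The names `SEPARATORS` and `sketch_uri_tokens` are hypothetical and not part of Basset. The sketch also assumes `URI.decode_www_form_component` as a stand-in for `URI.decode`, which was removed in Ruby 3.0. Unlike `URI.decode`, the stand-in turns `+` into a space.

```ruby
require "uri"

# Same separator pattern as in uri_tokens: every match becomes its own token.
SEPARATORS = /(\&|\?|\\\\|\\|\/\/|\/|\=|\[|\]|\.\.|\.)/

# Hypothetical helper (not Basset API) mirroring the logic of uri_tokens.
def sketch_uri_tokens(text)
  # Assumption: decode_www_form_component replaces the removed URI.decode.
  decoded = URI.decode_www_form_component(text)
  # Pad each separator with spaces so that split isolates it.
  decoded.gsub(SEPARATORS) { |sep| " #{sep} " }.split
end

p sketch_uri_tokens("http://example.com/search?q=ruby&page=2")
```

Because "//" and ".." appear before "/" and "." in the alternation, the two-character separators are kept as single tokens rather than split into two.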
#vector_of_features ⇒ Object
# File 'lib/basset/document.rb', line 74
def vector_of_features
  @feature_vector ||= vector_of_features_from_terms_hash(terms_hash_from_words_array(@tokens))
end
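The two helpers called here are inherited from Document and are not shown on this page. As a rough sketch only, assuming they count how often each token occurs and wrap every count in a feature object, the pipeline might look like the following. `SketchFeature`, `sketch_terms_hash`, and `sketch_feature_vector` are hypothetical names, not Basset's actual implementation.

```ruby
# Hypothetical stand-in for whatever feature object Basset uses.
SketchFeature = Struct.new(:name, :value)

# Assumed behaviour of terms_hash_from_words_array: token => occurrence count.
def sketch_terms_hash(tokens)
  tokens.each_with_object(Hash.new(0)) { |token, counts| counts[token] += 1 }
end

# Assumed behaviour of vector_of_features_from_terms_hash: one feature per term.
def sketch_feature_vector(terms)
  terms.map { |name, count| SketchFeature.new(name, count) }
end

tokens = ["http:", "//", "example", ".", "com", "/", "a", "/", "b"]
vector = sketch_feature_vector(sketch_terms_hash(tokens))
vector.each { |feature| puts "#{feature.name} #{feature.value}" }
```

In the real method, the `||=` assignment caches the result in `@feature_vector`, so the vector is computed at most once per document.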