Class: Basset::Document
- Inherits: Object
- Defined in: lib/basset/document.rb
Overview
Represents a document as a vector of features. It takes the document's text and, optionally, its classification. The feature vector is a basic bag-of-words representation.
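To illustrate what a bag-of-words representation means, here is a minimal sketch. The `bag_of_words` helper and its crude regex tokenizer are assumptions made for illustration; they are not part of Basset, which stems its words before counting.

```ruby
# Hypothetical helper, not part of Basset: map each word to its occurrence count.
def bag_of_words(text)
  # Crude tokenizer (an assumption): lowercase, then pull out runs of
  # letters, digits and apostrophes. No stemming is applied here.
  words = text.downcase.scan(/[a-z0-9']+/)

  # A Hash with a default of 0 lets us increment without checking for the key.
  words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

counts = bag_of_words("The cat sat. The cat slept.")
puts counts.inspect
```

Word order is discarded; only the per-term counts survive, which is why the approach is called a "bag" of words.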
Direct Known Subclasses
Instance Attribute Summary
- #classification ⇒ Object (readonly)
  Returns the value of attribute classification.
- #text ⇒ Object (readonly)
  Returns the value of attribute text.
Instance Method Summary
- #feature_vectors ⇒ Object
  Alias for #vector_of_features.
- #initialize(text, classification = nil) ⇒ Document (constructor)
  Initializes the object with the document text.
- #vector_of_features ⇒ Object
  Returns an array of feature (token) vectors, which are instances of Feature.
Constructor Details
#initialize(text, classification = nil) ⇒ Document
Initializes the object with the document text. Pass an explicit classification to use the document as training data.
# File 'lib/basset/document.rb', line 14
def initialize(text, classification = nil)
  @text, @classification = text, classification
  @tokens = stemmed_words
end
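The optional second argument is what separates a training document from one awaiting classification. The sketch below uses a stand-in class, `SketchDocument`, which is hypothetical and only mirrors the documented constructor signature; it omits the stemming step and is not Basset's implementation. The `:spam` label is likewise an invented example value.

```ruby
# Hypothetical stand-in that mirrors the documented constructor signature.
# It is not Basset's class and skips tokenizing and stemming entirely.
class SketchDocument
  attr_reader :text, :classification

  def initialize(text, classification = nil)
    @text, @classification = text, classification
  end
end

# Labelled document: the classification makes it usable as training data.
training = SketchDocument.new("cheap pills online", :spam)

# Unlabelled document: classification defaults to nil.
unknown = SketchDocument.new("meeting moved to noon")

puts training.classification.inspect
puts unknown.classification.inspect
```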
Instance Attribute Details
#classification ⇒ Object (readonly)
Returns the value of attribute classification.
# File 'lib/basset/document.rb', line 9
def classification
  @classification
end
#text ⇒ Object (readonly)
Returns the value of attribute text.
# File 'lib/basset/document.rb', line 9
def text
  @text
end
Instance Method Details
#feature_vectors ⇒ Object
Alias for #vector_of_features.
# File 'lib/basset/document.rb', line 27
def feature_vectors
  vector_of_features
end
#vector_of_features ⇒ Object
Returns an array of feature (token) vectors, which are instances of Feature.
# File 'lib/basset/document.rb', line 21
def vector_of_features
  @feature_vector ||= vector_of_features_from_terms_hash(
    terms_hash_from_words_array(@tokens)
  )
end
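The method above memoizes its result with `||=` and builds the vector in two steps: tokens to a terms hash, then terms hash to features. The following sketch shows that pattern in isolation. `SketchFeature`, `SketchVectorizer`, the two private helpers, and the `builds` counter are all hypothetical names chosen for illustration; the bodies of Basset's own helpers are not shown in this page, so their behavior here (plain counts, insertion order) is an assumption.

```ruby
# Hypothetical feature type: a term name paired with its count.
SketchFeature = Struct.new(:name, :value)

# Hypothetical class illustrating the memoized two-step pipeline.
class SketchVectorizer
  attr_reader :builds # counts how many times the vector is actually built

  def initialize(tokens)
    @tokens = tokens
    @builds = 0
  end

  # Built on first call, then served from @feature_vector afterwards.
  def vector_of_features
    @feature_vector ||= vector_from_terms_hash(terms_hash(@tokens))
  end

  # Plain delegating alias, as in the documented class.
  def feature_vectors
    vector_of_features
  end

  private

  # Step 1 (assumed behavior): count occurrences of each token.
  def terms_hash(words)
    words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
  end

  # Step 2 (assumed behavior): wrap each term and count in a feature object.
  def vector_from_terms_hash(terms)
    @builds += 1
    terms.map { |name, count| SketchFeature.new(name, count) }
  end
end

doc = SketchVectorizer.new(%w[run run jump])
first = doc.vector_of_features
second = doc.feature_vectors
puts first.map(&:to_a).inspect
```

Because of `||=`, both calls return the same array object and the pipeline runs only once. One caveat of this idiom is that a `nil` or `false` result would be recomputed on every call; an array result, even an empty one, is truthy and is cached.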