Class: Basset::Document

Inherits:
Object
Defined in:
lib/basset/document.rb

Overview

Represents a document as a vector of features. It takes the text of the document and, optionally, its classification. The feature vector is a basic bag-of-words representation.
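
To make the bag-of-words idea concrete, here is a minimal sketch. The helper `bag_of_words` is hypothetical and not part of Basset; it also skips stemming, which the real class applies to its tokens. It only shows that a bag of words keeps term counts and discards word order.

```ruby
# Hypothetical helper, not part of Basset: counts how often each
# lowercase word occurs in the text. No stemming is applied here.
def bag_of_words(text)
  text.downcase.scan(/[a-z0-9']+/).each_with_object(Hash.new(0)) do |word, counts|
    counts[word] += 1
  end
end

a = bag_of_words("the dog chased the cat")
b = bag_of_words("the cat chased the dog")

a == b    # true: same words, different order, same bag
a["the"]  # 2
```

Because only counts survive, two documents with the same words in a different order are indistinguishable under this representation.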

Direct Known Subclasses

DocumentOverrideExample, UriDocument

Constructor Details

#initialize(text, classification = nil) ⇒ Document

Initializes the object with the document text. Pass an explicit classification to use the document as training data.



# File 'lib/basset/document.rb', line 14

def initialize(text, classification = nil)
  @text, @classification = text, classification
  @tokens = stemmed_words
end
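
The sketch below illustrates the two ways of calling the constructor. `SketchDocument` is a hypothetical stand-in so the example runs without the gem, and its whitespace tokenizer is an assumption replacing `stemmed_words`, whose implementation is not shown on this page.

```ruby
# Hypothetical stand-in for Basset::Document, mirroring its constructor
# signature. The tokenizer below is a simplification, not Basset's stemmer.
class SketchDocument
  attr_reader :text, :classification

  def initialize(text, classification = nil)
    @text, @classification = text, classification
    @tokens = text.downcase.split # assumption: stands in for stemmed_words
  end
end

# With a classification, the document can serve as training data.
training = SketchDocument.new("cheap pills online", :spam)

# Without one, classification defaults to nil.
unlabeled = SketchDocument.new("meeting moved to noon")

training.classification   # :spam
unlabeled.classification  # nil
```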

Instance Attribute Details

#classification ⇒ Object (readonly)

Returns the value of attribute classification.



# File 'lib/basset/document.rb', line 9

def classification
  @classification
end

#text ⇒ Object (readonly)

Returns the value of attribute text.



# File 'lib/basset/document.rb', line 9

def text
  @text
end

Instance Method Details

#feature_vectors ⇒ Object

Alias for #vector_of_features



# File 'lib/basset/document.rb', line 27

def feature_vectors
  vector_of_features
end

#vector_of_features ⇒ Object

Returns an array of feature (token) vectors, each of which is an instance of Feature.



# File 'lib/basset/document.rb', line 21

def vector_of_features
  @feature_vector ||= vector_of_features_from_terms_hash( terms_hash_from_words_array( @tokens ) )
end
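
The method above chains two helpers whose bodies are not shown on this page, then memoizes the result with `||=`. The following is a rough sketch of how such a pipeline might look. `SketchFeature`, `SketchVectorizer`, `count_terms`, and `features_from_counts` are all hypothetical names, and the shape of a feature (a name plus a count) is an assumption rather than Basset's confirmed design.

```ruby
# Hypothetical stand-in for Basset's Feature class: a term and its count.
SketchFeature = Struct.new(:name, :value)

class SketchVectorizer
  def initialize(tokens)
    @tokens = tokens
  end

  # Memoized: the vector is built on the first call and reused afterwards.
  def vector_of_features
    @feature_vector ||= features_from_counts(count_terms(@tokens))
  end

  private

  # Maps each distinct token to the number of times it occurs.
  def count_terms(words)
    words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
  end

  # Builds one SketchFeature per distinct term, in first-seen order.
  def features_from_counts(counts)
    counts.map { |term, count| SketchFeature.new(term, count) }
  end
end

vectorizer = SketchVectorizer.new(%w[dog bark dog])
features = vectorizer.vector_of_features

features.map(&:to_a)                          # [["dog", 2], ["bark", 1]]
vectorizer.vector_of_features.equal?(features) # true: same cached array
```

One consequence of the `||=` memoization is that the vector is computed once per document; since the tokens are fixed at construction time, the cached array never goes stale.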