Class: Basset::UriDocument

Inherits:
Document
Defined in:
lib/basset/document.rb

Overview

Subclass of Document intended for classifying URIs. The URI string is stored as the document text and split into tokens by #uri_tokens.

Instance Attribute Summary

Attributes inherited from Document

#classification, #text

Instance Method Summary

Methods inherited from Document

#feature_vectors

Constructor Details

#initialize(uri, classification = nil) ⇒ UriDocument

Returns a new instance of UriDocument.



# File 'lib/basset/document.rb', line 69

def initialize(uri, classification=nil)
  @text, @classification = uri, classification
  @tokens = uri_tokens
end

Instance Method Details

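#uri_tokens ⇒ Object

Decodes the URI held in @text and splits it into tokens. Each delimiter (&, ?, a double or single backslash, //, /, =, [, ], .., and .) is padded with spaces so that it becomes a token of its own.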



# File 'lib/basset/document.rb', line 78

def uri_tokens
  URI.decode(@text).gsub(/(\&|\?|\\\\|\\|\/\/|\/|\=|\[|\]|\.\.|\.)/) { |char| " " + char + " " }.split
end
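
The splitting step can be tried in isolation. The following is a minimal sketch, not part of Basset: `URI_DELIMITERS` and `tokenize_uri` are hypothetical names, and the helper assumes its input is already percent-decoded. It deliberately skips the `URI.decode` call from the listing above, since that method was removed in Ruby 3.0.

```ruby
# Hypothetical standalone helper, not part of Basset.
# Same delimiter pattern as UriDocument#uri_tokens.
URI_DELIMITERS = /(\&|\?|\\\\|\\|\/\/|\/|\=|\[|\]|\.\.|\.)/

# Assumes decoded_uri has already been percent-decoded.
def tokenize_uri(decoded_uri)
  # Pad every delimiter with spaces, then split on whitespace so that
  # delimiters and the text between them become separate tokens.
  decoded_uri.gsub(URI_DELIMITERS) { |char| " " + char + " " }.split
end

tokens = tokenize_uri("http://example.com/a/b.html?x=1&y=2")
p tokens
# => ["http:", "//", "example", ".", "com", "/", "a", "/", "b", ".", "html",
#     "?", "x", "=", "1", "&", "y", "=", "2"]
```

Note that the colon is not in the delimiter set, so the scheme comes out as the single token "http:", and that "//" is matched before "/" because it appears earlier in the alternation.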

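#vector_of_features ⇒ Object

Builds the feature vector from the URI tokens and memoizes it in @feature_vector, so repeated calls reuse the first result.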



# File 'lib/basset/document.rb', line 74

def vector_of_features
  @feature_vector ||= vector_of_features_from_terms_hash(terms_hash_from_words_array(@tokens))
end
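
The `||=` in the listing above memoizes the result. Here is a minimal sketch of that pattern, assuming a stand-in class: `MemoizedTokens` and its `computations` counter are hypothetical, and the simple term count below is only a placeholder for the `terms_hash_from_words_array` and `vector_of_features_from_terms_hash` helpers inherited from Document, whose actual behavior is not shown here.

```ruby
# Hypothetical stand-in class, not part of Basset.
class MemoizedTokens
  attr_reader :computations

  def initialize(tokens)
    @tokens = tokens
    @computations = 0
  end

  def vector_of_features
    # ||= runs the block only while @feature_vector is nil or false,
    # so the counting below happens once per instance.
    @feature_vector ||= begin
      @computations += 1
      # Placeholder for Basset's feature construction: count each token.
      @tokens.each_with_object(Hash.new(0)) { |token, counts| counts[token] += 1 }
    end
  end
end

doc = MemoizedTokens.new(%w[a / b / a])
first = doc.vector_of_features
second = doc.vector_of_features
puts first.equal?(second)  # same object returned on both calls
puts doc.computations
```

One consequence of this design is that the vector reflects the tokens as they were at the first call; later changes to the token list would not be picked up.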