Class: RSemantic::VectorSpace::Builder

Inherits:
Object
Defined in:
lib/rsemantic/vector_space/builder.rb

Overview

An algebraic model for representing text documents as vectors of identifiers. Each document is represented as a vector in which every dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero.
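The model described above can be illustrated with a minimal pure-Ruby sketch. It does not use RSemantic's Parser or GSL. The helpers `tokenize` and `term_vector` are hypothetical names introduced only for this illustration, and raw term counts are an assumed choice for the vector values.

```ruby
# Hypothetical helper: lower-case a document and split it into word tokens.
def tokenize(text)
  text.downcase.scan(/[a-z]+/)
end

# Hypothetical helper: map a document to a vector with one dimension per
# term in `index` (term => position). Values are raw term counts (assumed).
def term_vector(text, index)
  vector = Array.new(index.size, 0)
  tokenize(text).each do |term|
    position = index[term]
    vector[position] += 1 if position
  end
  vector
end

documents = ["The cat sat", "The dog sat on the mat"]

# One dimension per distinct term across all documents, in first-seen order.
index = documents.flat_map { |d| tokenize(d) }.uniq.each_with_index.to_h

vectors = documents.map { |d| term_vector(d, index) }
```

Every vector has the same length as the term index, and a position is non-zero only when the corresponding term occurs in that document.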

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(options = {}) ⇒ Builder

Returns a new instance of Builder.



# File 'lib/rsemantic/vector_space/builder.rb', line 9

def initialize(options = {})
  @parser = Parser.new(
    :filter_stop_words => options[:filter_stop_words],
    :locale => options[:locale]
  )
  @parsed_document_cache = []
end

Instance Attribute Details

#parsed_document_cache ⇒ Object (readonly)

Returns the value of attribute parsed_document_cache.



# File 'lib/rsemantic/vector_space/builder.rb', line 7

def parsed_document_cache
  @parsed_document_cache
end

Instance Method Details

#build_document_matrix(documents) ⇒ Object



# File 'lib/rsemantic/vector_space/builder.rb', line 17

def build_document_matrix(documents)
  @vector_keyword_index = build_vector_keyword_index(documents)

  document_vectors = documents.enum_for(:each_with_index).map{|document,document_id| build_vector(document, document_id)}

  n = document_vectors.size
  m = document_vectors.first.size

  # TODO check where else we use document_vectors and if we can directly use column based ones
  document_matrix = GSL::Matrix.alloc(*document_vectors.map {|v| v.transpose})

  Model.new(document_matrix, @vector_keyword_index)
end
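As a rough illustration of the assembly step, the sketch below arranges per-document vectors into a single matrix using plain Ruby arrays instead of GSL. The listing does not state whether the GSL matrix holds documents as rows or as columns; one column per document is assumed here. The helper `columns_from` is a hypothetical name, not part of RSemantic.

```ruby
# Assumed input: one count vector per document, all of equal length
# (one entry per indexed term).
document_vectors = [
  [1, 1, 1, 0], # document 0
  [2, 0, 1, 1]  # document 1
]

# Hypothetical helper: arrange document vectors as columns, giving a
# terms x documents layout. Rejects empty or ragged input up front.
def columns_from(vectors)
  raise ArgumentError, "no documents given" if vectors.empty?
  unless vectors.map(&:size).uniq.size == 1
    raise ArgumentError, "vectors differ in length"
  end
  vectors.transpose
end

# Row i holds the counts of term i across all documents.
term_document_matrix = columns_from(document_vectors)
```

The explicit check for an empty list is a design choice of the sketch: it fails with a descriptive error instead of a NoMethodError on nil.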

#build_query_vector(term_list) ⇒ Object



# File 'lib/rsemantic/vector_space/builder.rb', line 31

def build_query_vector(term_list)
  build_vector(term_list.join(" "))
end
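The method joins the query terms into one string and vectorizes it like a document. A minimal sketch of that idea follows. The private `build_vector` is not shown in this listing, so `sketch_vector` is a hypothetical stand-in that counts indexed terms, and ignoring terms missing from the index is an assumption.

```ruby
# Assumed term => dimension index, as it might be built from the documents.
index = { "cat" => 0, "dog" => 1, "mat" => 2 }

# Hypothetical stand-in for the private build_vector: count each term
# that appears in the index and ignore all other terms (assumption).
def sketch_vector(text, index)
  vector = Array.new(index.size, 0)
  text.downcase.split.each do |term|
    vector[index[term]] += 1 if index.key?(term)
  end
  vector
end

# Mirrors the shape of build_query_vector: join the terms, then vectorize.
def sketch_query_vector(term_list, index)
  sketch_vector(term_list.join(" "), index)
end

query = sketch_query_vector(%w[cat mat bird], index)
```

Because the query vector uses the same term index as the documents, it has the same dimensionality and can be compared against them directly.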