Class: Hoatzin::FeatureVector::Builder

Inherits:
Object
Defined in:
lib/feature_vector/builder.rb

Overview

An algebraic model for representing text documents as vectors of identifiers, such as index terms. Each document is represented as a vector in which every dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero.
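As a rough illustration of this model (not Hoatzin's own implementation), the sketch below assumes a simple lowercase word tokenizer and uses raw term counts as the non-zero values. The helpers `tokenize`, `keyword_index` and `term_vector` are hypothetical.

```ruby
# Minimal vector space model sketch. Assumptions: lowercase word tokens and raw
# term counts as weights; none of these helpers exist in Hoatzin itself.

def tokenize(text)
  text.downcase.scan(/[a-z0-9']+/)
end

# Assign each distinct term in the corpus its own dimension (vector position).
def keyword_index(documents)
  index = {}
  documents.each do |doc|
    tokenize(doc).each { |term| index[term] ||= index.size }
  end
  index
end

# One slot per indexed term: terms that occur get a non-zero count,
# every other slot stays 0. Unknown terms are ignored.
def term_vector(text, index)
  vector = Array.new(index.size, 0)
  tokenize(text).each do |term|
    position = index[term]
    vector[position] += 1 if position
  end
  vector
end

docs = ["The cat sat", "The dog sat on the cat"]
index = keyword_index(docs)
p docs.map { |d| term_vector(d, index) }  # prints [[1, 1, 1, 0, 0], [2, 1, 1, 1, 1]]
```

Here the index is `the`, `cat`, `sat`, `dog`, `on` in order of first appearance, so "the" appearing twice in the second document gives a 2 in its first slot.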

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(options = {}) ⇒ Builder

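Returns a new instance of Builder. The :parser entry is removed from options and stored separately; the remaining options are kept, and the parsed-document cache starts out empty.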



# File 'lib/feature_vector/builder.rb', line 12

def initialize(options={})
  @parser = options.delete(:parser)
  @options = options
  @parsed_document_cache = []
end
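Because the constructor uses `Hash#delete`, the `:parser` entry is removed from the very hash the caller passed in, and `@options` ends up being that same hash object. The sketch below copies the constructor body into a hypothetical `DemoBuilder` class to show this; the `:stemming` key and `:my_parser` value are made up for illustration.

```ruby
# Hypothetical stand-in that copies the initialize body shown above.
class DemoBuilder
  attr_reader :parser, :options

  def initialize(options = {})
    @parser = options.delete(:parser) # removes :parser from the caller's hash
    @options = options                # the same hash object, minus :parser
    @parsed_document_cache = []
  end
end

opts = { parser: :my_parser, stemming: true } # :stemming is an illustrative key
builder = DemoBuilder.new(opts)
p builder.parser               # prints :my_parser
p opts.key?(:parser)           # prints false: the caller's hash was modified
p opts.equal?(builder.options) # prints true: no copy was made
```

If callers reuse their options hash, duplicating it with `dup` before deleting from it would avoid this side effect.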

Instance Attribute Details

#vector_keyword_index ⇒ Object

Returns the value of attribute vector_keyword_index.



# File 'lib/feature_vector/builder.rb', line 10

def vector_keyword_index
  @vector_keyword_index
end
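This page does not document the structure of `vector_keyword_index`. Assuming it is a Hash that maps each term to its dimension (an assumption, not confirmed above), it can be inverted to read back which terms a vector contains. `terms_present` is a hypothetical helper.

```ruby
# Assumption: the keyword index is a Hash of term => dimension.
keyword_index = { "the" => 0, "cat" => 1, "sat" => 2, "dog" => 3 }

# Invert it to look up the term stored at a given dimension.
terms_by_position = keyword_index.invert

# Return the terms whose weight in the vector is non-zero, in dimension order.
def terms_present(vector, terms_by_position)
  vector.each_with_index
        .reject { |weight, _position| weight.zero? }
        .map { |_weight, position| terms_by_position.fetch(position) }
end

p terms_present([1, 0, 2, 0], terms_by_position) # prints ["the", "sat"]
```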

Instance Method Details

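#build_document_matrix(documents) ⇒ Object

Builds the keyword index for documents (stored in vector_keyword_index), converts each document into a vector using its position in the collection as its document ID, and returns a Model wrapping the resulting matrix and the index.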



# File 'lib/feature_vector/builder.rb', line 18

def build_document_matrix(documents)
  @vector_keyword_index = build_vector_keyword_index(documents)

  document_matrix = documents.each_with_index.map do |document, document_id|
    build_vector(document, document_id)
  end
  
  Model.new(document_matrix, @vector_keyword_index)
end
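The method builds the keyword index first, then turns each document into one row of the matrix, using the document's position in `documents` as its ID. The sketch below mirrors that flow with hypothetical stand-ins: `SketchModel` replaces `Model`, and this `build_vector` simply counts whitespace-separated terms and takes the index as an explicit argument, unlike the private helper the real method calls.

```ruby
# Hypothetical stand-in for Hoatzin's Model: just holds the matrix and index.
SketchModel = Struct.new(:matrix, :keyword_index)

# Map each distinct lowercase term to a dimension, in order of first appearance.
def build_vector_keyword_index(documents)
  documents.flat_map { |doc| doc.downcase.split }.uniq
           .each_with_index.to_h # term => dimension
end

# Count term occurrences; _document_id mirrors the real call but is unused here.
def build_vector(text, index, _document_id = nil)
  vector = Array.new(index.size, 0)
  text.downcase.split.each { |term| vector[index[term]] += 1 if index.key?(term) }
  vector
end

def build_document_matrix(documents)
  index = build_vector_keyword_index(documents)
  # each_with_index pairs every document with its position, used as its ID.
  matrix = documents.each_with_index.map { |doc, doc_id| build_vector(doc, index, doc_id) }
  SketchModel.new(matrix, index)
end

model = build_document_matrix(["red apple", "green apple pie"])
p model.matrix # prints [[1, 1, 0, 0], [0, 1, 1, 1]]
```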

#build_query_vector(text) ⇒ Object

Builds a vector for a free-text query by calling build_vector without a document ID.



# File 'lib/feature_vector/builder.rb', line 27

def build_query_vector(text)
  build_vector(text)
end
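Since `build_query_vector` delegates to `build_vector`, a query is encoded the same way as a document. How the resulting vector is compared with the document matrix is not shown on this page; the sketch below assumes cosine similarity, a common choice in the vector space model, and uses hand-written vectors that share one keyword index. `cosine_similarity` is a hypothetical helper.

```ruby
# Cosine similarity between two equal-length vectors; 0.0 if either is all zeros.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Assumed index: red=0, apple=1, green=2, pie=3.
document_matrix = [[1, 1, 0, 0],  # "red apple"
                   [0, 1, 1, 1]]  # "green apple pie"
query_vector = [0, 1, 0, 1]       # "apple pie"

scores = document_matrix.map { |row| cosine_similarity(query_vector, row) }
best_id = scores.each_with_index.max_by { |score, _id| score }.last
p best_id # prints 1
```

The second document scores about 0.82 against 0.5 for the first, because it shares both query terms.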

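#marshal_dump ⇒ Object

Returns the builder's state (parser, options, parsed-document cache and keyword index) as an array for Marshal to serialize.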



# File 'lib/feature_vector/builder.rb', line 31

def marshal_dump
  [@parser, @options, @parsed_document_cache, @vector_keyword_index]
end

#marshal_load(ary) ⇒ Object

Restores the builder's state from an array in the order produced by marshal_dump.



# File 'lib/feature_vector/builder.rb', line 35

def marshal_load(ary)
  @parser, @options, @parsed_document_cache, @vector_keyword_index = ary
end
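Ruby's `Marshal.dump` serializes whatever `marshal_dump` returns, and `Marshal.load` allocates a new object without calling `initialize`, then passes the restored array to `marshal_load`. The round trip below uses a hypothetical `MarshalDemo` class with the same pair of methods. It also shows one consequence: every element, including the parser, must itself be marshalable, so a parser given as a `Proc` makes `Marshal.dump` raise `TypeError`.

```ruby
# Hypothetical class mirroring Builder's marshal_dump/marshal_load pair.
class MarshalDemo
  attr_reader :parser, :options, :parsed_document_cache, :vector_keyword_index

  def initialize(parser, options, cache, index)
    @parser = parser
    @options = options
    @parsed_document_cache = cache
    @vector_keyword_index = index
  end

  # Marshal.dump serializes whatever this method returns.
  def marshal_dump
    [@parser, @options, @parsed_document_cache, @vector_keyword_index]
  end

  # Marshal.load skips initialize and hands the restored array to this method.
  def marshal_load(ary)
    @parser, @options, @parsed_document_cache, @vector_keyword_index = ary
  end
end

original = MarshalDemo.new(:simple_parser, { stemming: true }, [], { "apple" => 0 })
copy = Marshal.load(Marshal.dump(original))
p copy.vector_keyword_index == original.vector_keyword_index # prints true

begin
  Marshal.dump(MarshalDemo.new(proc { |t| t.split }, {}, [], {}))
rescue TypeError => e
  puts "cannot marshal: #{e.message}"
end
```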