Class: Reclassifier::ContentNode

Inherits:
Object
Defined in:
lib/reclassifier/content_node.rb

Overview

This class is an internal data structure that represents a single node in the LSI index. Apart from raw_vector_with, it is straightforward to understand. You should never need to use it directly.

Constructor Details

#initialize(word_hash, *categories) ⇒ ContentNode

Creates a node from word_hash and zero or more categories. Both are stored as given; no vectors are built until raw_vector_with is called.



# File 'lib/reclassifier/content_node.rb', line 14

def initialize( word_hash, *categories )
  @categories = categories || []
  @word_hash = word_hash
end
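
A minimal sketch of how the splat argument behaves, in case it helps to see the constructor in use. SketchNode is a hypothetical stand-in that copies only the constructor shown above; it is not the library class, and the accessor declarations are an assumption based on the attribute list on this page.

```ruby
# Hypothetical stand-in mirroring only the constructor shown above.
class SketchNode
  attr_accessor :categories # assumed read/write, as the docs do not mark it readonly
  attr_reader :word_hash    # documented as readonly

  def initialize(word_hash, *categories)
    @categories = categories || []
    @word_hash = word_hash
  end
end

tagged   = SketchNode.new({ "ruby" => 2, "gem" => 1 }, "programming", "tools")
untagged = SketchNode.new({ "ruby" => 1 })

p tagged.categories   # the two category names, in the order given
p untagged.categories # an empty array, because the splat captured nothing
```

Because a splat parameter is always an Array, the `|| []` fallback in the constructor never takes effect.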

Instance Attribute Details

#categories ⇒ Object

Returns the value of attribute categories.



# File 'lib/reclassifier/content_node.rb', line 7

def categories
  @categories
end

#lsi_norm ⇒ Object

Returns the value of attribute lsi_norm.



# File 'lib/reclassifier/content_node.rb', line 7

def lsi_norm
  @lsi_norm
end

#lsi_vector ⇒ Object

Returns the value of attribute lsi_vector.



# File 'lib/reclassifier/content_node.rb', line 7

def lsi_vector
  @lsi_vector
end

#raw_norm ⇒ Object

Returns the value of attribute raw_norm.



# File 'lib/reclassifier/content_node.rb', line 7

def raw_norm
  @raw_norm
end

#raw_vector ⇒ Object

Returns the value of attribute raw_vector.



# File 'lib/reclassifier/content_node.rb', line 7

def raw_vector
  @raw_vector
end

#word_hash ⇒ Object (readonly)

Returns the value of attribute word_hash.



# File 'lib/reclassifier/content_node.rb', line 11

def word_hash
  @word_hash
end

Instance Method Details

#raw_vector_with(word_list) ⇒ Object

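Builds the raw vector from word_hash, using word_list to map each word to its index in the vector space. Words missing from word_list are skipped. The result is stored in raw_vector, and its normalized form in raw_norm.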



# File 'lib/reclassifier/content_node.rb', line 31

def raw_vector_with( word_list )
  if $GSL
    vec = GSL::Vector.alloc(word_list.size)
  else
    vec = Array.new(word_list.size, 0)
  end

  @word_hash.each_key do |word|
    vec[word_list[word]] = @word_hash[word] if word_list[word]
  end

  # Perform the scaling transform
  total_words = $GSL ? vec.sum : vec.sum_with_identity

  # Perform first-order association transform if this vector has more
  # than one word in it.
  if total_words > 1.0
    weighted_total = 0.0
    vec.each do |term|
      if ( term > 0 )
        weighted_total += (( term / total_words ) * Math.log( term / total_words ))
      end
    end
    vec = vec.collect { |val| Math.log( val + 1 ) / -weighted_total }
  end

  if $GSL
    @raw_norm   = vec.normalize
    @raw_vector = vec
  else
    @raw_norm   = Vector[*vec].normalize
    @raw_vector = Vector[*vec]
  end
end
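
The weighting step above can be easier to follow on a small input. The following is a sketch of the non-GSL branch only, under a few assumptions: raw_vector_sketch is a hypothetical helper rather than part of the library, Array#sum with an explicit to_f stands in for sum_with_identity, and the result is returned instead of being stored in instance variables.

```ruby
require "matrix"

# Hypothetical helper reproducing the non-GSL branch of raw_vector_with.
def raw_vector_sketch(word_hash, word_list)
  # One slot per known word; word_list maps word => index.
  vec = Array.new(word_list.size, 0)
  word_hash.each do |word, count|
    index = word_list[word]
    vec[index] = count if index # words outside word_list are ignored
  end

  # to_f is this sketch's choice, to keep the ratios below in floating point.
  total_words = vec.sum.to_f

  if total_words > 1.0
    # Sum of p * log(p) over the non-zero terms (a negative number).
    weighted_total = 0.0
    vec.each do |term|
      next unless term > 0
      ratio = term / total_words
      weighted_total += ratio * Math.log(ratio)
    end
    # Log-scale each count and divide by the negated sum.
    vec = vec.map { |val| Math.log(val + 1) / -weighted_total }
  end

  Vector[*vec]
end

word_list = { "cat" => 0, "dog" => 1, "fish" => 2 }
raw  = raw_vector_sketch({ "cat" => 2, "dog" => 1, "bird" => 4 }, word_list)
unit = raw.normalize # unit-length copy, analogous to raw_norm

p raw  # "cat" gets the largest weight, "fish" stays at zero
p unit
```

Note that "bird" has no entry in word_list, so it contributes nothing. In this sketch, a hash whose only known word occurs more than once makes weighted_total zero, and the division then yields non-finite values; whether callers in the library avoid that case is not shown on this page.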

#search_norm ⇒ Object

Returns the normalized search vector: lsi_norm if it has been set, otherwise raw_norm.



# File 'lib/reclassifier/content_node.rb', line 25

def search_norm
  @lsi_norm || @raw_norm
end

#search_vector ⇒ Object

Returns the search vector: lsi_vector if it has been set, otherwise raw_vector.



# File 'lib/reclassifier/content_node.rb', line 20

def search_vector
  @lsi_vector || @raw_vector
end