Class: Classifier::ContentNode

Inherits:
Object
Defined in:
lib/classifier/lsi/content_node.rb

Overview

This is an internal data structure class for the LSI node. Save for raw_vector_with, it should be fairly straightforward to understand. You should never have to use it directly.

Constructor Details

#initialize(word_frequencies, *categories) ⇒ ContentNode

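Creates a node from a hash of word frequencies and zero or more categories. Any categories passed after word_frequencies are collected into an Array; when none are given, categories is an empty Array.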



# File 'lib/classifier/lsi/content_node.rb', line 29

def initialize(word_frequencies, *categories)
  @categories = categories || []
  @word_hash = word_frequencies
end
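
A minimal sketch of how the constructor arguments end up in the node. SketchNode is a hypothetical stand-in that copies only the constructor and the two readers shown on this page (it is not the library class), and the word => count shape of the frequency hash is an assumption.

```ruby
# Hypothetical stand-in mirroring the constructor shown above.
class SketchNode
  attr_accessor :categories
  attr_reader :word_hash

  def initialize(word_frequencies, *categories)
    @categories = categories || []
    @word_hash = word_frequencies
  end
end

frequencies = { 'ruby' => 2, 'gem' => 1 } # assumed shape: word => count

tagged   = SketchNode.new(frequencies, 'Programming', 'Tools')
untagged = SketchNode.new(frequencies)

p tagged.categories   # ["Programming", "Tools"]
p untagged.categories # []
```

Because categories is a splat parameter, it is always an Array, so the node stores an empty list rather than nil when no category is supplied.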

Instance Attribute Details

#categories ⇒ Object

Returns the value of attribute categories.



# File 'lib/classifier/lsi/content_node.rb', line 21

def categories
  @categories
end

#lsi_norm ⇒ Object

Returns the value of attribute lsi_norm.



# File 'lib/classifier/lsi/content_node.rb', line 18

def lsi_norm
  @lsi_norm
end

#lsi_vector ⇒ Object

Returns the value of attribute lsi_vector.



# File 'lib/classifier/lsi/content_node.rb', line 18

def lsi_vector
  @lsi_vector
end

#raw_norm ⇒ Object

Returns the value of attribute raw_norm.



# File 'lib/classifier/lsi/content_node.rb', line 18

def raw_norm
  @raw_norm
end

#raw_vector ⇒ Object

Returns the value of attribute raw_vector.



# File 'lib/classifier/lsi/content_node.rb', line 18

def raw_vector
  @raw_vector
end

#word_hash ⇒ Object (readonly)

Returns the value of attribute word_hash.



# File 'lib/classifier/lsi/content_node.rb', line 23

def word_hash
  @word_hash
end

Instance Method Details

#raw_vector_with(word_list) ⇒ Object

Creates the raw vector out of word_hash using word_list as the key for mapping the vector space.



# File 'lib/classifier/lsi/content_node.rb', line 52

def raw_vector_with(word_list)
  vec = if Classifier::LSI.gsl_available
          GSL::Vector.alloc(word_list.size)
        else
          Array.new(word_list.size, 0)
        end

  @word_hash.each_key do |word|
    vec[word_list[word]] = @word_hash[word] if word_list[word]
  end

  # Perform the scaling transform
  total_words = Classifier::LSI.gsl_available ? vec.sum : vec.sum_with_identity
  vec_array = Classifier::LSI.gsl_available ? vec.to_a : vec
  total_unique_words = vec_array.count { |word| word != 0 }

  # Perform first-order association transform if this vector has more
  # than one word in it.
  if total_words > 1.0 && total_unique_words > 1
    weighted_total = 0.0

    vec.each do |term|
      next unless term.positive?
      next if total_words.zero?

      term_over_total = term / total_words
      val = term_over_total * Math.log(term_over_total)
      weighted_total += val unless val.nan?
    end

    sign = weighted_total.negative? ? 1.0 : -1.0
    divisor = sign * [weighted_total.abs, Vector::EPSILON].max
    vec = vec.collect { |val| Math.log(val + 1) / divisor }
  end

  if Classifier::LSI.gsl_available
    @raw_norm   = vec.normalize
    @raw_vector = vec
  else
    @raw_norm   = Vector[*vec].normalize
    @raw_vector = Vector[*vec]
  end
end
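
The weighting step is easier to follow on plain Arrays. The sketch below approximates the non-GSL branch under several assumptions: weighted_raw_vector is a hypothetical helper name, word_list is modelled as a Hash of word => index, Array#sum replaces the library's sum_with_identity, and the epsilon default is a placeholder because the value of Vector::EPSILON is not shown on this page. Unlike the method above, it returns the weighted Array instead of storing a normalized Vector.

```ruby
# Sketch only: a plain-Array approximation of the non-GSL branch.
# The epsilon default is an assumed placeholder, not the library's value.
def weighted_raw_vector(word_hash, word_list, epsilon: 1e-10)
  # One slot per known word; words absent from word_list are ignored.
  vec = Array.new(word_list.size, 0)
  word_hash.each { |word, count| vec[word_list[word]] = count if word_list[word] }

  total_words = vec.sum.to_f
  unique_words = vec.count { |count| count != 0 }

  # The transform is skipped for vectors with at most one distinct word.
  return vec.map(&:to_f) unless total_words > 1.0 && unique_words > 1

  # Sum of share * log(share) over non-zero terms (the negated entropy).
  weighted_total = vec.select(&:positive?).sum do |count|
    share = count / total_words
    share * Math.log(share)
  end

  # Flip the sign so the divisor is positive, and keep it away from zero.
  sign = weighted_total.negative? ? 1.0 : -1.0
  divisor = sign * [weighted_total.abs, epsilon].max
  vec.map { |count| Math.log(count + 1) / divisor }
end

word_list = { 'ruby' => 0, 'gem' => 1, 'rails' => 2 }
vector = weighted_raw_vector({ 'ruby' => 2, 'gem' => 1, 'perl' => 4 }, word_list)
puts vector.inspect
```

In this example 'perl' is dropped because it has no index in word_list, and the 'rails' slot stays at zero because log(0 + 1) is zero. The remaining entries are log(count + 1) divided by the entropy of the word distribution.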

#search_norm ⇒ Object

Use this to fetch the appropriate search vector in normalized form: lsi_norm if it has been set, otherwise raw_norm.



# File 'lib/classifier/lsi/content_node.rb', line 44

def search_norm
  @lsi_norm || @raw_norm
end

#search_vector ⇒ Object

Use this to fetch the appropriate search vector: lsi_vector if it has been set, otherwise raw_vector.



# File 'lib/classifier/lsi/content_node.rb', line 37

def search_vector
  @lsi_vector || @raw_vector
end
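
The fallback in search_vector and search_norm can be illustrated with a small stand-in. FallbackNode is hypothetical and reproduces only the two readers above, with plain Arrays standing in for the real vector objects.

```ruby
# Hypothetical stand-in reproducing only the two fallback readers.
class FallbackNode
  attr_accessor :raw_vector, :raw_norm, :lsi_vector, :lsi_norm

  def search_vector
    @lsi_vector || @raw_vector
  end

  def search_norm
    @lsi_norm || @raw_norm
  end
end

node = FallbackNode.new
node.raw_vector = [3.0, 4.0]
node.raw_norm   = [0.6, 0.8]

# No LSI data yet, so the raw values are returned.
before = node.search_norm

# Once LSI values are assigned, they take precedence.
node.lsi_vector = [1.0, 0.0]
node.lsi_norm   = [1.0, 0.0]
after = node.search_norm

p before # [0.6, 0.8]
p after  # [1.0, 0.0]
```

Callers can therefore use the search readers without checking whether the LSI values have been assigned yet.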