Class: ClassifierReborn::ContentNode

Inherits:
Object
Defined in:
lib/classifier-reborn/lsi/content_node.rb

Overview

This is an internal data structure class for the LSI node. Save for raw_vector_with, it should be fairly straightforward to understand. You should never have to use it directly.

Direct Known Subclasses

CachedContentNode

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(word_hash, *categories) ⇒ ContentNode

Creates a node from word_hash, a hash mapping words to their counts, and zero or more categories. The lsi_vector and lsi_norm attributes start as nil.



# File 'lib/classifier-reborn/lsi/content_node.rb', line 19

def initialize(word_hash, *categories)
  @categories = categories || []
  @word_hash = word_hash
  @lsi_norm, @lsi_vector = nil
end
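
As a minimal sketch of how the splat parameter behaves, the snippet below copies the constructor into a hypothetical stand-in class, SketchNode, so that it runs without the gem. The class name and the sample word counts are assumptions made for illustration only.

```ruby
# Hypothetical stand-in that mirrors the constructor above; not the gem's class.
class SketchNode
  attr_reader :word_hash, :categories, :lsi_vector, :lsi_norm

  def initialize(word_hash, *categories)
    @categories = categories || []   # a splat is always an Array, possibly empty
    @word_hash = word_hash
    @lsi_norm, @lsi_vector = nil     # both instance variables end up nil
  end
end

# Sample data invented for this sketch.
tagged   = SketchNode.new({ dog: 2, cat: 1 }, :pets, :animals)
untagged = SketchNode.new({ dog: 1 })
```

Here tagged.categories holds both symbols, while untagged.categories is an empty array rather than nil, so callers can iterate over it without a nil check.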

Instance Attribute Details

#categories ⇒ Object

Returns the value of attribute categories.



# File 'lib/classifier-reborn/lsi/content_node.rb', line 12

def categories
  @categories
end

#lsi_norm ⇒ Object

Returns the value of attribute lsi_norm.



# File 'lib/classifier-reborn/lsi/content_node.rb', line 12

def lsi_norm
  @lsi_norm
end

#lsi_vector ⇒ Object

Returns the value of attribute lsi_vector.



# File 'lib/classifier-reborn/lsi/content_node.rb', line 12

def lsi_vector
  @lsi_vector
end

#raw_norm ⇒ Object

Returns the value of attribute raw_norm.



# File 'lib/classifier-reborn/lsi/content_node.rb', line 12

def raw_norm
  @raw_norm
end

#raw_vector ⇒ Object

Returns the value of attribute raw_vector.



# File 'lib/classifier-reborn/lsi/content_node.rb', line 12

def raw_vector
  @raw_vector
end

#word_hash ⇒ Object (readonly)

Returns the value of attribute word_hash.



# File 'lib/classifier-reborn/lsi/content_node.rb', line 16

def word_hash
  @word_hash
end

Instance Method Details

#raw_vector_with(word_list) ⇒ Object

Creates the raw vector out of word_hash using word_list as the key for mapping the vector space.



# File 'lib/classifier-reborn/lsi/content_node.rb', line 46

def raw_vector_with(word_list)
  vec = if $SVD == :numo
          Numo::DFloat.zeros(word_list.size)
        elsif $SVD == :gsl
          GSL::Vector.alloc(word_list.size)
        else
          Array.new(word_list.size, 0)
        end

  @word_hash.each_key do |word|
    vec[word_list[word]] = @word_hash[word] if word_list[word]
  end

  # Perform the scaling transform and force floating point arithmetic
  if $SVD == :numo
    total_words = vec.sum.to_f
  elsif $SVD == :gsl
    sum = 0.0
    vec.each { |v| sum += v }
    total_words = sum
  else
    total_words = vec.reduce(0, :+).to_f
  end

  total_unique_words = 0

  if [:numo, :gsl].include?($SVD)
    vec.each { |word| total_unique_words += 1 if word != 0.0 }
  else
    total_unique_words = vec.count { |word| word != 0 }
  end

  # Perform first-order association transform if this vector has more
  # than one word in it.
  if total_words > 1.0 && total_unique_words > 1
    weighted_total = 0.0
    # Cache calculations, this takes too long on large indexes
    cached_calcs = Hash.new do |hash, term|
      hash[term] = ((term / total_words) * Math.log(term / total_words))
    end

    vec.each do |term|
      weighted_total += cached_calcs[term] if term > 0.0
    end

    # Cache calculations, this takes too long on large indexes
    cached_calcs = Hash.new do |hash, val|
      hash[val] = Math.log(val + 1) / -weighted_total
    end

    vec = vec.map do |val|
      cached_calcs[val]
    end
  end

  if $SVD == :numo
    @raw_norm   = vec / Numo::Linalg.norm(vec)
    @raw_vector = vec
  elsif $SVD == :gsl
    @raw_norm   = vec.normalize
    @raw_vector = vec
  else
    @raw_norm   = Vector[*vec].normalize
    @raw_vector = Vector[*vec]
  end
end
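
The weighting step in the listing above can be hard to follow because of the three backend branches. The sketch below isolates the plain-Array branch in a hypothetical helper, entropy_weight, which is not part of the gem. It assumes the counts have already been placed in word_list order, leaves out the per-value caching, and skips the final normalization.

```ruby
# Hypothetical helper reproducing the plain-Array branch of the transform.
def entropy_weight(counts)
  total_words  = counts.sum.to_f
  unique_words = counts.count { |c| c != 0 }

  # Vectors with at most one distinct word are returned unweighted.
  return counts unless total_words > 1.0 && unique_words > 1

  # Sum of share * log(share) over the non-zero terms; negative here because
  # each share lies strictly between 0 and 1.
  weighted_total = counts.select { |c| c > 0 }.sum do |c|
    share = c / total_words
    share * Math.log(share)
  end

  # Dampen each count with log(count + 1) and divide by the negated sum.
  counts.map { |c| Math.log(c + 1) / -weighted_total }
end

# Sample counts invented for this sketch.
weights = entropy_weight([2, 1, 0])
```

Zero counts stay at zero because log(0 + 1) is 0, and larger counts receive larger weights, though they grow logarithmically rather than linearly.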

#search_norm ⇒ Object

Use this to fetch the appropriate search vector in normalized form: lsi_norm when it is set, otherwise raw_norm.



# File 'lib/classifier-reborn/lsi/content_node.rb', line 40

def search_norm
  @lsi_norm || @raw_norm
end

#search_vector ⇒ Object

Use this to fetch the appropriate search vector: lsi_vector when it is set, otherwise raw_vector.



# File 'lib/classifier-reborn/lsi/content_node.rb', line 26

def search_vector
  @lsi_vector || @raw_vector
end
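
Both search_norm and search_vector rely on the same `||` fallback. As a small sketch, the hypothetical FallbackSketch class below copies only that logic and uses plain arrays in place of real vectors. The class name and sample values are assumptions.

```ruby
# Hypothetical stand-in; only the fallback expression is taken from the listing.
class FallbackSketch
  attr_accessor :raw_vector, :lsi_vector

  def search_vector
    @lsi_vector || @raw_vector
  end
end

node = FallbackSketch.new
node.raw_vector = [1.0, 0.0]
before_build = node.search_vector   # lsi_vector is still nil, so the raw vector is used

node.lsi_vector = [0.7, 0.7]
after_build = node.search_vector    # the LSI vector now takes precedence
```

Because the fallback tests truthiness, any non-nil lsi_vector takes precedence.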

#transposed_search_vector ⇒ Object

Returns the transposed search vector. With the :numo backend the search vector is returned unchanged; otherwise search_vector.col is returned.



# File 'lib/classifier-reborn/lsi/content_node.rb', line 31

def transposed_search_vector
  if $SVD == :numo
    search_vector
  else
    search_vector.col
  end
end