Class: ClassifierReborn::ContentNode

Inherits:
Object
  • Object
show all
Defined in:
lib/classifier-reborn/lsi/content_node.rb

Overview

This is an internal data structure class for the LSI node. Save for raw_vector_with, it should be fairly straightforward to understand. You should never have to use it directly.

Direct Known Subclasses

CachedContentNode

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(word_hash, *categories) ⇒ ContentNode

If text_proc is not specified, the source will be duck-typed via source.to_s



18
19
20
21
22
# File 'lib/classifier-reborn/lsi/content_node.rb', line 18

def initialize( word_hash, *categories )
  @categories = categories || []
  @word_hash = word_hash
  @lsi_norm, @lsi_vector = nil
end

Instance Attribute Details

#categoriesObject

Returns the value of attribute categories.



11
12
13
# File 'lib/classifier-reborn/lsi/content_node.rb', line 11

def categories
  @categories
end

#lsi_normObject

Returns the value of attribute lsi_norm.



11
12
13
# File 'lib/classifier-reborn/lsi/content_node.rb', line 11

def lsi_norm
  @lsi_norm
end

#lsi_vectorObject

Returns the value of attribute lsi_vector.



11
12
13
# File 'lib/classifier-reborn/lsi/content_node.rb', line 11

def lsi_vector
  @lsi_vector
end

#raw_normObject

Returns the value of attribute raw_norm.



11
12
13
# File 'lib/classifier-reborn/lsi/content_node.rb', line 11

def raw_norm
  @raw_norm
end

#raw_vectorObject

Returns the value of attribute raw_vector.



11
12
13
# File 'lib/classifier-reborn/lsi/content_node.rb', line 11

def raw_vector
  @raw_vector
end

#word_hashObject (readonly)

Returns the value of attribute word_hash.



15
16
17
# File 'lib/classifier-reborn/lsi/content_node.rb', line 15

def word_hash
  @word_hash
end

Instance Method Details

#raw_vector_with(word_list) ⇒ Object

Creates the raw vector out of word_hash using word_list as the key for mapping the vector space.



41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
# File 'lib/classifier-reborn/lsi/content_node.rb', line 41

def raw_vector_with( word_list )
  if $GSL
     vec = GSL::Vector.alloc(word_list.size)
  else
     vec = Array.new(word_list.size, 0)
  end

  @word_hash.each_key do |word|
    vec[word_list[word]] = @word_hash[word] if word_list[word]
  end

  # Perform the scaling transform and force floating point arithmetic
  if $GSL
    sum = 0.0
    vec.each {|v| sum += v }
    total_words = sum
  else
    total_words = vec.reduce(0, :+).to_f
  end

  total_unique_words = 0

  if $GSL
    vec.each { |word| total_unique_words += 1 if word != 0.0 }
  else
    total_unique_words = vec.count{ |word| word != 0 }
  end

  # Perform first-order association transform if this vector has more
  # then one word in it.
  if total_words > 1.0 && total_unique_words > 1
    weighted_total = 0.0
    # Cache calculations, this takes too long on large indexes
    cached_calcs = Hash.new { |hash, term|
      hash[term] = (( term / total_words ) * Math.log( term / total_words ))
    }

    vec.each do |term|
      weighted_total += cached_calcs[term] if term > 0.0
    end

    # Cache calculations, this takes too long on large indexes
    cached_calcs = Hash.new do |hash, val|
      hash[val] = Math.log( val + 1 ) / -weighted_total
    end

    vec.collect! { |val|
      cached_calcs[val]
    }
  end

  if $GSL
    @raw_norm   = vec.normalize
    @raw_vector = vec
  else
    @raw_norm   = Vector[*vec].normalize
    @raw_vector = Vector[*vec]
  end
end

#search_normObject

Use this to fetch the appropriate search vector in normalized form.



35
36
37
# File 'lib/classifier-reborn/lsi/content_node.rb', line 35

def search_norm
  @lsi_norm || @raw_norm
end

#search_vectorObject

Use this to fetch the appropriate search vector.



25
26
27
# File 'lib/classifier-reborn/lsi/content_node.rb', line 25

def search_vector
  @lsi_vector || @raw_vector
end

#transposed_search_vectorObject

Method to access the transposed search vector



30
31
32
# File 'lib/classifier-reborn/lsi/content_node.rb', line 30

def transposed_search_vector
  search_vector.col
end