Method: Classifier::LSI#highest_relative_content

Defined in:
lib/classifier/lsi.rb

#highest_relative_content(max_chunks = 10) ⇒ Array

This method returns up to max_chunks entries, ordered from highest to lowest average semantic rating. It calculates each entry's average similarity to all entries in the index and returns the entries with the highest averages, which are the ones most representative of the index as a whole.
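
A minimal pure-Ruby sketch of this ranking, assuming each entry is already a numeric vector (standing in for its LSI representation). Plain cosine similarity is used in place of the library's internal proximity calculation, and the helpers cosine and rank_by_average_similarity are hypothetical, not part of the classifier API.

```ruby
# Hypothetical sketch: the vectors stand in for LSI projections, and
# cosine similarity stands in for the library's proximity score.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Returns the keys of the max_chunks entries whose average similarity
# to every entry (itself included) is highest, best first.
def rank_by_average_similarity(vectors, max_chunks = 10)
  averages = vectors.to_h do |key, vec|
    total = vectors.values.sum { |other| cosine(vec, other) }
    [key, total / vectors.size]
  end
  averages.keys.sort_by { |key| -averages[key] }.first(max_chunks)
end

vectors = {
  "a" => [1.0, 0.0],
  "b" => [0.9, 0.1],
  "c" => [0.0, 1.0]
}
p rank_by_average_similarity(vectors, 2) # => ["b", "a"]
```

Entry "b" lies between the other two, so it has the highest average similarity and ranks first. Using first(max_chunks) also makes a max_chunks of 0 return an empty array.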

This can be used to build a summary service, or to learn more about your dataset's general content. For example, if you categorize the entries this method returns, you can get a sense of what your dataset is generally about.
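
A hypothetical extractive-summary sketch along the same lines: each sentence is scored by its average similarity to all sentences, the top n are kept, and they are put back in document order. It uses bag-of-words cosine similarity instead of LSI vectors, so it only illustrates the idea. The names summarize, bag_of_words and bow_cosine are illustrative, not library methods, and Enumerable#tally requires Ruby 2.7 or later.

```ruby
# Hypothetical sketch: bag-of-words vectors replace LSI vectors here,
# so scores will differ from Classifier::LSI. Needs Ruby 2.7+ (tally).
def bag_of_words(text)
  text.downcase.scan(/[a-z']+/).tally
end

# Cosine similarity between two word-count hashes.
def bow_cosine(a, b)
  dot = a.sum { |word, count| count * b.fetch(word, 0) }
  norm = Math.sqrt(a.values.sum { |c| c * c }) * Math.sqrt(b.values.sum { |c| c * c })
  norm.zero? ? 0.0 : dot / norm
end

# Keeps the n sentences with the highest average similarity to the
# whole set, then restores their original order so the summary reads
# naturally. Ties are broken by position in the text.
def summarize(sentences, n)
  bags = sentences.map { |s| bag_of_words(s) }
  scores = bags.map { |bag| bags.sum { |other| bow_cosine(bag, other) } / bags.size }
  top = scores.each_with_index.sort_by { |score, i| [-score, i] }.first(n).map(&:last)
  top.sort.map { |i| sentences[i] }
end

sentences = [
  "Ruby is a dynamic programming language.",
  "The Ruby language favors programmer happiness.",
  "Bananas are yellow."
]
p summarize(sentences, 2)
```

The off-topic sentence shares no words with the others, so its average is lowest and it is dropped from the summary.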

# File 'lib/classifier/lsi.rb', line 155

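def highest_relative_content( max_chunks=10 )
   # Proximity scores are only meaningful once the index is up to date.
   return [] if needs_rebuild?

   # Sum each item's proximity scores against every indexed item. Each
   # item is compared with the same number of items, so ranking by the
   # sum gives the same order as ranking by the average.
   avg_density = {}
   @items.each_key do |item|
      avg_density[item] = proximity_array_for_content(item).inject(0.0) { |sum, (_other, score)| sum + score }
   end

   # Highest-scoring items first, truncated to max_chunks. A block-less
   # trailing .map would return an Enumerator on Ruby 1.9+, not an Array.
   avg_density.keys.sort_by { |item| avg_density[item] }.reverse.first(max_chunks)
end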