Method: Classifier::LSI#highest_relative_content

Defined in:
lib/classifier/lsi.rb

#highest_relative_content(max_chunks = 10) ⇒ Array

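This method returns up to max_chunks entries, ordered by how semantically close each one is to the rest of the index. For each entry, its proximity (similarity) score against every indexed entry is summed, and the entries with the highest totals are returned first. Every entry is compared with the same number of entries, so this ordering matches ranking by average similarity. If the index needs rebuilding, an empty array is returned.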
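The ranking step can be sketched in plain Ruby, outside the gem. This is a minimal sketch assuming each item is a plain numeric Array and that "proximity" means cosine similarity. `cosine` and `highest_relative` are hypothetical helpers, not part of the classifier API, and the gem itself computes proximity from its own LSI vectors.

```ruby
# Hypothetical sketch: items are plain numeric Arrays (an assumption), and
# proximity is taken to be cosine similarity. Not the classifier gem's API.

def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Returns up to max_chunks item keys, highest summed similarity first.
def highest_relative(vectors, max_chunks = 10)
  totals = vectors.map do |item, vec|
    # Compare against every item, including the item itself.
    [item, vectors.values.sum { |other| cosine(vec, other) }]
  end.to_h
  totals.keys.sort_by { |item| -totals[item] }.first(max_chunks)
end

vectors = {
  "a" => [1.0, 0.0],
  "b" => [0.9, 0.1],
  "c" => [0.0, 1.0]
}
p highest_relative(vectors, 2) # => ["b", "a"]
```

Here "b" ranks first because it is similar to both "a" and "c". Self-similarity is 1.0 for every non-zero vector, so including it shifts all totals equally and does not change the order.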

This can be used to build a summary service, or to get a sense of your dataset’s overall content. For example, categorizing the returned entries can indicate what the dataset is generally about.
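To illustrate the summary-service idea, this sketch treats each sentence of a text as an item and keeps the most central sentences. It swaps LSI for a plain bag-of-words model with cosine similarity, so it is a hypothetical stand-in, not how the gem works. `summarize`, `word_counts`, and `cosine` are illustrative names only.

```ruby
# Hypothetical extractive summary: a bag-of-words model stands in for LSI.

# Turns a sentence into word counts over a shared vocabulary.
def word_counts(sentence, vocab)
  words = sentence.downcase.scan(/[a-z']+/)
  vocab.map { |w| words.count(w) }
end

def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

def summarize(text, max_sentences = 2)
  sentences = text.split(/(?<=[.!?])\s+/)
  vocab = sentences.flat_map { |s| s.downcase.scan(/[a-z']+/) }.uniq
  vectors = sentences.map { |s| word_counts(s, vocab) }
  # Sum each sentence's similarity to all sentences, as the ranking above does.
  totals = vectors.map { |v| vectors.sum { |o| cosine(v, o) } }
  top = (0...sentences.size).sort_by { |i| -totals[i] }.first(max_sentences)
  # Restore original order so the summary reads naturally.
  top.sort.map { |i| sentences[i] }
end

p summarize("Cats are pets. Cats and dogs are pets. Stocks fell today.", 2)
# => ["Cats are pets.", "Cats and dogs are pets."]
```

The off-topic sentence is dropped because it shares no words with the others, so its total is only its self-similarity.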

# File 'lib/classifier/lsi.rb', line 151

def highest_relative_content(max_chunks = 10)
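  # Nothing can be ranked until the index is current.
  return [] if needs_rebuild?

  # Sum each item's proximity (similarity) scores against every indexed item.
  # All items are compared with the same number of items, so ordering by the
  # sum matches ordering by the average.
  avg_density = {}
  @items.each_key do |item|
    avg_density[item] = proximity_array_for_content(item).inject(0.0) { |sum, (_, score)| sum + score }
  end

  # Highest totals first; first(n) returns an Array and also handles max_chunks = 0.
  avg_density.keys.sort_by { |item| avg_density[item] }.reverse.first(max_chunks)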
end
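A slice written as `[0..max_chunks - 1].map` has two pitfalls in plain Ruby, and `first(max_chunks)` avoids both. The snippet uses a hypothetical `ranked` array as a stand-in for the sorted item keys.

```ruby
# Stand-in for the ranked item keys (most central first); hypothetical data.
ranked = %w[b a c]

# A bare .map with no block returns an Enumerator, not the sliced Array.
p ranked[0..2 - 1].map.class # => Enumerator

# With max_chunks = 0 the range becomes 0..-1, which selects every element.
p ranked[0..0 - 1]           # => ["b", "a", "c"]

# first(n) returns an Array and respects n = 0.
p ranked.first(0)            # => []
p ranked.first(2)            # => ["b", "a"]
```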