Method: Classifier::LSI#highest_relative_content
- Defined in:
- lib/classifier/lsi.rb
#highest_relative_content(max_chunks = 10) ⇒ Object
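Returns up to max_chunks entries (10 by default), ordered from highest to lowest semantic rating. For each entry, the method adds up its proximity scores against the entries in the index, then returns the entries with the highest totals. If the index needs to be rebuilt, it returns an empty array instead.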
This can be used to build a summary service, or to learn more about your dataset’s general content. For example, running categorize on the returned entries would give you a sense of what your dataset is generally about.
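The ranking described above can be sketched without the library. This is a minimal illustration, not the gem's implementation: the `SIMILARITY` table and the `rank_by_total_similarity` helper are hypothetical names, and the scores are made-up values standing in for whatever the index would compute. It assumes each entry's scores include its similarity to itself.

```ruby
# Hypothetical pairwise similarity scores (made-up values for illustration).
# Each key is an entry; each value maps every entry to a score.
SIMILARITY = {
  "dogs" => { "dogs" => 1.0, "cats" => 0.8, "cars" => 0.1 },
  "cats" => { "dogs" => 0.8, "cats" => 1.0, "cars" => 0.2 },
  "cars" => { "dogs" => 0.1, "cats" => 0.2, "cars" => 1.0 }
}.freeze

# Hypothetical helper: total each entry's scores, then return the
# max_chunks entries with the highest totals, highest first.
def rank_by_total_similarity(similarity, max_chunks = 10)
  totals = similarity.transform_values { |scores| scores.values.sum }
  totals.keys.sort_by { |entry| totals[entry] }.reverse.first(max_chunks)
end

p rank_by_total_similarity(SIMILARITY, 2)
```

Here "cats" ranks first because it is the entry most similar to everything else, which is the sense in which the top results represent the dataset's general content.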
# File 'lib/classifier/lsi.rb', line 155

def highest_relative_content( max_chunks=10 )
  return [] if needs_rebuild?

  avg_density = Hash.new
  @items.each_key { |x| avg_density[x] = proximity_array_for_content(x).inject(0.0) { |x,y| x + y[1]} }

  avg_density.keys.sort_by { |x| avg_density[x] }.reverse[0..max_chunks-1].map
end