Module: ActsAsFerret::MoreLikeThis::InstanceMethods

Defined in:
lib/more_like_this.rb

Instance Method Summary

Instance Method Details

#more_like_this(options = {}, ferret_options = {}, ar_options = {}) ⇒ Object

Returns other instances of this class whose contents are similar to this one. It works roughly as follows: find the n most interesting (i.e. characteristic) terms in this document, then build a query from those terms and run it against the whole index. Which terms count as interesting is decided by various criteria that can be influenced through the given options.

The algorithm used here is a fairly direct port of the MoreLikeThis class from Apache Lucene.
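To make the term selection step more concrete, here is a minimal sketch in plain Ruby. It is not the acts_as_ferret implementation: the helper name interesting_terms, the doc_freqs hash and the num_docs argument are hypothetical, and the tf * idf scoring with idf = log(num_docs / (doc_freq + 1)) + 1 is an assumption modelled on Lucene's classic default similarity.

```ruby
# Simplified sketch of "interesting term" selection; NOT the acts_as_ferret code.
# doc_tokens: tokens of the source document
# doc_freqs:  hypothetical Hash of term => number of indexed docs containing it
# num_docs:   total number of documents in the index
def interesting_terms(doc_tokens, doc_freqs, num_docs, options = {})
  opts = {
    :min_term_freq   => 2,
    :min_doc_freq    => 5,
    :min_word_length => 0,
    :max_word_length => 0,
    :max_query_terms => 25
  }.update(options)

  # Count how often each term occurs in the source document.
  term_freqs = Hash.new(0)
  doc_tokens.each { |token| term_freqs[token] += 1 }

  scored = []
  term_freqs.each do |term, tf|
    next if tf < opts[:min_term_freq]
    next if opts[:min_word_length] > 0 && term.length < opts[:min_word_length]
    next if opts[:max_word_length] > 0 && term.length > opts[:max_word_length]
    df = doc_freqs.fetch(term, 0)
    next if df < opts[:min_doc_freq]
    # Assumed scoring: term frequency times inverse document frequency.
    idf = Math.log(num_docs.to_f / (df + 1)) + 1.0
    scored << [term, tf * idf]
  end

  # Keep only the highest scoring terms, best first.
  scored.sort_by { |_, score| -score }.first(opts[:max_query_terms]).map(&:first)
end

tokens    = %w[ferret index ferret search index ferret the the the a]
doc_freqs = { 'ferret' => 6, 'index' => 40, 'the' => 1000, 'search' => 50 }
terms     = interesting_terms(tokens, doc_freqs, 1000)
```

With this toy data, 'search' and 'a' are dropped by :min_term_freq, and the rarer term 'ferret' ranks ahead of the very common 'the' although both occur three times in the source document.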

options are:

  :field_names : Array of field names to use for similarity search (mandatory)
  :min_term_freq => 2, # Ignore terms with less than this frequency in the source doc.
  :min_doc_freq => 5, # Ignore words which do not occur in at least this many docs.
  :min_word_length => 0, # Ignore words shorter than this length (longer words tend to be more characteristic for the document they occur in). The default of 0 ignores no words.
  :max_word_length => 0, # Ignore words longer than this length. The default of 0 ignores no words.
  :max_query_terms => 25, # Maximum number of terms in the query built.
  :max_num_tokens => 5000, # Maximum number of tokens to examine in a single field.
  :boost => false, # When true, a boost according to the relative score of a term is applied to this term's TermQuery.
  :similarity => 'ActsAsFerret::MoreLikeThis::DefaultAAFSimilarity', # Class name of the similarity implementation to use (the default equals Ferret's internal similarity implementation).
  :analyzer => 'Ferret::Analysis::StandardAnalyzer', # Class name of the analyzer to use.
  :append_to_query => nil, # Proc taking a query object as argument, which is called after the query has been generated. Can be used to further manipulate the query used to find related documents, e.g. to constrain the search to a given class in single table inheritance scenarios.
  :base_class => self.class # Base class to use for querying, useful in STI scenarios where BaseClass.find_with_ferret can be used to retrieve results from other classes, too.

ferret_options : Ferret options handed over to find_with_ferret (e.g. for limits and sorting)

ar_options : Options handed over to find_with_ferret for AR scoping
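The three arguments described above might be passed as in the following sketch. The Article class here is a hypothetical stand-in that only records its arguments, so the example runs without Rails or Ferret; in a real application the method comes from acts_as_ferret. The :limit and :conditions keys are assumptions about typical Ferret and ActiveRecord options and are not taken from this page.

```ruby
# Hypothetical stand-in for a model using acts_as_ferret.
# It records the arguments instead of querying an index.
class Article
  attr_reader :last_call

  def more_like_this(options = {}, ferret_options = {}, ar_options = {})
    @last_call = [options, ferret_options, ar_options]
    [] # a real model would return the similar records here
  end
end

article = Article.new
similar = article.more_like_this(
  # options: tune which terms are considered characteristic
  { :field_names => [:title, :body], :min_term_freq => 1, :boost => true },
  # ferret_options: assumed Ferret search option limiting the result count
  { :limit => 5 },
  # ar_options: assumed ActiveRecord scoping of the loaded records
  { :conditions => ['published = ?', true] }
)
```

Keeping the three hashes separate means the similarity tuning, the index search settings and the database scoping can each be changed without touching the others.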



# File 'lib/more_like_this.rb', line 34

def more_like_this(options = {}, ferret_options = {}, ar_options = {})
  options = {
    :field_names => nil,  # Array of field names to use for similarity search (mandatory)
    :min_term_freq => 2,  # Ignore terms with less than this frequency in the source doc.
    :min_doc_freq => 5,   # Ignore words which do not occur in at least this many docs
    :min_word_length => 0, # Ignore words if less than this len. Default is not to ignore any words.
    :max_word_length => 0, # Ignore words if greater than this len. Default is not to ignore any words.
    :max_query_terms => 25,  # maximum number of terms in the query built
    :max_num_tokens => 5000, # maximum number of tokens to analyze when analyzing contents
    :boost => false,      # when true, boost each term's TermQuery according to the term's relative score
    :similarity => 'ActsAsFerret::MoreLikeThis::DefaultAAFSimilarity',  # class name of the similarity implementation to use
    :analyzer => 'Ferret::Analysis::StandardAnalyzer', # class name of the analyzer to use
    :append_to_query => nil, # proc called with the generated query, can be used to manipulate it further
    :base_class => self.class # base class to use for querying, useful in STI scenarios where BaseClass.find_with_ferret can be used to retrieve results from other classes, too
  }.update(options)
  #index.search_each('id:*') do |doc, score|
  #  puts "#{doc} == #{index[doc][:description]}"
  #end
  clazz = options[:base_class]
  options[:base_class] = clazz.name
  query = clazz.aaf_index.build_more_like_this_query(self.ferret_key, self.id, options)
  options[:append_to_query].call(query) if options[:append_to_query]
  clazz.find_with_ferret(query, ferret_options, ar_options)
end
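The method above merges caller options into its defaults with Hash#update on a fresh hash literal. The following minimal sketch isolates that pattern; merged_options is a hypothetical helper, and the defaults are shortened for brevity.

```ruby
# Hypothetical helper isolating the defaults-merging pattern used above.
def merged_options(options = {})
  {
    :min_term_freq   => 2,
    :max_query_terms => 25,
    :boost           => false
  }.update(options) # caller-supplied keys overwrite the defaults
end

caller_options = { :boost => true }
merged = merged_options(caller_options)

# Later writes go to the merged hash, not to the hash the caller passed in.
merged[:base_class] = 'Article'
```

Because update mutates the literal holding the defaults rather than the argument, the caller's hash is left untouched and can safely be reused for further calls.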