Class: TfIdfSimilarity::BM25Model

Inherits:
Model
  • Object
Defined in:
lib/tf-idf-similarity/bm25_model.rb

Constructor Details

This class inherits a constructor from TfIdfSimilarity::Model

Instance Method Details

#inverse_document_frequency(term) ⇒ Float Also known as: idf

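Returns the term's inverse document frequency, computed with the BM25 formula log((N - df + 0.5) / (df + 0.5)), where N is the number of documents and df is the number of documents that contain the term.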

Parameters:

  • term (String)

    a term

Returns:

  • (Float)

    the term's inverse document frequency



# File 'lib/tf-idf-similarity/bm25_model.rb', line 11

def inverse_document_frequency(term)
  df = @model.document_count(term)
  log((documents.size - df + 0.5) / (df + 0.5))
end
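
To make the behavior of this formula concrete, here is a minimal standalone sketch that does not depend on the gem. The method name `bm25_idf` and its two parameters are hypothetical and chosen for illustration only; it uses `Math.log` directly rather than the model's own `log` helper.

```ruby
# Standalone sketch of the BM25 inverse document frequency.
# `bm25_idf`, `document_count` and `documents_with_term` are hypothetical
# names used for illustration; they are not part of the gem's API.
def bm25_idf(document_count, documents_with_term)
  n  = document_count.to_f
  df = documents_with_term.to_f
  # Same expression as in the method above, with N and df passed in.
  Math.log((n - df + 0.5) / (df + 0.5))
end

rare   = bm25_idf(10, 1) # term appears in 1 of 10 documents: positive
half   = bm25_idf(10, 5) # term appears in half of the documents: zero
common = bm25_idf(10, 9) # term appears in 9 of 10 documents: negative
```

Note that under this formula a term that appears in more than half of the documents gets a negative value.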

#term_frequency(document, term) ⇒ Float Also known as: tf

Note:

Like Lucene, we use a b value of 0.75 and a k1 value of 1.2.

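Returns the term's BM25 term frequency in the document: the raw term count, saturated by k1 and normalized by the document's size relative to the average document size. Returns NaN if the average document size is zero.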

Parameters:

  • document (Document)

    a document

  • term (String)

    a term

Returns:

  • (Float)

    the term's frequency in the document



# File 'lib/tf-idf-similarity/bm25_model.rb', line 24

def term_frequency(document, term)
  if @model.average_document_size.zero?
    Float::NAN
  else
    tf = document.term_count(term)
    (tf * 2.2) / (tf + 0.3 + 0.9 * document.size / @model.average_document_size)
  end
end
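
The constants 2.2, 0.3 and 0.9 in the method above follow from the k1 and b values given in the note: 2.2 = k1 + 1, 0.3 = k1 * (1 - b), and 0.9 = k1 * b. The sketch below spells that out with named parameters. It is a standalone illustration, not the gem's code; the method name `bm25_tf` and its arguments are hypothetical.

```ruby
# Standalone sketch of the BM25 term frequency with k1 and b made explicit.
# `bm25_tf` and its parameter names are hypothetical and not part of the gem.
def bm25_tf(term_count, document_size, average_document_size, k1: 1.2, b: 0.75)
  # Mirror the guard in the method above: no average size, no meaningful value.
  return Float::NAN if average_document_size.zero?

  tf = term_count.to_f
  # Length normalization: equals k1 when the document has average size.
  norm = k1 * (1 - b + b * document_size / average_document_size.to_f)
  (tf * (k1 + 1)) / (tf + norm)
end

average_length = bm25_tf(1, 100, 100)  # one occurrence, average-sized document
long_document  = bm25_tf(1, 400, 100)  # same count in a longer document scores lower
many_hits      = bm25_tf(1000, 100, 100) # approaches but stays below k1 + 1
```

With the default k1 and b, this expands to the same expression as the hard-coded version, so the two should agree up to floating-point rounding.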