What is Evalir?

Evalir is a Ruby library for evaluating Information Retrieval (IR) systems. It incorporates a number of standard measurements, from basic precision and recall to single-value summaries such as NDCG and MAP.

For a good reference on the theory behind this, please check out Manning, Raghavan & Schütze's excellent Introduction to Information Retrieval, ch. 8.

What can Evalir do?

Precision
Recall
Precision at Recall (e.g. Precision at 20%)
Precision at rank k
Average Precision
Precision-Recall curve
Reciprocal Rank
Mean Reciprocal Rank
Mean Average Precision (MAP)
F-measure
R-Precision
Discounted Cumulative Gain (DCG)
Normalized DCG
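
To make the set-based measures in this list concrete, here is a minimal plain-Ruby sketch of precision, recall, F-measure and precision at rank k, following their textbook definitions. The helper names and signatures are hypothetical and are not Evalir's API. The sketch assumes document ids are unique within a result list, and that precision at rank k divides by the number of results actually returned when fewer than k are available.

```ruby
# Hypothetical helpers for illustration only; not part of Evalir.

# Fraction of retrieved documents that are relevant.
def precision(relevant, retrieved)
  return 0.0 if retrieved.empty?
  (relevant & retrieved).size.to_f / retrieved.size
end

# Fraction of relevant documents that were retrieved.
def recall(relevant, retrieved)
  return 0.0 if relevant.empty?
  (relevant & retrieved).size.to_f / relevant.size
end

# F-beta: weighted harmonic mean of precision and recall.
# beta > 1 weights recall more heavily; beta = 1 gives F-1.
def f_measure(relevant, retrieved, beta = 1.0)
  prec = precision(relevant, retrieved)
  rec = recall(relevant, retrieved)
  return 0.0 if prec.zero? && rec.zero?
  b2 = beta**2
  (1 + b2) * prec * rec / (b2 * prec + rec)
end

# Precision computed over the top k results only.
def precision_at_rank(relevant, retrieved, k)
  precision(relevant, retrieved.first(k))
end

relevant  = [123, 654, 29, 1029]
retrieved = [123, 7, 29, 88, 654]   # made-up result list

puts precision(relevant, retrieved)             # 3 of 5 retrieved are relevant
puts recall(relevant, retrieved)                # 3 of 4 relevant were retrieved
puts f_measure(relevant, retrieved)             # F-1
puts f_measure(relevant, retrieved, 3)          # F-3, recall-heavy
puts precision_at_rank(relevant, retrieved, 3)  # 2 of the top 3 are relevant
```

With these made-up lists, precision is 0.6 and recall is 0.75; F-3 lands closer to recall than F-1 does, because a larger beta shifts the weight toward recall.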

How does Evalir work?

The goal of an Information Retrieval system is to provide the user with relevant information -- relevant w.r.t. the user's information need. For example, an information need might be:

Information on whether drinking red wine is more effective at reducing your risk of heart attacks than white wine.

However, this is not the query. A user will try to encode her need as a query, for instance:

red white wine reducing "heart attack"

To evaluate an IR system with Evalir, we will need human-annotated test data, each data point consisting of the following:

An explicit information need
A query
A list of documents that are relevant w.r.t. the information need (not the query)
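
One way to hold such a data point in Ruby is a small Struct. This is only a sketch: the name TestPoint and its fields are hypothetical, and Evalir itself does not require this structure.

```ruby
# Hypothetical container for one annotated test data point.
TestPoint = Struct.new(:information_need, :query, :relevant_ids)

point = TestPoint.new(
  "Information on whether drinking red wine is more effective at " \
  "reducing your risk of heart attacks than white wine.",
  'red white wine reducing "heart attack"',
  [123, 654, 29, 1029]
)

puts point.query
puts point.relevant_ids.size
```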

For example, suppose we have the aforementioned information need and query, and a list of documents that have been found to be relevant: { 123, 654, 29, 1029 }. If the actual query results are in an array named results, we can use an Evalirator like this:

require 'evalir'

relevant = [123, 654, 29, 1029]
e = Evalir::Evalirator.new(relevant, results)
puts "Precision: #{e.precision}"
puts "Recall: #{e.recall}"
puts "F-1: #{e.f1}" 
puts "F-3: #{e.f_measure(3)}"
puts "Precision at rank 10: #{e.precision_at_rank(10)}"
puts "Average Precision: #{e.average_precision}"
puts "NDCG @ 5: #{e.ndcg_at(5)}"
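
For intuition about the rank-aware measures, here is a plain-Ruby sketch of average precision, reciprocal rank and NDCG at k. It is not Evalir's implementation and the helper names are hypothetical. It assumes binary relevance (a document is either relevant or not), a DCG discount of 1 / log2(rank + 1), and an average precision that divides by the total number of relevant documents, so relevant documents that were never retrieved contribute zero. Other formulations exist and may give different numbers.

```ruby
# Hypothetical helpers for illustration only; not part of Evalir.

# Mean of the precision values at each rank where a relevant document appears.
def average_precision(relevant, ranked)
  return 0.0 if relevant.empty?
  hits = 0
  sum = 0.0
  ranked.each_with_index do |doc, i|
    next unless relevant.include?(doc)
    hits += 1
    sum += hits.to_f / (i + 1)
  end
  sum / relevant.size
end

# 1 / rank of the first relevant result, or 0.0 if there is none.
def reciprocal_rank(relevant, ranked)
  idx = ranked.index { |doc| relevant.include?(doc) }
  idx ? 1.0 / (idx + 1) : 0.0
end

# DCG over the top k results with binary gains.
def dcg_at(relevant, ranked, k)
  ranked.first(k).each_with_index.sum do |doc, i|
    gain = relevant.include?(doc) ? 1.0 : 0.0
    gain / Math.log2(i + 2)   # rank = i + 1, discount = log2(rank + 1)
  end
end

# DCG divided by the DCG of an ideal ranking (all relevant documents first).
def ndcg_at(relevant, ranked, k)
  ideal_hits = [k, relevant.size].min
  ideal = (0...ideal_hits).sum { |i| 1.0 / Math.log2(i + 2) }
  return 0.0 if ideal.zero?
  dcg_at(relevant, ranked, k) / ideal
end

relevant = [123, 654, 29, 1029]
ranked   = [123, 7, 29, 88, 654]   # made-up result list

puts average_precision(relevant, ranked)  # (1/1 + 2/3 + 3/5) / 4
puts reciprocal_rank(relevant, ranked)    # first result is relevant
puts ndcg_at(relevant, ranked, 5)
```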

When you have several information needs and want to compute aggregate statistics, use an EvaliratorCollection like this:

e = Evalir::EvaliratorCollection.new
queries.each do |query|
  relevant = get_relevant_docids(query)
  results = get_results(query)
  e.add(relevant, results)
end

puts "MAP: #{e.mean_average_precision}"
puts "Precision-Recall Curve: #{e.precision_recall_curve}"
puts "Avg. NDCG @ 3: #{e.average_ndcg_at(3)}"
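
As a rough illustration of what the aggregate statistics mean, the sketch below computes MAP and Mean Reciprocal Rank by averaging per-query values in plain Ruby. The helper names and the runs data are hypothetical; this is not Evalir's implementation, and it assumes average precision is normalized by the number of relevant documents per query.

```ruby
# Hypothetical helpers for illustration only; not part of Evalir.

def average_precision(relevant, ranked)
  return 0.0 if relevant.empty?
  hits = 0
  sum = 0.0
  ranked.each_with_index do |doc, i|
    next unless relevant.include?(doc)
    hits += 1
    sum += hits.to_f / (i + 1)
  end
  sum / relevant.size
end

def reciprocal_rank(relevant, ranked)
  idx = ranked.index { |doc| relevant.include?(doc) }
  idx ? 1.0 / (idx + 1) : 0.0
end

# runs is an array of [relevant, ranked] pairs, one per query.
def mean_average_precision(runs)
  return 0.0 if runs.empty?
  runs.sum { |relevant, ranked| average_precision(relevant, ranked) } / runs.size
end

def mean_reciprocal_rank(runs)
  return 0.0 if runs.empty?
  runs.sum { |relevant, ranked| reciprocal_rank(relevant, ranked) } / runs.size
end

runs = [
  [[1, 2], [1, 3, 2]],  # query 1: AP = (1/1 + 2/3) / 2, RR = 1
  [[5],    [4, 5]]      # query 2: AP = 1/2,             RR = 1/2
]

puts "MAP: #{mean_average_precision(runs)}"
puts "MRR: #{mean_reciprocal_rank(runs)}"   # (1 + 1/2) / 2 = 0.75
```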