Rafael Glater

Learn how to use trec_eval to evaluate your information retrieval system

trec_eval is a tool used to evaluate rankings of documents, or of any other items sorted by relevance. The evaluation is based on two files: the first, known as "qrels" (query relevances), lists the relevance judgements for each query; the second contains the rankings of documents returned by your IR system. More details about these files are given below.

Download

Visit the official website and download the latest version, which is currently 9.0. Once downloaded, extract the files to a folder and type "make" at the command line to compile the trec_eval source code. After that, the trec_eval executable will be ready to use.
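For example, assuming the downloaded archive is named trec_eval.9.0.tar.gz (the actual file name may differ depending on the version you download), the build steps look roughly like this:

$ tar -xvzf trec_eval.9.0.tar.gz
$ cd trec_eval.9.0
$ make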

How to use

The command to run trec_eval has the following format:

$ ./trec_eval [-q] [-m measure] qrel_file results_file


trec_eval: the name of the executable program
-q: in addition to the summary evaluation, give evaluation results for each query/topic
-m: show only a specific measure ("-m all_trec" shows all measures; "-m official" is the default and shows only the main measures)
qrel_file: path of the file with the list of relevant documents for each query
results_file: path of the file with the list of documents retrieved by your application

Note: trec_eval provides sample qrels and results files, "test/qrels.test" and "test/results.test" respectively. You can test the trec_eval command with them.
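For example, to evaluate the sample run with the default measures:

$ ./trec_eval test/qrels.test test/results.test

Since no -m option is given, only the official set of measures is printed.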

Viewing a specific measure

If you want to see only some specific measures, use the -m parameter followed by the measure name. Use a period (".") to pass parameters to the measure and separate multiple parameters with commas (","). The -m parameter should be passed once for each measure. For example, to show only MAP and Precision at 5 and at 10, use the following command:

$ ./trec_eval -m map -m P.5,10 qrels_file results_file
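You can also combine -m with -q to report a chosen measure for each individual query in addition to the summary, for instance:

$ ./trec_eval -q -m map qrels_file results_file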


Details about all available measures are given below.

The qrels file

This file contains the list of documents judged relevant for each query. These relevance judgements are made by human assessors, who manually select the documents that should be retrieved when a particular query is executed. The file can be regarded as the "correct answer", and the documents retrieved by your IR system should approximate it as closely as possible. It has the following format:


query-id 0 document-id relevance


The field query-id is an alphanumeric string identifying the query, document-id is an alphanumeric string identifying the judged document, and relevance is a number indicating the degree of relevance between the document and the query (0 for non-relevant and 1 for relevant). The second field ("0") is not currently used; just include it in the file. The fields can be separated by a space or a tab.
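For illustration, a small qrels file with two queries might look like this (the query and document identifiers are made up):

q1 0 doc1 1
q1 0 doc2 0
q1 0 doc3 1
q2 0 doc2 1
q2 0 doc4 0

Here doc1 and doc3 are judged relevant for query q1, while doc2 is judged non-relevant for q1 but relevant for q2.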

The results file

The results file contains, for each query, a ranking of documents automatically generated by your application. This is the file that will be evaluated by trec_eval against the "correct answer" provided by the qrels file. It has the following format:


query-id Q0 document-id rank score STANDARD


The field query-id is an alphanumeric string identifying the query. The second field, with value "Q0", is currently ignored by trec_eval; just include it in the file. The field document-id is an alphanumeric string identifying the retrieved document. The field rank is an integer representing the document's position in the ranking, but this field is also ignored by trec_eval. The field score is an integer or floating-point value indicating the degree of similarity between the document and the query, so the most relevant documents should have the highest scores. The last field, with value "STANDARD" here, only identifies the run (this name is also shown in the output); you can use any alphanumeric string.
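Continuing the made-up example from the qrels section, a results file could look like this, with "myrun" used as the run identifier:

q1 Q0 doc3 1 27.5 myrun
q1 Q0 doc1 2 21.3 myrun
q1 Q0 doc9 3 12.0 myrun
q2 Q0 doc2 1 19.7 myrun
q2 Q0 doc7 2 8.4 myrun

Remember that, as noted above, trec_eval orders the documents by the score field and ignores the rank field.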


Measures of trec_eval

Running trec_eval with the default options outputs the following information:

runid Name of the run (the name given in the last field of the results file)
num_q Total number of evaluated queries
num_ret Total number of retrieved documents
num_rel Total number of relevant documents (according to the qrels file)
num_rel_ret Total number of relevant documents retrieved (in the results file)
map Mean average precision (MAP)
gm_map Geometric mean average precision
Rprec Precision after the first R documents are retrieved, where R is the number of relevant documents for the query
bpref Binary preference
recip_rank Reciprocal rank
iprec_at_recall_0.00 Interpolated Recall - Precision Averages at 0.00 recall
iprec_at_recall_0.10 Interpolated Recall - Precision Averages at 0.10 recall
iprec_at_recall_0.20 Interpolated Recall - Precision Averages at 0.20 recall
iprec_at_recall_0.30 Interpolated Recall - Precision Averages at 0.30 recall
iprec_at_recall_0.40 Interpolated Recall - Precision Averages at 0.40 recall
iprec_at_recall_0.50 Interpolated Recall - Precision Averages at 0.50 recall
iprec_at_recall_0.60 Interpolated Recall - Precision Averages at 0.60 recall
iprec_at_recall_0.70 Interpolated Recall - Precision Averages at 0.70 recall
iprec_at_recall_0.80 Interpolated Recall - Precision Averages at 0.80 recall
iprec_at_recall_0.90 Interpolated Recall - Precision Averages at 0.90 recall
iprec_at_recall_1.00 Interpolated Recall - Precision Averages at 1.00 recall
P_5 Precision of the first 5 documents
P_10 Precision of the first 10 documents
P_15 Precision of the first 15 documents
P_20 Precision of the first 20 documents
P_30 Precision of the first 30 documents
P_100 Precision of the first 100 documents
P_200 Precision of the first 200 documents
P_500 Precision of the first 500 documents
P_1000 Precision of the first 1000 documents
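Measures outside this default set can be requested explicitly with -m. For example, assuming your build includes the ndcg_cut measure (version 9.0 does), the following command reports normalized discounted cumulative gain at cutoffs 10 and 20; running with "-m all_trec" instead lists every measure trec_eval supports:

$ ./trec_eval -m ndcg_cut.10,20 qrels_file results_file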