Learn how to use trec_eval to evaluate your information retrieval system
trec_eval is a tool for evaluating rankings of documents, or of any other items sorted by relevance. The evaluation is based on two files: the first, known as "qrels" (query relevance judgements), lists the relevance judgements for each query; the second contains the rankings of documents returned by your IR system. More details about these files are given below.
Download
Visit the official website and download the latest version, which is currently 9.0. Once downloaded, extract the files to a folder and run "make" at the command prompt to compile the trec_eval source code. After that, the trec_eval executable is ready to use.
How to use
The command to run trec_eval has the following format:
$ ./trec_eval [-q] [-m measure] qrel_file results_file
trec_eval: the name of the executable program.
-q: in addition to the summary evaluation, gives the evaluation for each individual query/topic.
-m: shows only a specific measure ("-m all_trec" shows all measures; "-m official", the default, shows only the main measures).
qrel_file: path of the file listing the relevant documents for each query.
results_file: path of the file listing the documents retrieved by your application.
Note: trec_eval ships with sample qrels and results files, "test/qrels.test" and "test/results.test" respectively. You can use them to test the trec_eval command.
Viewing a specific measure
If you want to see only specific measures, use the -m parameter followed by the measure name. Use a period (".") to pass parameters to the measure and a comma (",") to separate multiple parameters. Pass -m once for each measure. For example, to show only MAP and precision at 5 and at 10, use the following command:
$ ./trec_eval -m map -m P.5,10 qrel_file results_file
Details about all available measures are given below.
The qrels file
This file contains the list of documents judged relevant for each query. These relevance judgements are made by human assessors who manually select the documents that should be retrieved when a particular query is executed. The file can be regarded as the "correct answer", which the documents retrieved by your IR system should match as closely as possible. It has the following format:
query-id 0 document-id relevance
The field query-id is an alphanumeric sequence identifying the query, document-id is an alphanumeric sequence identifying the judged document, and relevance is a number indicating the degree of relevance between the document and the query (0 for non-relevant, 1 for relevant). The second field, "0", is currently unused by trec_eval but must be present in the file. Fields can be separated by spaces or tabs.
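As a sketch, a qrels file in this format can be generated programmatically. The query and document identifiers below are invented for illustration:

```python
# Write a minimal qrels file; IDs here are made up for illustration.
judgements = [
    ("q1", "doc1", 1),  # doc1 is relevant to query q1
    ("q1", "doc2", 0),  # doc2 is not relevant to query q1
    ("q2", "doc3", 1),
]

with open("qrels.txt", "w") as f:
    for query_id, doc_id, relevance in judgements:
        # The second column is unused by trec_eval but must be present.
        f.write(f"{query_id} 0 {doc_id} {relevance}\n")
```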
The results file
The results file contains, for each query, the ranking of documents generated automatically by your application. This is the file that trec_eval evaluates against the "correct answer" provided by the qrels file. It has the following format:
query-id Q0 document-id rank score STANDARD
The field query-id is an alphanumeric sequence identifying the query. The second field, with value "Q0", is currently ignored by trec_eval but must be present in the file. The field document-id is an alphanumeric sequence identifying the retrieved document. The field rank is an integer giving the document's position in the ranking, though this field is also ignored by trec_eval. The field score is an integer or floating-point value indicating the degree of similarity between the document and the query, so the most relevant documents should have the highest scores. The last field, here "STANDARD", only identifies the run (the name also appears in the output); you can use any alphanumeric sequence.
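The description above can likewise be sketched in code. The query IDs, document IDs, and scores below are invented; the only point is the line format:

```python
# Write a minimal results file; IDs and scores are made up for illustration.
# Each entry maps a query ID to (doc_id, score) pairs; higher score = more similar.
results = {
    "q1": [("doc7", 1.2), ("doc1", 2.5)],
    "q2": [("doc3", 0.9)],
}

with open("results.txt", "w") as f:
    for query_id, docs in results.items():
        # Sort by descending score so the rank column is consistent with the
        # scores, even though trec_eval ignores the rank field itself.
        ranked = sorted(docs, key=lambda d: d[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            f.write(f"{query_id} Q0 {doc_id} {rank} {score} STANDARD\n")
```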
Measures of trec_eval
Running trec_eval with the default options outputs the following measures:
runid | Name of the run (the name given in the last field of the results file)
num_q | Total number of evaluated queries
num_ret | Total number of retrieved documents
num_rel | Total number of relevant documents (according to the qrels file)
num_rel_ret | Total number of relevant documents retrieved (present in the results file)
map | Mean average precision (MAP)
gm_map | Geometric mean average precision
Rprec | Precision of the first R documents, where R is the number of relevant documents for the query
bpref | Binary preference
recip_rank | Reciprocal rank of the first relevant document
iprec_at_recall_0.00 | Interpolated precision at 0.00 recall
iprec_at_recall_0.10 | Interpolated precision at 0.10 recall
iprec_at_recall_0.20 | Interpolated precision at 0.20 recall
iprec_at_recall_0.30 | Interpolated precision at 0.30 recall
iprec_at_recall_0.40 | Interpolated precision at 0.40 recall
iprec_at_recall_0.50 | Interpolated precision at 0.50 recall
iprec_at_recall_0.60 | Interpolated precision at 0.60 recall
iprec_at_recall_0.70 | Interpolated precision at 0.70 recall
iprec_at_recall_0.80 | Interpolated precision at 0.80 recall
iprec_at_recall_0.90 | Interpolated precision at 0.90 recall
iprec_at_recall_1.00 | Interpolated precision at 1.00 recall
P_5 | Precision of the first 5 documents
P_10 | Precision of the first 10 documents
P_15 | Precision of the first 15 documents
P_20 | Precision of the first 20 documents
P_30 | Precision of the first 30 documents
P_100 | Precision of the first 100 documents
P_200 | Precision of the first 200 documents
P_500 | Precision of the first 500 documents
P_1000 | Precision of the first 1000 documents
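To sanity-check the numbers trec_eval reports, some of these measures are easy to recompute by hand. The sketch below computes precision at k and average precision for a single query, using made-up rankings and judgements (MAP is then the mean of the average precision over all queries):

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k

def average_precision(ranking, relevant):
    """Sum of precision values at each rank where a relevant document
    appears, divided by the total number of relevant documents."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

# Invented example: 3 relevant documents, retrieved at ranks 1, 3, and 5.
ranking = ["d1", "d9", "d2", "d8", "d3"]
relevant = {"d1", "d2", "d3"}

print(precision_at_k(ranking, relevant, 5))   # 3 of the top 5 are relevant -> 0.6
print(average_precision(ranking, relevant))   # (1/1 + 2/3 + 3/5) / 3
```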