Learn how to use trec_eval to evaluate your information retrieval system
trec_eval is a tool for evaluating rankings of documents, or of any other items sorted by relevance. The evaluation is based on two files: the first, known as "qrels" (query relevance judgements), lists the relevance judgements for each query; the second contains the rankings of documents returned by your IR system. More details about these files are given below.
Download
Visit the official website and download the latest version, which is currently 9.0. Once downloaded, extract the files to a folder and run "make" at the command prompt to compile the trec_eval source code. After that, the trec_eval executable is ready to use.
How to use
The command to run trec_eval has the following format:
$ ./trec_eval [-q] [-m measure] qrel_file results_file
trec_eval: the name of the executable program.
-q: in addition to the summary evaluation, gives the evaluation for each individual query/topic.
-m: shows only a specific measure ("-m all_trec" shows all measures; "-m official", the default, shows only the main measures).
qrel_file: path of the file listing the relevant documents for each query.
results_file: path of the file listing the documents retrieved by your application.
Note: trec_eval ships with sample qrels and results files, "test/qrels.test" and "test/results.test" respectively. You can use them to test the trec_eval command.
Viewing a specific measure
If you want to see only specific measures, use the -m parameter followed by the measure name. Use a period (".") to pass parameters to the measure and a comma (",") to separate multiple parameters. Pass -m once for each measure. For example, to show only MAP and precision at 5 and at 10, use the following command:
$ ./trec_eval -m map -m P.5,10 qrel_file results_file
Details about all available measures are given below.
The qrels file
This file contains the list of documents judged relevant for each query. These relevance judgements are made by human assessors who manually select the documents that should be retrieved when a particular query is executed. The file can be regarded as the "correct answer", which the documents retrieved by your IR system should match as closely as possible. It has the following format:
query-id 0 document-id relevance
The field query-id is an alphanumeric sequence identifying the query, document-id is an alphanumeric sequence identifying the judged document, and relevance is a number indicating the degree of relevance between the document and the query (0 for non-relevant, 1 for relevant). The second field, "0", is currently unused by trec_eval but must be present in the file. Fields can be separated by spaces or tabs.
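As a sketch, a qrels file in this format can be generated programmatically. The query and document identifiers below are invented for illustration:

```python
# Write a minimal qrels file; IDs here are made up for illustration.
judgements = [
    ("q1", "doc1", 1),  # doc1 is relevant to query q1
    ("q1", "doc2", 0),  # doc2 is not relevant to query q1
    ("q2", "doc3", 1),
]

with open("qrels.txt", "w") as f:
    for query_id, doc_id, relevance in judgements:
        # The second column is unused by trec_eval but must be present.
        f.write(f"{query_id} 0 {doc_id} {relevance}\n")
```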
The results file
The results file contains, for each query, the ranking of documents generated automatically by your application. This is the file that trec_eval evaluates against the "correct answer" provided by the qrels file. It has the following format:
query-id Q0 document-id rank score STANDARD
The field query-id is an alphanumeric sequence identifying the query. The second field, with value "Q0", is currently ignored by trec_eval but must be present in the file. The field document-id is an alphanumeric sequence identifying the retrieved document. The field rank is an integer giving the document's position in the ranking, though this field is also ignored by trec_eval. The field score is an integer or floating-point value indicating the degree of similarity between the document and the query, so the most relevant documents should have the highest scores. The last field, here "STANDARD", only identifies the run (the name also appears in the output); you can use any alphanumeric sequence.
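The description above can likewise be sketched in code. The query IDs, document IDs, and scores below are invented; the only point is the line format:

```python
# Write a minimal results file; IDs and scores are made up for illustration.
# Each entry maps a query ID to (doc_id, score) pairs; higher score = more similar.
results = {
    "q1": [("doc7", 1.2), ("doc1", 2.5)],
    "q2": [("doc3", 0.9)],
}

with open("results.txt", "w") as f:
    for query_id, docs in results.items():
        # Sort by descending score so the rank column is consistent with the
        # scores, even though trec_eval ignores the rank field itself.
        ranked = sorted(docs, key=lambda d: d[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            f.write(f"{query_id} Q0 {doc_id} {rank} {score} STANDARD\n")
```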
Measures of trec_eval
Running trec_eval with the default options outputs the following measures:
runid | Name of the run (the name given in the last field of the results file)
num_q | Total number of evaluated queries
num_ret | Total number of retrieved documents
num_rel | Total number of relevant documents (according to the qrels file)
num_rel_ret | Total number of relevant documents retrieved (present in the results file)
map | Mean average precision (MAP)
gm_map | Geometric mean average precision
Rprec | Precision of the first R documents, where R is the number of relevant documents for the query
bpref | Binary preference
recip_rank | Reciprocal rank of the first relevant document
iprec_at_recall_0.00 | Interpolated precision at 0.00 recall
iprec_at_recall_0.10 | Interpolated precision at 0.10 recall
iprec_at_recall_0.20 | Interpolated precision at 0.20 recall
iprec_at_recall_0.30 | Interpolated precision at 0.30 recall
iprec_at_recall_0.40 | Interpolated precision at 0.40 recall
iprec_at_recall_0.50 | Interpolated precision at 0.50 recall
iprec_at_recall_0.60 | Interpolated precision at 0.60 recall
iprec_at_recall_0.70 | Interpolated precision at 0.70 recall
iprec_at_recall_0.80 | Interpolated precision at 0.80 recall
iprec_at_recall_0.90 | Interpolated precision at 0.90 recall
iprec_at_recall_1.00 | Interpolated precision at 1.00 recall
P_5 | Precision of the first 5 documents
P_10 | Precision of the first 10 documents
P_15 | Precision of the first 15 documents
P_20 | Precision of the first 20 documents
P_30 | Precision of the first 30 documents
P_100 | Precision of the first 100 documents
P_200 | Precision of the first 200 documents
P_500 | Precision of the first 500 documents
P_1000 | Precision of the first 1000 documents
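To sanity-check the numbers trec_eval reports, some of these measures are easy to recompute by hand. The sketch below computes precision at k and average precision for a single query, using made-up rankings and judgements (MAP is then the mean of the average precision over all queries):

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k

def average_precision(ranking, relevant):
    """Sum of precision values at each rank where a relevant document
    appears, divided by the total number of relevant documents."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

# Invented example: 3 relevant documents, retrieved at ranks 1, 3, and 5.
ranking = ["d1", "d9", "d2", "d8", "d3"]
relevant = {"d1", "d2", "d3"}

print(precision_at_k(ranking, relevant, 5))   # 3 of the top 5 are relevant -> 0.6
print(average_precision(ranking, relevant))   # (1/1 + 2/3 + 3/5) / 3
```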