Some Common Mistakes In IR Evaluation, And How They Can Be Avoided

Abstract

This paper points out some mistakes that can be frequently found in IR publications: MRR and ERR violate basic requirements for a metric, MAP is based on unrealistic assumptions, the numbers shown overstate the precision of the result, relative improvements of arithmetic means are inappropriate, the simple holdout method yields unreliable results, hypotheses are often formulated after the experiment, significance tests frequently ignore the multiple comparisons problem, effect sizes are ignored, reproducibility of the experiments might be nearly impossible, and sometimes authors claim proof by experimentation.

BibTeX key: fuhr2018common
entry type: article
year: 2018
month: feb
journal: SIGIR Forum
number: 3
pages: 32--41
publisher: Association for Computing Machinery (ACM)
volume: 51
DOI: 10.1145/3190580.3190586
url: https://doi.org/10.1145/3190580.3190586
