@beate

How does clickthrough data reflect retrieval quality?

, , and . CIKM '08: Proceeding of the 17th ACM conference on Information and knowledge mining, page 43--52. New York, NY, USA, ACM, (2008)
DOI: 10.1145/1458082.1458092

Abstract

Automatically judging the quality of retrieval functions based on observable user behavior holds promise for making retrieval evaluation faster, cheaper, and more user centered. However, the relationship between observable user behavior and retrieval quality is not yet fully understood. We present a sequence of studies investigating this relationship for an operational search engine on the arXiv.org e-print archive. We find that none of the eight absolute usage metrics we explore (e.g., number of clicks, frequency of query reformulations, abandonment) reliably reflect retrieval quality for the sample sizes we consider. However, we find that paired experiment designs adapted from sensory analysis produce accurate and reliable statements about the relative quality of two retrieval functions. In particular, we investigate two paired comparison tests that analyze clickthrough data from an interleaved presentation of ranking pairs, and we find that both give accurate and consistent results. We conclude that both paired comparison tests give substantially more accurate and sensitive evaluation results than absolute usage metrics in our domain.

Links and resources

Tags

community

  • @chato
  • @beate
  • @dblp
@beate's tags highlighted