Smoothing Clickthrough Data for Web Search Ranking

Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 355-362. New York, NY, USA, ACM, (2009)
DOI: 10.1145/1571941.1572003

Abstract

Incorporating features extracted from clickthrough data (called clickthrough features) has been demonstrated to significantly improve the performance of ranking models for Web search applications. Such benefits, however, are severely limited by the data sparseness problem, i.e., many queries and documents have no or very few clicks. The ranker thus cannot rely strongly on clickthrough features for document ranking. This paper presents two smoothing methods to expand clickthrough data: query clustering via Random Walk on click graphs and a discounting method inspired by the Good-Turing estimator. Both methods are evaluated on real-world data in three Web search domains. Experimental results show that the ranking models trained on smoothed clickthrough features consistently outperform those trained on unsmoothed features. This study demonstrates both the importance and the benefits of dealing with the sparseness problem in clickthrough data.
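
To make the first smoothing idea concrete, the sketch below runs a two-step random walk (query to document to query) on a toy click graph and uses the resulting query-to-query probabilities to spread click counts onto query-document pairs that had none. It is a minimal illustration under stated assumptions, not the algorithm from the paper: the function names `two_step_walk` and `smooth_clicks`, the fixed walk length of two steps, and the toy click counts are all hypothetical.

```python
from collections import defaultdict


def two_step_walk(clicks):
    """Return P(q2 | q1) for a query -> document -> query walk.

    `clicks` maps (query, document) pairs to raw click counts.
    """
    q_total = defaultdict(float)   # total clicks per query
    d_total = defaultdict(float)   # total clicks per document
    by_doc = defaultdict(list)     # document -> [(query, count), ...]
    for (q, d), c in clicks.items():
        q_total[q] += c
        d_total[d] += c
        by_doc[d].append((q, c))

    q2q = defaultdict(lambda: defaultdict(float))
    for (q1, d), c1 in clicks.items():
        p_doc_given_q1 = c1 / q_total[q1]
        for q2, c2 in by_doc[d]:
            # Step 1: q1 -> d, step 2: d -> q2.
            q2q[q1][q2] += p_doc_given_q1 * (c2 / d_total[d])
    return q2q


def smooth_clicks(clicks):
    """Spread click counts across queries reached by the two-step walk."""
    by_query = defaultdict(list)   # query -> [(document, count), ...]
    for (q, d), c in clicks.items():
        by_query[q].append((d, c))

    smoothed = defaultdict(float)
    for q1, neighbours in two_step_walk(clicks).items():
        for q2, p in neighbours.items():
            for d, c in by_query[q2]:
                smoothed[(q1, d)] += p * c
    return dict(smoothed)


if __name__ == "__main__":
    # Hypothetical toy data: the second query has no click on b.example.
    toy = {
        ("cheap flights", "a.example"): 8,
        ("cheap flights", "b.example"): 2,
        ("low cost flights", "a.example"): 1,
    }
    for pair, value in sorted(smooth_clicks(toy).items()):
        print(pair, round(value, 3))
```

In this toy graph the two queries share a clicked document, so the walk links them and the sparse query receives a nonzero smoothed count for the document it never clicked. A real system would likely need longer walks, pruning, and normalization choices that this sketch does not attempt.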
