A Time-aware Random Walk Model for Finding Important Documents in Web Archives

, , , and . Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, page 915--918. New York, NY, USA, ACM, (2015)


Due to their first-hand, diverse and evolution-aware reflection of nearly all areas of life, web archives are emerging as gold-mines for content analytics of many sorts. However, supporting search, which goes beyond navigational search via URLs, is a very challenging task in these unique structures with huge, redundant and noisy temporal content. In this paper, we address the search needs of expert users such as journalists, economists or historians for discovering a topic in time: Given a query, the top-k returned results should give the best representative documents that cover most interesting time-periods for the topic. For this purpose, we propose a novel random walk-based model that integrates relevance, temporal authority, diversity and time in a unified framework. Our preliminary experimental results on the large-scale real-world web archival collection shows that our method significantly improves the state-of-the-art algorithms (i.e., PageRank) in ranking temporal web pages.

Links and resources

BibTeX key:
search on:

Comments and Reviews  

There is no review or comment yet. You can write one!


Cite this publication