@l3s

Incremental Diversification for Very Large Sets: a Streaming-based Approach

, , and . Proc. of 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, (2011)

Abstract

Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Even the greedy diversification algorithms usually exhibit quadratic complexity and require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.

Links and resources

Tags

community

  • @wolfsiberski
  • @dblp
  • @l3s
@l3s's tags highlighted