Incremental Diversification for Very Large Sets: a Streaming-based Approach
E. Minack, W. Siberski, and W. Nejdl. Proc. of 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, (2011)
Abstract
Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Even the greedy diversification algorithms usually exhibit quadratic complexity and require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.
%0 Conference Paper
%1 L3S_eccf70441b8a3fa9a76c655d9f626f2f99f22749
%A Minack, Enrico
%A Siberski, Wolf
%A Nejdl, Wolfgang
%B Proc. of 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011
%D 2011
%K Nejdl publication
%T Incremental Diversification for Very Large Sets: a Streaming-based Approach
%X Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Even the greedy diversification algorithms usually exhibit quadratic complexity and require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.
@inproceedings{L3S_eccf70441b8a3fa9a76c655d9f626f2f99f22749,
abstract = { Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Even the greedy diversification algorithms usually exhibit quadratic complexity and require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.},
added-at = {2012-06-15T13:14:42.000+0200},
author = {Minack, Enrico and Siberski, Wolf and Nejdl, Wolfgang},
biburl = {https://www.bibsonomy.org/bibtex/2007a1df578acb7f0a935ef7c478f56d6/l3s},
booktitle = {Proc. of 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011},
interhash = {14d6bfb1467b762f30ec68aeb07e6281},
intrahash = {007a1df578acb7f0a935ef7c478f56d6},
keywords = {Nejdl publication},
timestamp = {2012-06-15T13:14:48.000+0200},
title = {Incremental Diversification for Very Large Sets: a Streaming-based Approach},
year = 2011
}