GPU-based minwise hashing

P. Li, A. Shrivastava, and A. König.
Proceedings of the 21st World Wide Web Conference (WWW 2012) (Companion Volume), pages 565-566. (2012)
DOI: 10.1145/2187980.2188129

Abstract

Minwise hashing is a standard technique for efficient set similarity estimation in the context of search. The recent work on b-bit minwise hashing provided a substantial improvement by storing only the lowest b bits of each hashed value. Both minwise hashing and b-bit minwise hashing require an expensive preprocessing step that applies k (e.g., k=500) permutations to the entire data set in order to compute k minimal values as the hashed data. In this paper, we develop a parallelization scheme using GPUs that reduces the processing time by a factor of 20 to 80. Reducing the preprocessing time is highly beneficial in practice, for example for duplicate web page detection (where minwise hashing is a major step in the crawling pipeline) or for increasing the testing speed of online classifiers (when the test data are not preprocessed).
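The preprocessing step the abstract describes can be illustrated with a small CPU-only Python sketch. It is not the authors' GPU implementation. As an assumption made to keep the example short, it replaces the k random permutations with k random linear permutations x -> (a*x + c) mod P over the prime P = 2^61 - 1 (each such map with 1 <= a < P is a bijection on {0, ..., P-1}). The function names are hypothetical. The b-bit estimator uses the approximation, valid when both sets are very sparse relative to the universe, that two b-bit values match with probability about 2^-b + (1 - 2^-b) * R, where R = |S1 ∩ S2| / |S1 ∪ S2| is the resemblance.

```python
import random

# Mersenne prime 2^61 - 1. Each map x -> (a*x + c) mod P with 1 <= a < P is a
# permutation of {0, ..., P-1}. ASSUMPTION: k such random linear permutations
# stand in for the k fully random permutations described in the abstract.
P = (1 << 61) - 1


def make_permutations(k, seed=0):
    """Draw k hypothetical (a, c) pairs, one per linear permutation."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(k)]


def minhash(items, perms):
    """Return the k minimal permuted values of a non-empty set of ints in [0, P)."""
    return [min((a * x + c) % P for x in items) for a, c in perms]


def b_bit(signature, b):
    """Keep only the lowest b bits of each minimal value."""
    mask = (1 << b) - 1
    return [v & mask for v in signature]


def match_fraction(sig1, sig2):
    """Fraction of positions where the two signatures agree."""
    return sum(u == v for u, v in zip(sig1, sig2)) / len(sig1)


def estimate_resemblance(sig1, sig2):
    """Full minwise hashing: Pr[minima match] = R, so the match rate estimates R."""
    return match_fraction(sig1, sig2)


def estimate_resemblance_b_bit(bsig1, bsig2, b):
    """b-bit estimate under the sparse-set approximation
    Pr[match] ~ 2^-b + (1 - 2^-b) * R, solved for R."""
    c = 1.0 / (1 << b)
    return (match_fraction(bsig1, bsig2) - c) / (1 - c)


if __name__ == "__main__":
    rng = random.Random(1)
    pool = rng.sample(range(10**9), 1500)
    s1, s2 = set(pool[:1000]), set(pool[500:])  # true R = 500 / 1500 = 1/3
    perms = make_permutations(k=500)
    sig1, sig2 = minhash(s1, perms), minhash(s2, perms)
    print("full minwise:", estimate_resemblance(sig1, sig2))
    for b in (1, 2):
        print(f"{b}-bit:", estimate_resemblance_b_bit(b_bit(sig1, b), b_bit(sig2, b), b))
```

In this sketch each of the k minima is computed independently, and each one is a reduction over the elements of a set. That independence is one reason the step lends itself to parallel hardware such as GPUs; storing only b bits per value then shrinks the resulting signatures.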

BibTeX key: LiSK12
entry type: inproceedings
booktitle: Proceedings of the 21st World Wide Web Conference (WWW 2012) (Companion Volume)
year: 2012
pages: 565-566
bibsource: DBLP, http://dblp.uni-trier.de
DOI: 10.1145/2187980.2188129
url: http://doi.acm.org/10.1145/2187980.2188129
