
Scaling up all pairs similarity search

, , and . WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 131–140. New York, NY, USA, ACM, (2007)
DOI: 10.1145/1242572.1242591

Abstract

Given a large collection of sparse vector data in a high dimensional space, we investigate the problem of finding all pairs of vectors whose similarity score (as determined by a function such as cosine similarity) is above a given threshold. We propose a simple algorithm based on novel indexing and optimization strategies that solves this problem without relying on approximation methods or extensive parameter tuning. We show the approach efficiently handles a variety of datasets across a wide range of similarity thresholds, with large speedups over previous state-of-the-art approaches.
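To make the problem statement concrete, here is a minimal exact baseline using a plain inverted index over sparse vectors. It is a sketch under stated assumptions, not the paper's algorithm: the authors' indexing and optimization strategies are not reproduced here. The helper names `normalize` and `all_pairs` are hypothetical, vectors are assumed to be Python dicts mapping dimension to weight, similarity is assumed to be cosine, a pair qualifies when its score is at least the threshold, and the threshold is assumed to be positive.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse vector {dim: weight} to unit L2 length (hypothetical helper)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {d: w / norm for d, w in vec.items()} if norm > 0 else {}


def all_pairs(vectors, threshold):
    """Return sorted (i, j, sim) for every pair i < j with cosine sim >= threshold.

    Exact inverted-index baseline (not the paper's optimized method).
    Assumes threshold > 0: pairs sharing no dimension score 0 and are
    never generated as candidates.
    """
    unit = [normalize(v) for v in vectors]
    index = defaultdict(list)  # dimension -> list of (vector id, weight)
    results = []
    for x_id, x in enumerate(unit):
        # Accumulate dot products against every previously indexed vector
        # that shares at least one non-zero dimension with x.
        scores = defaultdict(float)
        for d, w in x.items():
            for y_id, y_w in index[d]:
                scores[y_id] += w * y_w
        for y_id, s in scores.items():
            if s >= threshold:
                results.append((y_id, x_id, s))
        # Index x only after scoring it, so each pair is reported once.
        for d, w in x.items():
            index[d].append((x_id, w))
    return sorted(results)


if __name__ == "__main__":
    docs = [{"a": 1, "b": 1}, {"a": 1, "b": 1}, {"c": 1}, {"a": 1}]
    for i, j, s in all_pairs(docs, 0.5):
        print(i, j, round(s, 4))
```

Because each vector is scored only against vectors indexed before it, every qualifying pair appears exactly once. The running time is dominated by scanning long posting lists for frequent dimensions, which is where any index-pruning optimization would have to save work.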
