

On the resemblance and containment of documents

by: A. Z. Broder
In: Compression and Complexity of Sequences, Salerno, Italy: IEEE Computer Society Press (June 1997), pp. 21-29.

Abstract

Given two documents A and B we define two mathematical notions: their resemblance r(A, B) and their containment c(A, B) that seem to capture well the informal notions of "roughly the same" and "roughly contained." The basic idea is to reduce these issues to set intersection problems that can be easily evaluated by a process of random sampling that can be done independently for each document. Furthermore, the resemblance can be evaluated using a fixed size sample for each document. This paper discusses the mathematical properties of these measures and the efficient implementation of the sampling process using Rabin (1981) fingerprints.
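
As a rough illustration of the idea in the abstract, the following Python sketch reduces each document to a set of word shingles, computes resemblance and containment as set ratios, and then estimates resemblance from a fixed size sample of fingerprints taken independently per document. Several details here are assumptions and are not taken from this record: the shingle width w=3, the sample size s=64, all function names, and the use of a 64-bit BLAKE2b hash as a stand-in for Rabin fingerprints.

```python
import hashlib


def shingles(text, w=3):
    """Return the set of contiguous w-word sequences in text (w is an assumed default)."""
    words = text.lower().split()
    if len(words) < w:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a, b, w=3):
    """|S(A) & S(B)| / |S(A) | S(B)|: how close A and B are to 'roughly the same'."""
    sa, sb = shingles(a, w), shingles(b, w)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def containment(a, b, w=3):
    """|S(A) & S(B)| / |S(A)|: how close A is to being 'roughly contained' in B."""
    sa, sb = shingles(a, w), shingles(b, w)
    return len(sa & sb) / len(sa) if sa else 1.0


def fingerprint(shingle):
    """Map a shingle to a 64-bit integer.

    Assumption: BLAKE2b is used here only as a convenient stand-in;
    the paper itself uses Rabin fingerprints.
    """
    data = " ".join(shingle).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(text, s=64, w=3):
    """Fixed size sample: the s smallest fingerprints, computed per document."""
    return sorted({fingerprint(sh) for sh in shingles(text, w)})[:s]


def estimate_resemblance(sketch_a, sketch_b, s=64):
    """Estimate resemblance from two sketches without access to the documents."""
    a, b = set(sketch_a), set(sketch_b)
    smallest = set(sorted(a | b)[:s])  # s smallest fingerprints of the union
    if not smallest:
        return 1.0
    return len(smallest & a & b) / len(smallest)


if __name__ == "__main__":
    doc_a = "a rose is a rose is a rose"
    doc_b = "a rose is a flower which is a rose"
    print(resemblance(doc_a, doc_b))
    print(containment(doc_a, doc_b))
    print(estimate_resemblance(sketch(doc_a), sketch(doc_b)))
```

When a document has fewer than s distinct shingles, its sketch holds every fingerprint, so the estimate coincides with the exact resemblance (barring hash collisions). For longer documents the sketch stays at s values regardless of document length, which is what makes pairwise comparison over large collections cheap.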

