On the resemblance and containment of documents
In: Compression and Complexity of Sequences, Salerno, Italy: IEEE Computer Society Press (June 1997), pp. 21-29.
Abstract
Given two documents A and B we define two mathematical notions: their
resemblance r(A, B) and their containment c(A, B) that seem to capture
well the informal notions of "roughly the same" and "roughly
contained." The basic idea is to reduce these issues to set intersection
problems that can be easily evaluated by a process of random sampling
that can be done independently for each document. Furthermore, the
resemblance can be evaluated using a fixed size sample for each
document. This paper discusses the mathematical properties of these
measures and the efficient implementation of the sampling process
using Rabin (1981) fingerprints.
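
The reduction to set intersection and the fixed-size sampling mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper's text as quoted here; it rests on several assumptions. Documents are represented as sets of w-word shingles, resemblance is taken as |S(A) ∩ S(B)| / |S(A) ∪ S(B)| and containment as |S(A) ∩ S(B)| / |S(A)| (the usual shingle-based formulation), and a truncated BLAKE2 hash stands in for Rabin fingerprints. All function names and the parameters `w` and `s` are hypothetical.

```python
import hashlib


def shingles(text, w=3):
    """Return the set of w-word shingles (contiguous word windows) of text."""
    words = text.lower().split()
    if len(words) < w:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a, b, w=3):
    """Exact resemblance: shared shingles over all shingles of both documents."""
    sa, sb = shingles(a, w), shingles(b, w)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def containment(a, b, w=3):
    """Exact containment of a in b: shared shingles over the shingles of a."""
    sa, sb = shingles(a, w), shingles(b, w)
    return len(sa & sb) / len(sa) if sa else 1.0


def fingerprint(shingle):
    """64-bit fingerprint of a shingle.

    Assumption: BLAKE2b is a stand-in here; the abstract names Rabin
    fingerprints for this step.
    """
    data = " ".join(shingle).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(text, s=100, w=3):
    """Fixed-size sample of a document: its s smallest distinct fingerprints."""
    return sorted({fingerprint(sh) for sh in shingles(text, w)})[:s]


def estimate_resemblance(sketch_a, sketch_b, s=100):
    """Estimate resemblance from two sketches computed independently.

    Takes the s smallest fingerprints of the combined sketches and counts
    the fraction of them that appear in both sketches.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:s]
    if not merged:
        return 1.0
    common = set(sketch_a) & set(sketch_b)
    return sum(1 for f in merged if f in common) / len(merged)
```

For example, with `a = "a rose is a rose is a rose"` and `b = "a rose is a flower which is a rose"`, the exact resemblance is 3/7 and the containment of `a` in `b` is 1.0. Because both documents have fewer than `s` shingles, the sketch-based estimate coincides with the exact resemblance; for longer documents it is only an estimate whose accuracy depends on `s`.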