Abstract

Often, in the real world, entities have two or more representations in databases. Duplicate records do not share a common key and/or they contain errors that make duplicate matching a difficult task. Errors are introduced as the result of transcription errors, incomplete information, lack of standard formats, or any combination of these factors. In this article, we present a thorough analysis of the literature on duplicate record detection. We cover similarity metrics that are commonly used to detect similar field entries, and we present an extensive set of duplicate detection algorithms that can detect approximately duplicate records in a database. We also cover multiple techniques for improving the efficiency and scalability of approximate duplicate detection algorithms. We conclude with coverage of existing tools and a brief discussion of the major open problems in the area.
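One of the character-based similarity metrics commonly discussed in this literature is Levenshtein edit distance. The sketch below (an illustration, not code from the article; the function names are ours) shows how an edit-distance score can be normalized into a field-level similarity in [0, 1] for comparing record fields such as names:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def field_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical fields."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

# Two field entries differing by one transcription error score highly:
print(field_similarity("Jon Smith", "John Smith"))  # 0.9
```

In practice, a duplicate detection algorithm would combine such per-field scores across all fields of a record pair before deciding whether the pair is an approximate duplicate.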
