Abstract
Comprehensive bibliographies often rely on community contributions. In such settings, de-duplication is mandatory for the bibliography to be useful. Ideally, de-duplication works online, i.e., when adding new references, so the bibliography remains duplicate-free at all times. While de-duplication is well researched, generic approaches do not achieve the result quality required for automated reconciliation. To overcome this problem, we propose a new duplicate detection and reconciliation technique called RefConcile. Aiming specifically at bibliographic references, it uses dedicated blocking and matching techniques tailored to this type of data. Our evaluation based on a large real-world collection of bibliographic references shows that RefConcile scales well, and that it detects and reconciles duplicates highly accurately.
Users
Please
log in to take part in the discussion (add own reviews or comments).