Article

Lattice-based progressive author disambiguation

Information Systems (2022)
DOI: https://doi.org/10.1016/j.is.2022.102056

Abstract

Many use cases depend on author identities, yet determining them is non-trivial. Author disambiguation (AD) is a special case of entity resolution that maps author mentions to real-world authors. As in other entity resolution tasks, AD methods are strongly constrained by scale and by person-name conventions. So far, this has been addressed by static blocking methods, which cannot adapt to such collection-dependent properties. We address this gap by presenting the first progressive method of author disambiguation. Progressive entity resolution tackles large-scale conflation problems by iteratively increasing the number of pairs compared for potential equivalence. Our method uses lattice structures to model name inclusion in an adaptive and more efficient way than traditional blocking techniques based on alphabetical order or fixed-level generalization. Our work offers additional insights into the relationships among name matching, different blocking schemes, blocking and clustering, and cost and benefit. Using the Web of Science as large-scale annotated test data, we track our model's performance over time and compare it with various configurations and baselines. Our approach consistently outperforms state-of-the-art blocking methods, underlining its contribution to the field of author disambiguation. It offers a novel alternative for tackling ambiguity in entity resolution, which is a major challenge for many information systems.
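The abstract does not spell out how the lattice is built or traversed, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a name is a hypothetical `(surname, initials)` pair, and that one name generalizes another when the surnames match and its initials are a prefix of the other's. Candidate pairs are then emitted progressively: identical names first, then pairs that lie one generalization step apart, and so on. The names `generalizes` and `progressive_pairs` are illustrative.

```python
from itertools import combinations


def generalizes(a, b):
    """Assumed inclusion rule: same surname, and a's initials prefix b's."""
    return a[0] == b[0] and b[1].startswith(a[1])


def progressive_pairs(mentions):
    """Yield (gap, id1, id2) candidate pairs, most specific matches first.

    `mentions` maps a mention id to a (surname, initials) tuple. `gap` is the
    difference in the number of initials, used here as a simple stand-in for
    the distance between two names in the inclusion order.
    """
    max_gap = max((len(name[1]) for name in mentions.values()), default=0)
    for gap in range(max_gap + 1):
        # Quadratic scan per round: acceptable for a sketch, not for scale.
        for i, j in combinations(sorted(mentions), 2):
            a, b = mentions[i], mentions[j]
            comparable = generalizes(a, b) or generalizes(b, a)
            if comparable and abs(len(a[1]) - len(b[1])) == gap:
                yield gap, i, j


# Hypothetical mentions, invented for illustration only.
mentions = {
    1: ("smith", "j"),
    2: ("smith", "jr"),
    3: ("smith", "jr"),
    4: ("smith", "ja"),
    5: ("jones", "j"),
}
pairs = list(progressive_pairs(mentions))
```

In this toy run, mentions 2 and 3 are paired in the first round because their names are identical. The more general "smith, j" is paired with each of its specializations in the next round. "smith, jr" and "smith, ja" are never compared, because neither name includes the other.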
