Abstract
An increasing number of heterogeneous datasets abiding by the
Linked Data paradigm are published every day. Discovering links
between these datasets is thus central to achieving the vision behind the Data Web. Declarative Link Discovery (LD) frameworks
rely on complex Link Specifications (LS) to express the conditions
under which two resources should be linked. Complex LS combine similarity measures with thresholds to determine whether
a given predicate holds between two resources. State-of-the-art
LD frameworks rely mostly on string-based similarity measures
such as Levenshtein and Jaccard. However, string-based similarity measures often fail to capture the similarity of resources with
phonetically similar property values when these property values
are written with different string representations (e.g., names
and street labels). In this paper, we evaluate the impact of using
phonetic-based similarity measures in the LD process.
Moreover, we evaluate the impact of phonetic-based similarity
measures on a state-of-the-art machine learning approach used
to generate LS. Our experiments suggest that the combination
of string-based and phonetic-based measures can improve the F-measures achieved by LD frameworks on most datasets.
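
To make the idea of combining measures with thresholds concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes Soundex as a representative phonetic encoding (the abstract does not name the phonetic algorithms used), a normalized Levenshtein similarity as the string-based measure, and an illustrative threshold of 0.8. All function names are hypothetical and do not correspond to the API of any LD framework.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalize the edit distance into a similarity in [0, 1]."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Letter-to-digit table of American Soundex.
_SOUNDEX_CODES = {ch: digit
                  for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"),
                                         ("DT", "3"), ("L", "4"),
                                         ("MN", "5"), ("R", "6"))
                  for ch in letters}


def soundex(name: str) -> str:
    """American Soundex code: first letter followed by three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with equal codes
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


def phonetic_similarity(a: str, b: str) -> float:
    """1.0 if both values share a Soundex code, else 0.0 (assumed definition)."""
    return 1.0 if soundex(a) == soundex(b) else 0.0


def string_only_spec(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical LS using only a string-based measure."""
    return levenshtein_similarity(a, b) >= threshold


def combined_spec(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical LS: link if either measure reaches its threshold."""
    return (levenshtein_similarity(a, b) >= threshold
            or phonetic_similarity(a, b) >= 1.0)
```

Under these assumptions, the name pair "Meyer" and "Maier" has a Levenshtein similarity of 0.6 and is rejected by the string-only specification, while the combined specification links it because both values share one Soundex code.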