Unsupervised Discovery of Corroborative Paths for Fact Validation
Z. Syed, M. Röder, and A. Ngomo. The Semantic Web -- ISWC 2019, page 630--646. Cham, Springer International Publishing, (2019)
Any data publisher can make RDF knowledge graphs available for consumption on the Web. This is a direct consequence of the decentralized publishing paradigm underlying the Data Web, which has led to more than 150 billion facts on more than 3 billion things being published on the Web in more than 10,000 RDF knowledge graphs over the last decade. However, the success of this publishing paradigm also means that the validation of the facts contained in RDF knowledge graphs has become more important than ever before. Several families of fact validation algorithms have been developed over the last years to address several settings of the fact validation problems. In this paper, we consider the following fact validation setting: Given an RDF knowledge graph, compute the likelihood that a given (novel) fact is true. None of the current solutions to this problem exploits RDFS semantics---especially domain, range and class subsumption information. We address this research gap by presenting an unsupervised approach dubbed COPAAL, that extracts paths from knowledge graphs to corroborate (novel) input facts. Our approach relies on a mutual information measure that takes the RDFS semantics underlying the knowledge graph into consideration. In particular, we use the information shared by predicates and paths within the knowledge graph to compute the likelihood of a fact being corroborated by the knowledge graph. We evaluate our approach extensively using 17 publicly available datasets. Our results indicate that our approach outperforms the state of the art unsupervised approaches significantly by up to 0.15 AUC-ROC. We even outperform supervised approaches by up to 0.07 AUC-ROC. The source code of COPAAL is open-source and is available at https://github.com/dice-group/COPAAL.