Abstract

Link discovery plays a key role in the integration and use of data across RDF knowledge graphs. Active learning approaches are a common family of solutions to address the problem of learning how to compute links from users. So far, only active learning from perfect oracles has been considered in the literature. However, real oracles are often far from perfect (e.g., in crowdsourcing). We hence study the problem of learning how to compute links across knowledge graphs from noisy oracles, i.e., oracles that are not guaranteed to return correct classification results. We present a novel approach for link discovery based on a probabilistic model, with which we estimate the joint odds of the oracles' guesses. We combine this approach with an iterative learning approach based on refinements. The resulting method, Ligon, is evaluated on 10 benchmark datasets. Our results suggest that Ligon configured with 10 iterations and 10 training examples per iteration achieves more than 95% of the F-measure achieved by state-of-the-art algorithms trained with a perfect oracle. Moreover, Ligon outperforms batch learning approaches devised to be trained with small amounts of training data by more than 40% F-measure on average.

Links and resources

Tags