Abstract

Many of the available RDF datasets describe millions of resources using billions of triples. Consequently, millions of links can potentially exist among such datasets. While parallel implementations of link discovery approaches have been developed in the past, load balancing for local implementations of link discovery algorithms has received little attention. In this paper, we thus present a novel load balancing technique for link discovery on parallel hardware based on particle swarm optimization. We combine this approach with the Orchid algorithm for geo-spatial linking and evaluate it on real and artificial datasets. Our evaluation suggests that while naïve approaches can be super-linear on small datasets, our deterministic particle swarm optimization outperforms both naïve and classical load balancing approaches, such as greedy load balancing, on large datasets.
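
For orientation, the greedy baseline named above can be sketched as follows. This is a minimal illustration of generic greedy load balancing (largest task first, always assigned to the currently least-loaded worker), not the implementation evaluated in the paper. The function name `greedy_balance`, the notion of a per-task cost estimate, and the sample numbers are assumptions made for the example.

```python
import heapq

def greedy_balance(task_costs, n_workers):
    """Hypothetical sketch: assign each task to the least-loaded worker.

    task_costs: estimated cost of each task (assumed to be known up front).
    Returns (assignment, loads), where assignment[w] lists the task indices
    given to worker w and loads[w] is the total cost on worker w.
    """
    # Min-heap of (current load, worker id) so the lightest worker pops first.
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]

    # Process tasks from most to least expensive.
    ordered = sorted(enumerate(task_costs), key=lambda tc: tc[1], reverse=True)
    for task, cost in ordered:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(task)
        heapq.heappush(heap, (load + cost, worker))

    loads = [sum(task_costs[t] for t in tasks) for tasks in assignment]
    return assignment, loads

if __name__ == "__main__":
    assignment, loads = greedy_balance([7, 5, 4, 3, 1], n_workers=2)
    print(assignment, loads)
```

A greedy pass of this kind makes a single sweep over the tasks and never revisits an assignment, which is the property an optimization-based balancer such as the particle swarm approach described above can try to improve on.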
