Abstract
Semantic inferencing and querying across large-scale RDF triple stores is notoriously slow. Our objective is to expedite this process by employing Google’s MapReduce framework to implement scale-out distributed querying and reasoning. This approach requires RDF graphs to be decomposed into smaller units that are distributed across computational nodes. RDF molecules appear to offer an ideal approach, providing an intermediate level of granularity between RDF graphs and triples. However, the original RDF molecule definition has inherent limitations that adversely affect performance. In this paper, we propose two extensions to RDF molecules (hierarchy and ordering) to overcome these limitations. We then present implementation details for our MapReduce-based RDF molecule store, describing: (a) graph decomposition into molecules; (b) SPARQL querying across molecules; and (c) molecule merging to retrieve the search results. Finally, we evaluate the benefits of our approach in the context of the BioMANTA project, an application that requires integration and querying across large-scale protein-protein interaction datasets. The results of performance evaluations based on this case study are presented and discussed.