A Scale-Out RDF Molecule Store for Improved Co-Identification, Querying and Inferencing

Proceedings of the 4th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS), pages 1–16. (October 2008)

Abstract

Semantic inferencing and querying across large scale RDF triple stores is notoriously slow. Our objective is to expedite this process by employing Google’s MapReduce framework to implement scale-out distributed querying and reasoning. This approach requires RDF graphs to be decomposed into smaller units that are distributed across computational nodes. RDF Molecules appear to offer an ideal approach – providing an intermediate level of granularity between RDF graphs and triples. However, the original RDF molecule definition has inherent limitations that will adversely affect performance. In this paper, we propose a number of extensions to RDF molecules (hierarchy and ordering) to overcome these limitations. We then present implementation details for our MapReduce-based RDF molecule store describing: (a) graph decomposition into molecules; (b) SPARQL querying across molecules; and (c) molecule merging to retrieve the search results. Finally we evaluate the benefits of our approach in the context of the BioMANTA project – an application that requires integration and querying across large-scale protein-protein interaction datasets. The results of performance evaluations based on this case study are presented and discussed.
