A Scale-Out RDF Molecule Store for Improved Co-Identification, Querying and Inferencing
A. Newman, Y. Li, and J. Hunter. Proceedings of the 4th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS), pages 1–16. (October 2008)
Abstract
Semantic inferencing and querying across large scale RDF triple stores is notoriously slow. Our objective is to expedite this process by employing Google’s MapReduce framework to implement scale-out distributed querying and reasoning. This approach requires RDF graphs to be decomposed into smaller units that are distributed across computational nodes. RDF Molecules appear to offer an ideal approach – providing an intermediate level of granularity between RDF graphs and triples. However, the original RDF molecule definition has inherent limitations that will adversely affect performance. In this paper, we propose a number of extensions to RDF molecules (hierarchy and ordering) to overcome these limitations. We then present implementation details for our MapReduce-based RDF molecule store describing: (a) graph decomposition into molecules; (b) SPARQL querying across molecules; and (c) molecule merging to retrieve the search results. Finally we evaluate the benefits of our approach in the context of the BioMANTA project – an application that requires integration and querying across large-scale protein-protein interaction datasets. The results of performance evaluations based on this case study are presented and discussed.
%0 Conference Paper
%1 Newman2008RDFMoleculeStore
%A Newman, Andrew
%A Li, Yuan-Fang
%A Hunter, Jane
%B Proceedings of the 4th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS)
%D 2008
%K
%P 1--16
%T A Scale-Out RDF Molecule Store for Improved Co-Identification, Querying and Inferencing
%X Semantic inferencing and querying across large scale RDF triple stores is notoriously slow. Our objective is to expedite this process by employing Google’s MapReduce framework to implement scale-out distributed querying and reasoning. This approach requires RDF graphs to be decomposed into smaller units that are distributed across computational nodes. RDF Molecules appear to offer an ideal approach – providing an intermediate level of granularity between RDF graphs and triples. However, the original RDF molecule definition has inherent limitations that will adversely affect performance. In this paper, we propose a number of extensions to RDF molecules (hierarchy and ordering) to overcome these limitations. We then present implementation details for our MapReduce-based RDF molecule store describing: (a) graph decomposition into molecules; (b) SPARQL querying across molecules; and (c) molecule merging to retrieve the search results. Finally we evaluate the benefits of our approach in the context of the BioMANTA project – an application that requires integration and querying across large-scale protein-protein interaction datasets. The results of performance evaluations based on this case study are presented and discussed.
@inproceedings{Newman2008RDFMoleculeStore,
abstract = {Semantic inferencing and querying across large scale RDF triple stores is notoriously slow. Our objective is to expedite this process by employing Google’s MapReduce framework to implement scale-out distributed querying and reasoning. This approach requires RDF graphs to be decomposed into smaller units that are distributed across computational nodes. RDF Molecules appear to offer an ideal approach – providing an intermediate level of granularity between RDF graphs and triples. However, the original RDF molecule definition has inherent limitations that will adversely affect performance. In this paper, we propose a number of extensions to RDF molecules (hierarchy and ordering) to overcome these limitations. We then present implementation details for our MapReduce-based RDF molecule store describing: (a) graph decomposition into molecules; (b) SPARQL querying across molecules; and (c) molecule merging to retrieve the search results. Finally we evaluate the benefits of our approach in the context of the BioMANTA project – an application that requires integration and querying across large-scale protein-protein interaction datasets. The results of performance evaluations based on this case study are presented and discussed.},
added-at = {2011-12-12T19:01:04.000+0100},
author = {Newman, Andrew and Li, Yuan-Fang and Hunter, Jane},
biburl = {https://www.bibsonomy.org/bibtex/29ab15479d363e5b7052320c290c08ae4/gergie},
booktitle = {Proceedings of the 4th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS)},
file = {:Newman2008RDFMoleculeStore.pdf:PDF},
groups = {public},
interhash = {ad5a214791408a55ef52652551e1a3e5},
intrahash = {9ab15479d363e5b7052320c290c08ae4},
keywords = {},
month = {October},
pages = {1--16},
timestamp = {2011-12-12T19:01:04.000+0100},
title = {{A Scale-Out RDF Molecule Store for Improved Co-Identification, Querying and Inferencing}},
username = {gergie},
year = 2008
}