
Large-scale RDF Dataset Slicing

E. Marx, S. Shekarpour, S. Auer, and A. Ngomo. 7th IEEE International Conference on Semantic Computing, September 16-18, 2013, Irvine, California, USA, (2013)

Abstract

Over the last few years, an increasing amount of structured data has been published on the Web as Linked Open Data (LOD). Despite recent advances, consuming and using Linked Open Data within an organization is still a substantial challenge. Many LOD datasets are quite large, and despite progress in RDF data management, loading and querying them within a triple store is extremely time-consuming and resource-demanding. To overcome this consumption obstacle, we propose a process inspired by the classical Extract-Transform-Load (ETL) paradigm. In this article, we focus particularly on the selection and extraction steps of this process. We devise a fragment of SPARQL, dubbed SliceSPARQL, which enables the selection of well-defined slices of datasets fulfilling typical information needs. SliceSPARQL supports graph patterns in which each connected subgraph pattern involves at most one variable or IRI in its join conditions. This restriction guarantees efficient processing of the query against a sequential dataset dump stream. As a result, our evaluation shows that dataset slices can be generated an order of magnitude faster than with the conventional approach of loading the whole dataset into a triple store and retrieving the slice by executing the query against the triple store's SPARQL endpoint.
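To illustrate why a single join term per connected pattern makes streaming evaluation possible, here is a minimal Python sketch. It is a simplified illustration, not the authors' SliceSPARQL implementation. It assumes the dump has already been parsed into (subject, predicate, object) string tuples (for example by an N-Triples parser), that variables are written with a leading `?`, and that the query forms one connected component whose only join term is `join`. The helper names `match` and `slice_stream` are hypothetical. Because every triple pattern shares that one join term, each triple can be tested against each pattern independently as it streams past and filed under the value it binds to the join term; only matching triples are buffered, never the whole dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

# A parsed RDF triple: (subject, predicate, object) as plain strings.
Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    """Variables are assumed to be written with a leading '?', e.g. '?s'."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple) -> Optional[Dict[str, str]]:
    """Return variable bindings if `triple` matches `pattern`, else None."""
    bindings: Dict[str, str] = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable used twice in one pattern must bind the same value.
            if bindings.get(p, t) != t:
                return None
            bindings[p] = t
        elif p != t:  # constant IRI/literal must match exactly
            return None
    return bindings


def slice_stream(
    triples: Iterable[Triple], patterns: List[Triple], join: str
) -> Iterator[Triple]:
    """One sequential pass over `triples`, yielding the selected slice.

    Hypothetical helper: assumes every pattern contains the single join
    term `join` (a variable or a constant IRI).
    """
    for pat in patterns:
        if join not in pat:
            raise ValueError(f"pattern {pat} does not contain join term {join}")

    # join value -> indices of the patterns it has satisfied so far
    satisfied: Dict[str, Set[int]] = defaultdict(set)
    # join value -> matching triples, deduplicated, in stream order
    buffered: Dict[str, Dict[Triple, None]] = defaultdict(dict)

    for triple in triples:  # single pass over the dump
        for i, pat in enumerate(patterns):
            bindings = match(pat, triple)
            if bindings is None:
                continue
            key = bindings[join] if is_var(join) else join
            satisfied[key].add(i)
            buffered[key][triple] = None

    # Keep only join values for which every triple pattern matched.
    for key, hit in satisfied.items():
        if len(hit) == len(patterns):
            yield from buffered[key]


if __name__ == "__main__":
    dump = [
        ("ex:alice", "rdf:type", "ex:Person"),
        ("ex:alice", "ex:name", '"Alice"'),
        ("ex:bob", "rdf:type", "ex:Person"),
        ("ex:carol", "ex:name", '"Carol"'),
    ]
    query = [("?s", "rdf:type", "ex:Person"), ("?s", "ex:name", "?n")]
    print(list(slice_stream(dump, query, "?s")))
    # [('ex:alice', 'rdf:type', 'ex:Person'), ('ex:alice', 'ex:name', '"Alice"')]
```

The sketch reads the dump exactly once, but memory still grows with the number of matching triples, since the slice is only known to be complete after the pass ends. Under this reading of the restriction, a pattern such as `?a p ?b . ?b q ?c . ?c r ?d`, which joins on both `?b` and `?c`, could not be filed under a single key this way.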

Links and resources

BibTeX key: Marx2013
entry type: inproceedings
booktitle: 7th IEEE International Conference on Semantic Computing, September 16-18, 2013, Irvine, California, USA
year: 2013
owner: soeren
bdsk-url-1: http://svn.aksw.org/papers/2013/ICSC_SLICE/public.pdf
Document: http://svn.aksw.org/papers/2013/ICSC_SLICE/public.pdf


Cite this publication

@inproceedings{Marx2013,
  abstract = {In the last years an increasing number of structured data was published on the Web as Linked Open Data (LOD). Despite recent advances, consuming and using Linked Open Data within an organization is still a substantial challenge. Many of the LOD datasets are quite large and despite progress in RDF data management their loading and querying within a triple store is extremely time-consuming and resource-demanding. To overcome this consumption obstacle, we propose a process inspired by the classical Extract-Transform-Load (ETL) paradigm. In this article, we focus particularly on the selection and extraction steps of this process. We devise a fragment of SPARQL dubbed SliceSPARQL, which enables the selection of well-defined slices of datasets fulfilling typical information needs. SliceSPARQL supports graph patterns for which each connected subgraph pattern involves a maximum of one variable or IRI in its join conditions. This restriction guarantees the efficient processing of the query against a sequential dataset dump stream. As a result our evaluation shows that dataset slices can be generated an order of magnitude faster than by using the conventional approach of loading the whole dataset into a triple store and retrieving the slice by executing the query against the triple store's SPARQL endpoint.},
  added-at = {2017-01-27T23:28:47.000+0100},
  author = {Marx, Edgard and Shekarpour, Saeedeh and Auer, S\"oren and Ngomo, Axel-Cyrille Ngonga},
  bdsk-url-1 = {http://svn.aksw.org/papers/2013/ICSC_SLICE/public.pdf},
  biburl = {https://www.bibsonomy.org/bibtex/2ccbd5123f1a902c683038bd751e89505/soeren},
  booktitle = {7th IEEE International Conference on Semantic Computing, September 16-18, 2013, Irvine, California, USA},
  interhash = {5f2eaf20305e583d74d25d56ecfa07b1},
  intrahash = {ccbd5123f1a902c683038bd751e89505},
  keywords = {2013 auer event_ICSC group_aksw lod2page marx ngonga shekarpour},
  owner = {soeren},
  timestamp = {2017-01-27T23:30:12.000+0100},
  title = {Large-scale RDF Dataset Slicing},
  url = {http://svn.aksw.org/papers/2013/ICSC_SLICE/public.pdf},
  year = 2013
}
