P. Minh Duc. Proceedings of ICDE/PHD Symposium 2013 (April 2013)
Abstract
The semantic web uses RDF as its data model, providing ultimate flexibility for users to represent and evolve data without need of a schema.
Yet, this flexibility poses challenges in implementing efficient RDF stores, leading from plans with very many self-joins to a triple table, difficulties to optimize these, and a lack of data locality since without a notion of multi-attribute data structure, clustered indexing opportunities are lost.
Apart from performance issues, users of huge RDF graphs often have problems formulating queries as they lack any system-supported notion of the structure in the data.
In this research, we exploit the observation that real RDF data, while not as regularly structured as relational data, still has the great majority of triples conforming to regular patterns.
We conjecture that a system that would recognize this structure automatically would both allow RDF stores to become more efficient and also easier to use.
Concretely, we propose to derive self-organizing RDF that stores data in PSO format in such a way that the regular parts of the data physically correspond to relational columnar storage; and propose RDFscan/RDFjoin algorithms that compute star-patterns over these without wasting effort in self-joins.
These regular parts, i.e. tables, are identified on ingestion by a schema discovery algorithm -- as such users will gain an SQL view of the regular part of the RDF data.
This research aims to produce a state-of-the-art SPARQL frontend for MonetDB as a by-product, and we already present some preliminary results on this platform.
%0 Conference Paper
%1 20747
%A Minh Duc, P.
%B Proceedings of ICDE/PHD Symposium 2013
%D 2013
%K ldbc-related lod2page
%T Self-Organizing Structured RDF In MonetDB
%U http://oai.cwi.nl/oai/asset/20747/20747B.pdf
%X The semantic web uses RDF as its data model, providing ultimate flexibility for users to represent and evolve data without need of a schema.
Yet, this flexibility poses challenges in implementing efficient RDF stores, leading from plans with very many self-joins to a triple table, difficulties to optimize these, and a lack of data locality since without a notion of multi-attribute data structure, clustered indexing opportunities are lost.
Apart from performance issues, users of huge RDF graphs often have problems formulating queries as they lack any system-supported notion of the structure in the data.
In this research, we exploit the observation that real RDF data, while not as regularly structured as relational data, still has the great majority of triples conforming to regular patterns.
We conjecture that a system that would recognize this structure automatically would both allow RDF stores to become more efficient and also easier to use.
Concretely, we propose to derive self-organizing RDF that stores data in PSO format in such a way that the regular parts of the data physically correspond to relational columnar storage; and propose RDFscan/RDFjoin algorithms that compute star-patterns over these without wasting effort in self-joins.
These regular parts, i.e. tables, are identified on ingestion by a schema discovery algorithm -- as such users will gain an SQL view of the regular part of the RDF data.
This research aims to produce a state-of-the-art SPARQL frontend for MonetDB as a by-product, and we already present some preliminary results on this platform.
@inproceedings{20747,
abstract = {The semantic web uses RDF as its data model, providing ultimate flexibility for users to represent and evolve data without need of a schema.
Yet, this flexibility poses challenges in implementing efficient RDF stores, leading from plans with very many self-joins to a triple table, difficulties to optimize these, and a lack of data locality since without a notion of multi-attribute data structure, clustered indexing opportunities are lost.
Apart from performance issues, users of huge RDF graphs often have problems formulating queries as they lack any system-supported notion of the structure in the data.
In this research, we exploit the observation that real RDF data, while not as regularly structured as relational data, still has the great majority of triples conforming to regular patterns.
We conjecture that a system that would recognize this structure automatically would both allow RDF stores to become more efficient and also easier to use.
Concretely, we propose to derive self-organizing RDF that stores data in PSO format in such a way that the regular parts of the data physically correspond to relational columnar storage; and propose RDFscan/RDFjoin algorithms that compute star-patterns over these without wasting effort in self-joins.
These regular parts, i.e. tables, are identified on ingestion by a schema discovery algorithm -- as such users will gain an SQL view of the regular part of the RDF data.
This research aims to produce a state-of-the-art SPARQL frontend for MonetDB as a by-product, and we already present some preliminary results on this platform.},
added-at = {2013-09-03T17:09:05.000+0200},
author = {Minh Duc, P.},
biburl = {https://www.bibsonomy.org/bibtex/2126f9efb10f95280f9118e2c9c619f5e/peterboncz},
booktitle = {Proceedings of ICDE/PHD Symposium 2013},
conferencedate = {2013},
conferencelocation = {Australia},
conferencetitle = {ICDE/PHD Symposium},
group = {INS1},
interhash = {c7fa02bd2c2806c9fabdfdb2b90f668f},
intrahash = {126f9efb10f95280f9118e2c9c619f5e},
keywords = {ldbc-related lod2page},
language = {en},
month = {April},
refereed = {y},
timestamp = {2013-09-13T09:38:07.000+0200},
title = {Self-{Organizing} {Structured} {RDF} {In} {MonetDB}},
url = {http://oai.cwi.nl/oai/asset/20747/20747B.pdf},
year = 2013
}