Inferring XML Schema Definitions from XML Data

G. Bex, F. Neven, and S. Vansummeren. Proceedings of the 33rd International Conference on Very Large Data Bases, page 998-1009. Vienna, Austria, ACM Press, (September 2007)

Abstract

Although the presence of a schema enables many optimizations for operations on XML documents, recent studies have shown that many XML documents in practice either do not refer to a schema, or refer to a syntactically incorrect one. It is therefore of utmost importance to provide tools and techniques that can automatically generate schemas from sets of sample documents. While previous work in this area has mostly focused on the inference of Document Type Definitions (DTDs for short), we will consider the inference of XML Schema Definitions (XSDs for short) -- the increasingly popular schema formalism that is turning DTDs obsolete. In contrast to DTDs where the content model of an element depends only on the element's name, the content model in an XSD can also depend on the context in which the element is used. Hence, while the inference of DTDs basically reduces to the inference of regular expressions from sets of sample strings, the inference of XSDs also entails identifying from a corpus of sample documents the contexts in which elements bear different content models. Since a seminal result by Gold implies that no inference algorithm can learn the complete class of XSDs from positive examples only, we focus on a class of XSDs that captures most XSDs occurring in practice. For this class, we provide a theoretically complete algorithm that always infers the correct XSD when a sufficiently large corpus of XML documents is available. In addition, we present a variant of this algorithm that works well on real-world (and therefore incomplete) data sets.
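The distinction the abstract draws between name-based (DTD-style) and context-dependent (XSD-style) content models can be illustrated with a small sketch. This is not the paper's inference algorithm: the sample document, the function name `child_sequences`, and the choice of "parent element name" as the context are all assumptions made for illustration. The sketch only collects the child-name sequences observed for each element, once keyed by name alone and once keyed by (parent, name).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document: <author> has different children
# depending on whether it appears under <book> or under <review>.
SAMPLE = """
<store>
  <book><title>A</title><author><name>X</name></author></book>
  <review><author><name>Y</name><email>y@example.org</email></author></review>
</store>
""".strip()


def child_sequences(root):
    """Collect observed child-name sequences per element.

    by_name    : keyed by element name only (DTD-style view)
    by_context : keyed by (parent name, element name), a simplified
                 stand-in for the notion of context (an assumption)
    """
    by_name = defaultdict(set)
    by_context = defaultdict(set)

    def visit(elem, parent_tag):
        seq = tuple(child.tag for child in elem)
        by_name[elem.tag].add(seq)
        by_context[(parent_tag, elem.tag)].add(seq)
        for child in elem:
            visit(child, elem.tag)

    visit(root, None)
    return by_name, by_context


root = ET.fromstring(SAMPLE)
by_name, by_context = child_sequences(root)

# Name-only view: both shapes of <author> are merged into one set.
print(sorted(by_name["author"]))
# Context view: each parent sees a single shape of <author>.
print(sorted(by_context[("book", "author")]))
print(sorted(by_context[("review", "author")]))
```

Under the name-only view, a single content model for `author` has to cover both observed sequences; keyed by context, each occurrence gets its own, tighter set of samples from which a regular expression could then be inferred.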

Description

dret'd bibliography

Links and resources

BibTeX key: bex07
entry type: inproceedings
address: Vienna, Austria
booktitle: Proceedings of the 33rd International Conference on Very Large Data Bases
year: 2007
month: September
pages: 998-1009
publisher: ACM Press
crossref: vldb07
topic: xsd[0.9]
index: VLDB 2007
uri: http://www.vldb.org/conf/2007/papers/research/p998-bex.pdf
isbn: 978-1-59593-649-3

Cite this publication

%0 Conference Paper
%1 bex07
%A Bex, Geert Jan
%A Neven, Frank
%A Vansummeren, Stijn
%B Proceedings of the 33rd International Conference on Very Large Data Bases
%C Vienna, Austria
%D 2007
%E Koch, Christoph
%E Gehrke, Johannes
%E Garofalakis, Minos N.
%E Srivastava, Divesh
%E Aberer, Karl
%E Deshpande, Anand
%E Florescu, Daniela
%E Chan, Chee Yong
%E Ganti, Venkatesh
%E Kanne, Carl-Christian
%E Klas, Wolfgang
%E Neuhold, Erich J.
%I ACM Press
%K inference schema xml xsd
%P 998-1009
%T Inferring XML Schema Definitions from XML Data
%X Although the presence of a schema enables many optimizations for operations on XML documents, recent studies have shown that many XML documents in practice either do not refer to a schema, or refer to a syntactically incorrect one. It is therefore of utmost importance to provide tools and techniques that can automatically generate schemas from sets of sample documents. While previous work in this area has mostly focused on the inference of Document Type Definitions (DTDs for short), we will consider the inference of XML Schema Definitions (XSDs for short) -- the increasingly popular schema formalism that is turning DTDs obsolete. In contrast to DTDs where the content model of an element depends only on the element's name, the content model in an XSD can also depend on the context in which the element is used. Hence, while the inference of DTDs basically reduces to the inference of regular expressions from sets of sample strings, the inference of XSDs also entails identifying from a corpus of sample documents the contexts in which elements bear different content models. Since a seminal result by Gold implies that no inference algorithm can learn the complete class of XSDs from positive examples only, we focus on a class of XSDs that captures most XSDs occurring in practice. For this class, we provide a theoretically complete algorithm that always infers the correct XSD when a sufficiently large corpus of XML documents is available. In addition, we present a variant of this algorithm that works well on real-world (and therefore incomplete) data sets.
%@ 978-1-59593-649-3

@inproceedings{bex07,
  abstract = {Although the presence of a schema enables many optimizations for operations on XML documents, recent studies have shown that many XML documents in practice either do not refer to a schema, or refer to a syntactically incorrect one. It is therefore of utmost importance to provide tools and techniques that can automatically generate schemas from sets of sample documents. While previous work in this area has mostly focused on the inference of Document Type Definitions (DTDs for short), we will consider the inference of XML Schema Definitions (XSDs for short) -- the increasingly popular schema formalism that is turning DTDs obsolete. In contrast to DTDs where the content model of an element depends only on the element's name, the content model in an XSD can also depend on the context in which the element is used. Hence, while the inference of DTDs basically reduces to the inference of regular expressions from sets of sample strings, the inference of XSDs also entails identifying from a corpus of sample documents the contexts in which elements bear different content models. Since a seminal result by Gold implies that no inference algorithm can learn the complete class of XSDs from positive examples only, we focus on a class of XSDs that captures most XSDs occurring in practice. For this class, we provide a theoretically complete algorithm that always infers the correct XSD when a sufficiently large corpus of XML documents is available. In addition, we present a variant of this algorithm that works well on real-world (and therefore incomplete) data sets.},
  added-at = {2010-02-26T10:42:20.000+0100},
  address = {Vienna, Austria},
  author = {Bex, Geert Jan and Neven, Frank and Vansummeren, Stijn},
  biburl = {https://www.bibsonomy.org/bibtex/215b2c8b8fa669edd32c45495181f19d7/maxirichter},
  booktitle = {Proceedings of the 33rd International Conference on Very Large Data Bases},
  crossref = {vldb07},
  description = {dret'd bibliography},
  editor = {Koch, Christoph and Gehrke, Johannes and Garofalakis, Minos N. and Srivastava, Divesh and Aberer, Karl and Deshpande, Anand and Florescu, Daniela and Chan, Chee Yong and Ganti, Venkatesh and Kanne, Carl-Christian and Klas, Wolfgang and Neuhold, Erich J.},
  index = {VLDB 2007},
  interhash = {a7c7effa046c45007b95ae1c17300007},
  intrahash = {15b2c8b8fa669edd32c45495181f19d7},
  isbn = {978-1-59593-649-3},
  keywords = {inference schema xml xsd},
  month = {September},
  pages = {998-1009},
  publisher = {ACM Press},
  timestamp = {2010-02-26T10:42:20.000+0100},
  title = {Inferring XML Schema Definitions from XML Data},
  topic = {xsd[0.9]},
  uri = {http://www.vldb.org/conf/2007/papers/research/p998-bex.pdf},
  year = 2007
}
