Test-driven Evaluation of Linked Data Quality

D. Kontokostas, P. Westphal, S. Auer, S. Hellmann, J. Lehmann, R. Cornelissen, and A. Zaveri. Proceedings of the 23rd International Conference on World Wide Web, pages 747--758. International World Wide Web Conferences Steering Committee, (2014)
DOI: 10.1145/2566486.2568002

Abstract

Linked Open Data (LOD) comprises an unprecedented volume of structured data on the Web. However, these datasets are of varying quality, ranging from extensively curated datasets to crowd-sourced or extracted data of often relatively low quality. We present a methodology for test-driven quality assessment of Linked Data, which is inspired by test-driven software development. We argue that vocabularies, ontologies and knowledge bases should be accompanied by a number of test cases, which help to ensure a basic level of quality. The methodology assesses the quality of Linked Data resources based on a formalization of bad smells and data quality problems. Our formalization employs SPARQL query templates, which are instantiated into concrete quality test case queries. Based on an extensive survey, we compile a comprehensive library of data quality test case patterns. We perform automatic test case instantiation based on schema constraints or semi-automatically enriched schemata, and allow the user to generate specific test case instantiations that are applicable to a schema or dataset. We provide an extensive evaluation of five LOD datasets, manual test case instantiation for five schemas, and automatic test case instantiations for all available schemata registered with the Linked Open Vocabularies (LOV) catalogue. One of the main advantages of our approach is that domain-specific semantics can be encoded in the data quality test cases, which makes it possible to discover data quality problems beyond conventional quality heuristics.
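The core mechanism described in the abstract, SPARQL query templates (patterns) instantiated into concrete test case queries, either manually or automatically from schema constraints, can be sketched roughly as follows. This is a minimal illustration that assumes Python's `string.Template` for placeholder substitution; the pattern names (`RDFS_DOMAIN`, `RDFS_RANGE`), the `$P1`/`$T1` placeholders, the helpers `instantiate` and `auto_generate`, and the example.org schema are all hypothetical and do not reproduce the paper's actual pattern library or tooling. Each generated query returns the resources that violate the constraint. Note that these checks are closed-world: a resource counts as violating if it lacks an explicit `rdf:type` triple, even where an RDFS reasoner could infer the type.

```python
from dataclasses import dataclass
from string import Template

# Hypothetical pattern library. Each pattern is a SPARQL SELECT template whose
# placeholders ($P1 = property IRI, $T1 = class IRI) are bound to schema terms.
# A query that returns rows has found resources violating the constraint.
PATTERNS = {
    # Subjects using $P1 that lack an explicit rdf:type $T1.
    "RDFS_DOMAIN": Template(
        "SELECT DISTINCT ?s WHERE {\n"
        "  ?s <$P1> ?o .\n"
        "  FILTER NOT EXISTS { ?s a <$T1> }\n"
        "}"
    ),
    # Subjects whose $P1 value lacks an explicit rdf:type $T1.
    "RDFS_RANGE": Template(
        "SELECT DISTINCT ?s WHERE {\n"
        "  ?s <$P1> ?o .\n"
        "  FILTER NOT EXISTS { ?o a <$T1> }\n"
        "}"
    ),
}

RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Which pattern tests which kind of schema axiom.
AXIOM_TO_PATTERN = {RDFS + "domain": "RDFS_DOMAIN", RDFS + "range": "RDFS_RANGE"}


@dataclass(frozen=True)
class TestCase:
    pattern: str
    bindings: tuple  # sorted (placeholder, value) pairs
    query: str


def instantiate(pattern: str, **bindings: str) -> TestCase:
    """Manual instantiation: bind a pattern's placeholders to concrete IRIs."""
    # Template.substitute raises KeyError if a placeholder is left unbound.
    query = PATTERNS[pattern].substitute(**bindings)
    return TestCase(pattern, tuple(sorted(bindings.items())), query)


def auto_generate(schema_triples):
    """Automatic instantiation: one test case per rdfs:domain/rdfs:range axiom."""
    for subject, predicate, obj in schema_triples:
        pattern = AXIOM_TO_PATTERN.get(predicate)
        if pattern is not None:  # axioms with no matching pattern are skipped
            yield instantiate(pattern, P1=subject, T1=obj)


if __name__ == "__main__":
    EX = "http://example.org/onto#"  # hypothetical schema namespace
    schema = [
        (EX + "birthPlace", RDFS + "domain", EX + "Person"),
        (EX + "birthPlace", RDFS + "range", EX + "Place"),
        (EX + "Person", RDFS + "subClassOf", EX + "Agent"),  # no pattern: ignored
    ]
    for test in auto_generate(schema):
        print(test.pattern)
        print(test.query)
```

Keeping each pattern separate from its bindings mirrors the split the abstract draws between a reusable library of test case patterns and schema- or dataset-specific instantiations: in this sketch, `instantiate` stands in for the manual case and `auto_generate` for instantiation driven by schema axioms.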

Links and resources

BibTeX key: kontokostasDatabugger
entry type: inproceedings
booktitle: Proceedings of the 23rd International Conference on World Wide Web
year: 2014
pages: 747--758
publisher: International World Wide Web Conferences Steering Committee
series: WWW '14
acmid: 2568002
isbn: 978-1-4503-2744-2
bdsk-url-1: http://svn.aksw.org/papers/2014/WWW_Databugger/public.pdf
bdsk-url-2: http://dx.doi.org/10.1145/2566486.2568002
numpages: 12
date-modified: 2015-02-06 06:56:57 +0000
location: Seoul, Korea
DOI: 10.1145/2566486.2568002
Document: http://svn.aksw.org/papers/2014/WWW_Databugger/public.pdf

Cite this publication

%0 Conference Paper
%1 kontokostasDatabugger
%A Kontokostas, Dimitris
%A Westphal, Patrick
%A Auer, Sören
%A Hellmann, Sebastian
%A Lehmann, Jens
%A Cornelissen, Roland
%A Zaveri, Amrapali
%B Proceedings of the 23rd International Conference on World Wide Web
%D 2014
%I International World Wide Web Conferences Steering Committee
%K 2014 MOLE dataquality dllearner group_aksw kontokostas lehmann lod2page rdfunit topic_QualityAnalysis westphal
%P 747--758
%R 10.1145/2566486.2568002
%T Test-driven Evaluation of Linked Data Quality
%U http://svn.aksw.org/papers/2014/WWW_Databugger/public.pdf
%X Linked Open Data (LOD) comprises an unprecedented volume of structured data on the Web. However, these datasets are of varying quality, ranging from extensively curated datasets to crowd-sourced or extracted data of often relatively low quality. We present a methodology for test-driven quality assessment of Linked Data, which is inspired by test-driven software development. We argue that vocabularies, ontologies and knowledge bases should be accompanied by a number of test cases, which help to ensure a basic level of quality. The methodology assesses the quality of Linked Data resources based on a formalization of bad smells and data quality problems. Our formalization employs SPARQL query templates, which are instantiated into concrete quality test case queries. Based on an extensive survey, we compile a comprehensive library of data quality test case patterns. We perform automatic test case instantiation based on schema constraints or semi-automatically enriched schemata, and allow the user to generate specific test case instantiations that are applicable to a schema or dataset. We provide an extensive evaluation of five LOD datasets, manual test case instantiation for five schemas, and automatic test case instantiations for all available schemata registered with the Linked Open Vocabularies (LOV) catalogue. One of the main advantages of our approach is that domain-specific semantics can be encoded in the data quality test cases, which makes it possible to discover data quality problems beyond conventional quality heuristics.
%@ 978-1-4503-2744-2

@inproceedings{kontokostasDatabugger,
  abstract = {Linked Open Data (LOD) comprises an unprecedented volume of structured data on the Web. However, these datasets are of varying quality, ranging from extensively curated datasets to crowd-sourced or extracted data of often relatively low quality. We present a methodology for test-driven quality assessment of Linked Data, which is inspired by test-driven software development. We argue that vocabularies, ontologies and knowledge bases should be accompanied by a number of test cases, which help to ensure a basic level of quality. The methodology assesses the quality of Linked Data resources based on a formalization of bad smells and data quality problems. Our formalization employs SPARQL query templates, which are instantiated into concrete quality test case queries. Based on an extensive survey, we compile a comprehensive library of data quality test case patterns. We perform automatic test case instantiation based on schema constraints or semi-automatically enriched schemata, and allow the user to generate specific test case instantiations that are applicable to a schema or dataset. We provide an extensive evaluation of five LOD datasets, manual test case instantiation for five schemas, and automatic test case instantiations for all available schemata registered with the Linked Open Vocabularies (LOV) catalogue. One of the main advantages of our approach is that domain-specific semantics can be encoded in the data quality test cases, which makes it possible to discover data quality problems beyond conventional quality heuristics.},
  acmid = {2568002},
  added-at = {2024-03-04T14:14:25.000+0100},
  author = {Kontokostas, Dimitris and Westphal, Patrick and Auer, S\"{o}ren and Hellmann, Sebastian and Lehmann, Jens and Cornelissen, Roland and Zaveri, Amrapali},
  bdsk-url-1 = {http://svn.aksw.org/papers/2014/WWW_Databugger/public.pdf},
  bdsk-url-2 = {http://dx.doi.org/10.1145/2566486.2568002},
  biburl = {https://www.bibsonomy.org/bibtex/2055b0803da55ef8abb539d2a622d53e8/aksw},
  booktitle = {Proceedings of the 23rd International Conference on World Wide Web},
  date-modified = {2015-02-06 06:56:57 +0000},
  doi = {10.1145/2566486.2568002},
  interhash = {66a6d782062b615d9b4fa141ceb2473a},
  intrahash = {055b0803da55ef8abb539d2a622d53e8},
  isbn = {978-1-4503-2744-2},
  keywords = {2014 MOLE dataquality dllearner group_aksw kontokostas lehmann lod2page rdfunit topic_QualityAnalysis westphal},
  location = {Seoul, Korea},
  numpages = {12},
  pages = {747--758},
  publisher = {International World Wide Web Conferences Steering Committee},
  series = {WWW '14},
  timestamp = {2024-03-04T14:14:25.000+0100},
  title = {Test-driven Evaluation of Linked Data Quality},
  url = {http://svn.aksw.org/papers/2014/WWW_Databugger/public.pdf},
  year = 2014
}

BibSonomy
