A Survey of Data Provenance Techniques

Y. Simmhan, B. Plale, and D. Gannon. Technical Report TR-612, Computer Science Department, Indiana University (2005)

Abstract

Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. The provenance of data products generated by complex transformations such as workflows is of considerable value to scientists. From it, one can ascertain the quality of the data based on its ancestral data and derivations, track back sources of errors, allow automated re-enactment of derivations to update a data, and provide attribution of data sources. Provenance is also essential to the business domain where it can be used to drill down to the source of data in a data warehouse, track the creation of intellectual property, and provide an audit trail for regulatory purposes. In this paper we create a taxonomy of data provenance techniques, and apply the classification to current research efforts in the field. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. Our synthesis can help those building scientific and business metadata-management systems to understand existing provenance system designs. The survey culminates with an identification of open research problems in the field.

Links and resources

BibTeX key: Simmhan:iucstr:2005
entry type: techreport
booktitle: Technical Report TR-612, Computer Science Department, Indiana University
year: 2005
institution: Computer Science Department, Indiana University
number: 612
owner: Simmhan
Document: http://www.cs.indiana.edu/pub/techreports/TR618.pdf
note: Extended version of SIGMOD Record 2005


Cite this publication

@techreport{Simmhan:iucstr:2005,
  abstract = {Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. The provenance of data products generated by complex transformations such as workflows is of considerable value to scientists. From it, one can ascertain the quality of the data based on its ancestral data and derivations, track back sources of errors, allow automated re-enactment of derivations to update a data, and provide attribution of data sources. Provenance is also essential to the business domain where it can be used to drill down to the source of data in a data warehouse, track the creation of intellectual property, and provide an audit trail for regulatory purposes. In this paper we create a taxonomy of data provenance techniques, and apply the classification to current research efforts in the field. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. Our synthesis can help those building scientific and business metadata-management systems to understand existing provenance system designs. The survey culminates with an identification of open research problems in the field.},
  added-at = {2016-09-28T14:21:37.000+0200},
  author = {Simmhan, Yogesh L. and Plale, Beth and Gannon, Dennis},
  biburl = {https://www.bibsonomy.org/bibtex/2989b6f3881ea23e3632a9da205b779df/kaymueller},
  booktitle = {Technical Report TR-612, Computer Science Department, Indiana University},
  institution = {Computer Science Department, Indiana University},
  interhash = {0af15701a90ee77bba78b880b1af8b52},
  intrahash = {989b6f3881ea23e3632a9da205b779df},
  keywords = {evidence history kilt-fusion provenance},
  note = {Extended version of SIGMOD Record 2005},
  number = 612,
  owner = {Simmhan},
  timestamp = {2016-09-28T14:21:37.000+0200},
  title = {A Survey of Data Provenance Techniques},
  url = {http://www.cs.indiana.edu/pub/techreports/TR618.pdf},
  year = 2005
}
