%0 Report
%1 Simmhan:iucstr:2005
%A Simmhan, Yogesh L.
%A Plale, Beth
%A Gannon, Dennis
%B Technical Report TR-618, Computer Science Department, Indiana University
%D 2005
%K evidence history kilt-fusion provenance
%N 618
%T A Survey of Data Provenance Techniques
%U http://www.cs.indiana.edu/pub/techreports/TR618.pdf
%X Data management is growing in complexity as large-scale applications
take advantage of the loosely coupled resources brought together
by grid middleware and by abundant storage capacity. Metadata describing
the data products used in and generated by these applications is
essential to disambiguate the data and enable reuse. Data provenance,
one kind of metadata, pertains to the derivation history of a data
product starting from its original sources. The provenance of data
products generated by complex transformations such as workflows is
of considerable value to scientists. From it, one can ascertain the
quality of the data based on its ancestral data and derivations,
track back sources of errors, allow automated re-enactment of derivations
to update a data product, and provide attribution of data sources. Provenance
is also essential to the business domain where it can be used to
drill down to the source of data in a data warehouse, track the creation
of intellectual property, and provide an audit trail for regulatory
purposes. In this paper we create a taxonomy of data provenance techniques,
and apply the classification to current research efforts in the field.
The main aspect of our taxonomy categorizes provenance systems based
on why they record provenance, what they describe, how they represent
and store provenance, and ways to disseminate it. Our synthesis can
help those building scientific and business metadata-management systems
to understand existing provenance system designs. The survey culminates
with an identification of open research problems in the field.
@techreport{Simmhan:iucstr:2005,
abstract = {Data management is growing in complexity as large-scale applications
take advantage of the loosely coupled resources brought together
by grid middleware and by abundant storage capacity. Metadata describing
the data products used in and generated by these applications is
essential to disambiguate the data and enable reuse. Data provenance,
one kind of metadata, pertains to the derivation history of a data
product starting from its original sources. The provenance of data
products generated by complex transformations such as workflows is
of considerable value to scientists. From it, one can ascertain the
quality of the data based on its ancestral data and derivations,
track back sources of errors, allow automated re-enactment of derivations
to update a data product, and provide attribution of data sources. Provenance
is also essential to the business domain where it can be used to
drill down to the source of data in a data warehouse, track the creation
of intellectual property, and provide an audit trail for regulatory
purposes. In this paper we create a taxonomy of data provenance techniques,
and apply the classification to current research efforts in the field.
The main aspect of our taxonomy categorizes provenance systems based
on why they record provenance, what they describe, how they represent
and store provenance, and ways to disseminate it. Our synthesis can
help those building scientific and business metadata-management systems
to understand existing provenance system designs. The survey culminates
with an identification of open research problems in the field.},
added-at = {2016-09-28T14:21:37.000+0200},
author = {Simmhan, Yogesh L. and Plale, Beth and Gannon, Dennis},
biburl = {https://www.bibsonomy.org/bibtex/2989b6f3881ea23e3632a9da205b779df/kaymueller},
booktitle = {Technical Report TR-618, Computer Science Department, Indiana University},
institution = {Computer Science Department, Indiana University},
interhash = {0af15701a90ee77bba78b880b1af8b52},
intrahash = {989b6f3881ea23e3632a9da205b779df},
keywords = {evidence history kilt-fusion provenance},
note = {Extended version of SIGMOD Record 2005},
number = 618,
owner = {Simmhan},
timestamp = {2016-09-28T14:21:37.000+0200},
title = {A Survey of Data Provenance Techniques},
url = {http://www.cs.indiana.edu/pub/techreports/TR618.pdf},
year = 2005
}