Inproceedings,

Automatic Provenance Recording for Scientific Data using Trident

American Geophysical Union (AGU) Fall Meeting, AGU, (2008). Poster.

Abstract

Provenance is increasingly recognized as critical to the understanding and reuse of scientific datasets. Given the rapid generation of scientific data from sensors and computational model results, it is not practical to record provenance manually, and automated techniques for provenance capture are essential. Scientific workflows provide a framework for representing computational models and complex transformations of scientific data, and present a means for tracking the operations performed to derive a dataset. The Trident Scientific Workbench is a workflow system that natively incorporates provenance capture for data derived as part of workflow execution. The applications used in a Trident workflow can execute on a remote computational cluster, such as at a supercomputing center or in the Cloud, or on the researcher's local desktop, and provenance for the data derived by the applications is seamlessly captured. Scientists also have the option to annotate the provenance metadata using domain-specific tags, such as GCMD keywords. The provenance records thus captured can be exported in the emerging Open Provenance Model XML standard or visualized as a graph. The Trident system and the provenance it records have been successfully applied in the Neptune oceanography project and are presently being tested in the Pan-STARRS astronomy project.
