Abstract
Many scientific disciplines are now data- and information-driven, and
new scientific knowledge is often gained by scientists putting together
data analysis and knowledge discovery pipelines. A related trend
is that more and more scientific communities realize the benefits
of sharing their data and computational services, and are thus contributing
to a distributed data and computational community infrastructure
(a.k.a. the Grid). However, this infrastructure is only a means to
an end, and ideally scientists should not be too concerned with its
existence. The goal is for scientists to focus on development and
use of what we call scientific workflows. These are networks of analytical
steps that may involve, e.g., database access and querying steps,
data analysis and mining steps, and many other steps including computationally
intensive jobs on high-performance cluster computers. In this paper
we describe characteristics of and requirements for scientific workflows
as identified in a number of our application projects. We then elaborate
on Kepler, a particular scientific workflow system, currently under
development across a number of scientific data management projects.
We describe some key features of Kepler and its underlying Ptolemy
II system, planned extensions, and areas of future research. Kepler
is a community-driven, open-source project, and we always welcome
related projects and new contributors.