Abstract
Many scientific disciplines are now data- and information-driven, and
new scientific knowledge is often gained by scientists putting together
data analysis and knowledge discovery ``pipelines''. A related trend
is that more and more scientific communities realize the benefits
of sharing their data and computational services, and are thus contributing
to a distributed data and computational community infrastructure (a.k.a.
``the Grid''). However, this infrastructure is only a means to an
end and ideally scientists should not be too concerned with its existence.
The goal is for scientists to focus on development and use of what
we call scientific workflows. These are networks of analytical steps
that may involve, e.g., database access and querying steps, data
analysis and mining steps, and many other steps including computationally
intensive jobs on high-performance cluster computers. In this paper
we describe characteristics of and requirements for scientific workflows
as identified in a number of our application projects. We then elaborate
on Kepler, a particular scientific workflow system, currently under
development across a number of scientific data management projects.
We describe some key features of Kepler and its underlying Ptolemy
II system, planned extensions, and areas of future research. Kepler
is a community-driven, open source project, and we always welcome
related projects and new contributors to join.
Description
Bioinformatics Workflow Systems
Tags
community