Scientific workflow management and the Kepler system

Concurrency and Computation: Practice and Experience, 18 (10): 1039-1065 (2006)
DOI: 10.1002/cpe.994

Abstract

Many scientific disciplines are now data and information driven, and new scientific knowledge is often gained by scientists putting together data analysis and knowledge discovery "pipelines". A related trend is that more and more scientific communities realize the benefits of sharing their data and computational services, and are thus contributing to a distributed data and computational community infrastructure (a.k.a. "the Grid"). However, this infrastructure is only a means to an end and ideally scientists should not be too concerned with its existence. The goal is for scientists to focus on development and use of what we call scientific workflows. These are networks of analytical steps that may involve, e.g., database access and querying steps, data analysis and mining steps, and many other steps including computationally intensive jobs on high-performance cluster computers. In this paper we describe characteristics of and requirements for scientific workflows as identified in a number of our application projects. We then elaborate on Kepler, a particular scientific workflow system, currently under development across a number of scientific data management projects. We describe some key features of Kepler and its underlying Ptolemy II system, planned extensions, and areas of future research. Kepler is a community-driven, open source project, and we always welcome related projects and new contributors to join.

Description

Bioinformatics Workflow Systems

Tags

community