Abstract
The widely discussed scientific data deluge creates not only a need
to computationally scale an application from a local desktop or cluster
to a supercomputer, but also a need to cope with variable data
loads over time. Cloud computing offers a scalable, economic, on-demand
model well matched to evolving eScience needs. Yet cloud computing
creates gaps that must be crossed to move science applications to
the cloud. In this article, we propose a Generic Worker framework
to deploy and invoke science applications in the Cloud with minimal
user effort and predictable, cost-effective performance. Our framework
is an evolution of the Grid computing application factory pattern and
addresses the distinct challenges posed by the Cloud such as efficient
data transfers to and from the Cloud, and the transient nature of
its VMs. We present an implementation of the Generic Worker for the
Microsoft Azure Cloud and evaluate its use in a genome sequencing
application pipeline. Our results show that the user overhead to
port and run the application seamlessly across the desktop and the Cloud
can be substantially reduced without significant performance penalties,
while providing on-demand scalability.