Abstract
Executing a large number of independent tasks, or tasks that perform minimal inter-task communication, in
parallel is a common requirement in many domains. In this paper, we present our experience in
applying two new Microsoft technologies, Dryad and Azure, to three bioinformatics applications. We also
compare them with traditional MPI and Apache Hadoop MapReduce implementations in one example.
The applications are an EST (Expressed Sequence Tag) sequence assembly program, the PhyloD statistical
package for identifying HLA-associated viral evolution, and a pairwise Alu gene alignment application. We
give a detailed performance discussion on a 768-core Windows HPC Server cluster and an Azure cloud. All
the applications start with a “doubly data parallel step” involving independent data chosen from two
similar (EST, Alu) or two different (PhyloD) databases. The structure of the final stages differs in
each application.