Article

Case Study of Scientific Data Processing on a Cloud Using Hadoop

High Performance Computing Systems and Applications (2010)

Abstract

With the increasing popularity of cloud computing, Hadoop has become a widely used open-source cloud computing framework for large-scale data processing. However, few efforts have been made to demonstrate the applicability of Hadoop to real-world application scenarios in fields other than server-side computation, such as web indexing. In this paper, we use the Hadoop cloud computing framework to develop a user application that processes scientific data on clouds. We describe a simple extension to Hadoop's MapReduce that allows it to handle scientific data processing problems with arbitrary input formats and gives explicit control over how the input is split. We use this approach to develop a Hadoop-based cloud computing application that processes sequences of microscope images of live cells, and we test its performance. Finally, we discuss how the approach can be generalized to more complicated scientific data processing problems.
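The abstract's key idea is explicit control over how input is split, so that a map task receives a scientifically meaningful unit (for example, one complete image sequence) rather than an arbitrary fixed-size chunk of bytes. The following is a minimal stand-alone sketch of that idea in plain Java. It is not the authors' code and does not use Hadoop's API. The class `SequenceSplitter`, its methods, and the file naming convention `<sequenceId>_<frame>.<ext>` are all hypothetical assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch (not Hadoop's API): group input files into splits
 * so that every image sequence stays whole inside a single split.
 */
public class SequenceSplitter {

    // Assumed naming convention: "<sequenceId>_<frame>.<ext>", e.g. "cellA_0003.tif".
    // Files without an underscore are treated as their own sequence.
    static String sequenceId(String fileName) {
        int underscore = fileName.lastIndexOf('_');
        return underscore < 0 ? fileName : fileName.substring(0, underscore);
    }

    // Returns one split per sequence. Splits are ordered by sequence ID, and
    // files inside a split are sorted by name (frame order if zero-padded).
    static List<List<String>> splitBySequence(List<String> files) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String f : files) {
            groups.computeIfAbsent(sequenceId(f), k -> new ArrayList<>()).add(f);
        }
        List<List<String>> splits = new ArrayList<>();
        for (List<String> group : groups.values()) {
            group.sort(null); // null comparator = natural (lexicographic) order
            splits.add(group);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
            "cellB_0002.tif", "cellA_0001.tif", "cellB_0001.tif", "cellA_0002.tif");
        // Each printed list would be handed to one map task as a unit.
        for (List<String> split : splitBySequence(files)) {
            System.out.println(split);
        }
    }
}
```

Grouping by sequence rather than by byte offset is the design choice being illustrated: a map task that tracks cells across frames needs every frame of its sequence, which default size-based splitting does not guarantee.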
