Amazon Web Services Developer Community : Running Hadoop MapReduce on Amazon EC2 and Amazon S3


Description

Apache's Hadoop project provides a framework for running large data processing applications on clusters of commodity hardware. Combined with Amazon EC2 for running the application and Amazon S3 for storing the data, it lets us run large jobs very economically. This paper describes how to use Amazon Web Services and Hadoop to run an ad hoc analysis on a large collection of web access logs, an analysis that would otherwise have been prohibitively expensive in time or money.

Users

  • @carlfischer
