Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion.
Apache's Hadoop project aims to solve these problems by providing a framework for running large data processing applications on clusters of commodity hardware. Combined with Amazon EC2 for running the application, and Amazon S3 for storing the data, we can run large jobs very economically. This paper describes how to use Amazon Web Services and Hadoop to run an ad hoc analysis on a large collection of web access logs that otherwise would have cost a prohibitive amount in either time or money.
In late 2004, Google surprised the world of computing with the release of the paper MapReduce: Simplified Data Processing on Large Clusters. That paper ushered in a new model for data processing across clusters of machines that had the benefit of being simple to understand and incredibly flexible. Once you adopt a MapReduce way of thinking, dozens of previously difficult or long-running tasks suddenly start to seem approachable–if you have sufficient hardware.
G. Sadasivam, and G. Baktavatchalam. MDAC '10: Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud, page 1--7. New York, NY, USA, ACM, (2010)
J. Lin. SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, page 155--162. New York, NY, USA, ACM, (2009)
G. Limaye, J. Chaudhary, and P. Punjabi. International Journal on Recent and Innovation Trends in Computing and Communication, 3 (3):
1699--1703(March 2015)
K. Rohloff, and R. Schantz. Proceedings of the fourth international workshop on Data-intensive distributed computing, page 35--44. New York, NY, USA, ACM, (2011)
R. Cordeiro, C. Jr., A. Traina, J. López, U. Kang, and C. Faloutsos. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, August 21-24, 2011, page 690-698. ACM, (2011)
Q. Chen, A. Therber, M. Hsu, H. Zeller, B. Zhang, and R. Wu. Proceedings of the 2009 International Database Engineering & Applications Symposium, page 43--53. New York, NY, USA, ACM, (2009)