Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion.
Apache's Hadoop project aims to solve these problems by providing a framework for running large data processing applications on clusters of commodity hardware. Combined with Amazon EC2 for running the application and Amazon S3 for storing the data, it lets us run large jobs very economically. This paper describes how to use Amazon Web Services and Hadoop to run an ad hoc analysis on a large collection of web access logs that would otherwise have been prohibitively expensive in time or money.
In late 2004, Google surprised the world of computing with the release of the paper "MapReduce: Simplified Data Processing on Large Clusters". That paper ushered in a new model for data processing across clusters of machines, one that is both simple to understand and incredibly flexible. Once you adopt a MapReduce way of thinking, dozens of previously difficult or long-running tasks suddenly start to seem approachable, provided you have sufficient hardware.
J. Dean and S. Ghemawat. Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6, page 137--149. Berkeley, CA, USA, USENIX Association, (2004)
J. Dean and S. Ghemawat. OSDI'04: Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, page 10--10. Berkeley, CA, USA, USENIX Association, (2004)
J. Dean and S. Ghemawat. OSDI'04: Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, USENIX Association, (2004)
C. Bellettini, M. Camilli, L. Capra, and M. Monga. Reachability Problems, volume 8169 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, (2013)
F. Chierichetti, R. Kumar, and A. Tomkins. WWW '10: Proceedings of the 19th international conference on World wide web, page 231--240. New York, NY, USA, ACM, (2010)
A. Ghoting, P. Kambadur, E. Pednault, and R. Kannan. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, August 21-24, 2011, page 334--342. (2011)