Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications with both reliability and data motion.
Apache's Hadoop project addresses the problem of processing very large data sets by providing a framework for running large data processing applications on clusters of commodity hardware. By combining Hadoop with Amazon EC2 for running the application and Amazon S3 for storing the data, we can run large jobs very economically. This paper describes how to use Amazon Web Services and Hadoop to run an ad hoc analysis on a large collection of web access logs, an analysis that would otherwise have cost a prohibitive amount of time or money.
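Ad hoc log analysis of this kind maps naturally onto MapReduce. The following is a minimal sketch written in the style of a Hadoop Streaming job, where the mapper and reducer exchange tab-separated `key<TAB>value` lines and the reducer receives its input sorted by key. It counts requests per URL path. It assumes the logs are in Apache Common Log Format; the names `mapper`, `reducer` and `run_locally` are hypothetical, and `run_locally` only simulates the cluster's sort/shuffle step with `sorted()` so the logic can be checked on one machine. It is not the analysis from the original article.

```python
import re
from itertools import groupby
from typing import Iterable, Iterator, List

# Assumed input format (hypothetical choice): Apache Common Log Format, e.g.
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
# Groups: 1 = host, 2 = method, 3 = path, 4 = status, 5 = bytes.
CLF = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'path<TAB>1' record per well-formed log line."""
    for line in lines:
        m = CLF.match(line)
        if m:  # malformed lines are skipped
            yield f"{m.group(3)}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per path; relies on input being sorted by key,
    which Hadoop's shuffle phase provides on a real cluster."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for path, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{path}\t{sum(int(count) for _, count in group)}"


def run_locally(log_lines: Iterable[str]) -> List[str]:
    """Single-process simulation: map, sort (stand-in for shuffle), reduce."""
    return list(reducer(sorted(mapper(log_lines))))
```

On a real cluster each function would instead be wrapped in a small script that reads `sys.stdin` and prints to `sys.stdout`, then submitted through Hadoop Streaming; the exact jar location and options depend on the Hadoop version in use.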
In late 2004, Google surprised the world of computing with the release of the paper "MapReduce: Simplified Data Processing on Large Clusters". That paper ushered in a new model for data processing across clusters of machines, one that is both simple to understand and remarkably flexible. Once you adopt a MapReduce way of thinking, dozens of previously difficult or long-running tasks suddenly start to seem approachable, provided you have sufficient hardware.
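A minimal, single-process sketch of the programming model that paper introduced: a user-supplied map function turns each input record into intermediate key/value pairs, the framework groups the values by key, and a user-supplied reduce function folds each group into a result. The names `map_reduce`, `word_count_map` and `word_count_reduce` are illustrative only, and the sketch deliberately omits what makes a real implementation hard (partitioning across machines, fault tolerance, spilling to disk).

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple, TypeVar

I = TypeVar("I")                  # input record type
K = TypeVar("K", bound=Hashable)  # intermediate key type
V = TypeVar("V")                  # intermediate value type
R = TypeVar("R")                  # reduced result type


def map_reduce(
    records: Iterable[I],
    map_fn: Callable[[I], Iterable[Tuple[K, V]]],
    reduce_fn: Callable[[K, List[V]], R],
) -> Dict[K, R]:
    groups: Dict[K, List[V]] = defaultdict(list)
    # Map phase: each record independently yields zero or more (key, value) pairs.
    for record in records:
        for key, value in map_fn(record):
            # Shuffle phase: collect every value that shares a key.
            groups[key].append(value)
    # Reduce phase: fold each key's list of values into a single result.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def word_count_map(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated, lower-cased word.
    for word in line.lower().split():
        yield word, 1


def word_count_reduce(word: str, counts: List[int]) -> int:
    # Total occurrences of one word.
    return sum(counts)
```

For example, `map_reduce(["the quick fox", "The lazy dog"], word_count_map, word_count_reduce)` counts "the" twice and every other word once. Because `map_fn` sees one record at a time and `reduce_fn` sees only one key's values, both phases can in principle be split across many machines, which is the independence that frameworks such as Hadoop exploit.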
G. Sadasivam and G. Baktavatchalam. MDAC '10: Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud, pages 1-7. New York, NY, USA, ACM, (2010)
J. Lin. SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 155-162. New York, NY, USA, ACM, (2009)
G. Limaye, J. Chaudhary, and P. Punjabi. International Journal on Recent and Innovation Trends in Computing and Communication, 3 (3): 1699-1703 (March 2015)
K. Rohloff and R. Schantz. Proceedings of the fourth international workshop on Data-intensive distributed computing, pages 35-44. New York, NY, USA, ACM, (2011)
R. Cordeiro, C. Traina Jr., A. Traina, J. López, U. Kang, and C. Faloutsos. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, August 21-24, 2011, pages 690-698. ACM, (2011)
Q. Chen, A. Therber, M. Hsu, H. Zeller, B. Zhang, and R. Wu. Proceedings of the 2009 International Database Engineering & Applications Symposium, pages 43-53. New York, NY, USA, ACM, (2009)
C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, and K. Olukotun. Advances in Neural Information Processing Systems 19, Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 4-7, 2006, pages 281-288. MIT Press, (2006)
H. Yang, A. Dasdan, R. Hsiao, and D. Parker. SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 1029-1040. New York, NY, USA, ACM, (2007)
D. Hiemstra and C. Hauff. Multilingual and Multimodal Information Access Evaluation, Volume 6360 of Lecture Notes in Computer Science, pages 64-69. Berlin, Springer Verlag, (2010)
T. Sandholm and K. Lai. SIGMETRICS '09: Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems, pages 299-310. New York, NY, USA, ACM, (2009)
J. Dean and S. Ghemawat. Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6, pages 137-149. Berkeley, CA, USA, USENIX Association, (2004)
J. Dean and S. Ghemawat. OSDI'04: Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, pages 10-10. Berkeley, CA, USA, USENIX Association, (2004)
J. Dean and S. Ghemawat. OSDI'04: Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, USENIX Association, (2004)
C. Bellettini, M. Camilli, L. Capra, and M. Monga. Reachability Problems, Volume 8169 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, (2013)
F. Chierichetti, R. Kumar, and A. Tomkins. WWW '10: Proceedings of the 19th international conference on World wide web, pages 231-240. New York, NY, USA, ACM, (2010)
A. Ghoting, P. Kambadur, E. Pednault, and R. Kannan. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, August 21-24, 2011, pages 334-342. (2011)
J. Urbani, S. Kotoulas, E. Oren, and F. van Harmelen. International Semantic Web Conference, Volume 5823 of Lecture Notes in Computer Science, pages 634-649. Springer, (2009)
M. Bayir, I. Toroslu, A. Cosar, and G. Fidan. WWW '09: Proceedings of the 18th international conference on World wide web, pages 161-170. New York, NY, USA, ACM, (2009)
M. Becker, H. Mewes, A. Hotho, D. Dimitrov, F. Lemmerich, and M. Strohmaier. International Conference Companion on World Wide Web, pages 17-18. Republic and Canton of Geneva, Switzerland, International World Wide Web Conferences Steering Committee, (2016)
C. Bellettini, M. Camilli, L. Capra, and M. Monga. Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2012 14th International Symposium on, pages 295-302. IEEE Computer Society, (September 2012)
P. Ravindra, V. Deshpande, and K. Anyanwu. MDAC '10: Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud, pages 1-6. New York, NY, USA, ACM, (2010)
P. Pantel, E. Crestan, A. Borkovsky, A. Popescu, and V. Vyas. EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 938-947. Morristown, NJ, USA, Association for Computational Linguistics, (2009)