Disco is an open-source implementation of the Map-Reduce framework for distributed computing. Like the original framework, Disco supports parallel computations over large data sets on unreliable clusters of computers.
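As an illustration of the Map-Reduce model mentioned above, here is a minimal single-process sketch of a word count in plain Python. It does not use Disco's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical, and the distribution and fault tolerance that a real framework provides are deliberately left out.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence inside one process."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["disco runs map tasks", "disco runs reduce tasks"]))
```

In a distributed setting each call to `map_phase` or `reduce_phase` could run on a different machine, and a task lost to a node failure could simply be re-executed, because each task depends only on its own input.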
Sqoop is a tool designed to import data from relational databases into Hadoop. Sqoop uses JDBC to connect to a database. It examines each table’s schema and automatically generates the necessary classes to import data into the Hadoop Distributed File System (HDFS). Sqoop then creates and launches a MapReduce job to read tables from the database via DBInputFormat, the JDBC-based InputFormat. Tables are read into a set of files in HDFS. Sqoop supports both SequenceFile and text-based targets and includes performance enhancements for loading data from MySQL.
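The import flow described above (connect, divide the table among parallel readers, write each part as delimited text) can be sketched roughly as follows. This is not Sqoop code: Python's built-in `sqlite3` stands in for a JDBC connection, in-memory lists stand in for HDFS files, and the helpers `plan_splits`, `read_split`, and `import_table` are hypothetical. Splitting by row ranges with LIMIT and OFFSET is an assumption made for the sketch.

```python
import sqlite3


def plan_splits(conn, table, num_splits):
    """Return (limit, offset) pairs that together cover every row of the table."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if total == 0:
        return []
    chunk = -(-total // num_splits)  # ceiling division
    return [(chunk, offset) for offset in range(0, total, chunk)]


def read_split(conn, table, order_by, limit, offset):
    """Read one row range and render each row as a comma-separated text line."""
    query = f"SELECT * FROM {table} ORDER BY {order_by} LIMIT ? OFFSET ?"
    rows = conn.execute(query, (limit, offset)).fetchall()
    return [",".join(str(value) for value in row) for row in rows]


def import_table(conn, table, order_by, num_splits):
    """Produce one list of text lines per split (a stand-in for one output file)."""
    return [
        read_split(conn, table, order_by, limit, offset)
        for limit, offset in plan_splits(conn, table, num_splits)
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave"), (5, "erin")],
    )
    for part_number, lines in enumerate(import_table(conn, "users", "id", 2)):
        print(f"part-{part_number}: {lines}")
```

Here the splits are read one after another; in a MapReduce job each split would be handled by its own map task. The table and column names are interpolated directly into the SQL, which is acceptable only because this sketch uses trusted, hard-coded identifiers.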
K. Rohloff and R. Schantz. Proceedings of the fourth international workshop on Data-intensive distributed computing, page 35--44. New York, NY, USA, ACM, (2011)
A. Ghoting, P. Kambadur, E. Pednault, and R. Kannan. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, August 21-24, 2011, page 334--342. (2011)
R. Cordeiro, C. Traina Jr., A. Traina, J. López, U. Kang, and C. Faloutsos. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, August 21-24, 2011, page 690--698. ACM, (2011)
F. Chierichetti, R. Kumar, and A. Tomkins. WWW '10: Proceedings of the 19th international conference on World wide web, page 231--240. New York, NY, USA, ACM, (2010)
P. Ravindra, V. Deshpande, and K. Anyanwu. MDAC '10: Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud, page 1--6. New York, NY, USA, ACM, (2010)
G. Sadasivam and G. Baktavatchalam. MDAC '10: Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud, page 1--7. New York, NY, USA, ACM, (2010)
D. Hiemstra and C. Hauff. Multilingual and Multimodal Information Access Evaluation, volume 6360 of Lecture Notes in Computer Science, page 64--69. Berlin, Springer Verlag, (2010)
P. Pantel, E. Crestan, A. Borkovsky, A. Popescu, and V. Vyas. EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, page 938--947. Morristown, NJ, USA, Association for Computational Linguistics, (2009)
J. Urbani, S. Kotoulas, E. Oren, and F. van Harmelen. International Semantic Web Conference, volume 5823 of Lecture Notes in Computer Science, page 634--649. Springer, (2009)
T. Sandholm and K. Lai. SIGMETRICS '09: Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems, page 299--310. New York, NY, USA, ACM, (2009)