Disco is an open-source implementation of the MapReduce framework for distributed computing. Like the original framework, Disco supports parallel computation over large data sets on unreliable clusters of computers.
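The map/reduce model that Disco implements can be illustrated with a minimal pure-Python sketch (this deliberately does not use Disco's own API; it only shows the programming model):

```python
# Minimal sketch of the map/reduce model implemented by Disco.
# Illustration only -- Disco distributes these phases across a cluster.
from collections import defaultdict

def map_reduce(inputs, mapper, reducer):
    # Map phase: each input record yields (key, value) pairs.
    groups = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: values grouped by key are combined per key.
    return {key: reducer(key, values) for key, values in groups.items()}

# Word count, the canonical MapReduce example.
def word_mapper(line):
    for word in line.split():
        yield word, 1

def count_reducer(word, counts):
    return sum(counts)

print(map_reduce(["to be or not to be"], word_mapper, count_reducer))
# -> {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In a real Disco job the same mapper/reducer pair would run in parallel on the cluster's nodes, with the framework handling scheduling and recovery from node failures.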
Sqoop is a tool designed to import data from relational databases into Hadoop. Sqoop uses JDBC to connect to a database, examines each table's schema, and automatically generates the classes needed to import the data into the Hadoop Distributed File System (HDFS). It then creates and launches a MapReduce job that reads the tables via DBInputFormat, the JDBC-based InputFormat, writing each table to a set of files in HDFS. Sqoop supports both SequenceFile and text-based targets and includes performance enhancements for loading data from MySQL.
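A typical import might look like the following sketch; the connection string, credentials, table name, and HDFS path are placeholders, not values from any real deployment:

```shell
# Hypothetical Sqoop import (all names below are placeholders).
# Sqoop connects over JDBC, generates the record classes, and runs
# a MapReduce job that writes the table's rows into HDFS.
sqoop import \
  --connect jdbc:mysql://db.example.com/corp \
  --username dbuser -P \
  --table employees \
  --target-dir /user/hadoop/employees \
  --as-sequencefile
# Without --as-sequencefile, Sqoop writes delimited text files.
# For MySQL, adding --direct selects the faster bulk-import path.
```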
P. Pantel, E. Crestan, A. Borkovsky, A. Popescu, and V. Vyas. EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 938--947. Morristown, NJ, USA, Association for Computational Linguistics, (2009)
P. Ravindra, V. Deshpande, and K. Anyanwu. MDAC '10: Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud, pages 1--6. New York, NY, USA, ACM, (2010)
K. Rohloff and R. Schantz. Proceedings of the Fourth International Workshop on Data-Intensive Distributed Computing, pages 35--44. New York, NY, USA, ACM, (2011)
G. Sadasivam and G. Baktavatchalam. MDAC '10: Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud, pages 1--7. New York, NY, USA, ACM, (2010)
T. Sandholm and K. Lai. SIGMETRICS '09: Proceedings of the Eleventh International Joint Conference on Measurement and Modeling of Computer Systems, pages 299--310. New York, NY, USA, ACM, (2009)
J. Urbani, S. Kotoulas, E. Oren, and F. van Harmelen. International Semantic Web Conference, Volume 5823 of Lecture Notes in Computer Science, pages 634--649. Springer, (2009)