Fast, robust, easy to use, scalable, widely applicable, and with monitoring included: that is the promise of MapReduce, a framework from Google for the concurrent processing of very large data sets on compute clusters. A bold promise. This article will show whether MapReduce delivers on it.
Data analytics is becoming increasingly prominent in a variety
of application areas ranging from extracting business intelligence
to processing data from scientific studies. The MapReduce
programming paradigm lends itself well to these data-intensive
analytics jobs, given its ability to scale out and leverage several
machines to process data in parallel. In this work we argue
that such MapReduce-based analytics are particularly synergistic
with the pay-as-you-go model of a cloud platform. However,
a key challenge facing end-users in this environment is
the ability to provision MapReduce applications to minimize
the incurred cost, while obtaining the best performance. This
paper first motivates the importance of optimally provisioning a
MapReduce job, and demonstrates that existing approaches can
result in far from optimal provisioning. We then present a preliminary
approach that improves MapReduce provisioning by
analyzing and comparing resource consumption of the application
at hand with a database of similar resource consumption
signatures of other applications.
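
The signature comparison described above can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how signatures are represented or compared, so this example assumes a signature is a small vector of normalized resource usage (CPU, disk I/O, network) and that similarity is plain Euclidean distance. The database contents, the application names, the node counts, and the function names `distance` and `closest_match` are all hypothetical.

```python
import math

# Hypothetical signature database (illustrative values only).
# Each entry holds an assumed resource-consumption vector
# (CPU, disk I/O, network), normalized to [0, 1], and a node count
# assumed to have worked well for that application.
SIGNATURE_DB = {
    "word_count": {"signature": (0.9, 0.3, 0.2), "nodes": 8},
    "sort": {"signature": (0.4, 0.9, 0.8), "nodes": 16},
    "grep": {"signature": (0.3, 0.7, 0.1), "nodes": 4},
}


def distance(a, b):
    """Euclidean distance between two signatures of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_match(signature, db=SIGNATURE_DB):
    """Return the name and node count of the most similar known application."""
    name = min(db, key=lambda k: distance(signature, db[k]["signature"]))
    return name, db[name]["nodes"]


if __name__ == "__main__":
    # Signature of the application at hand, e.g. measured on a small sample run.
    new_job = (0.85, 0.35, 0.25)
    print(closest_match(new_job))
```

In this sketch the new job simply inherits the provisioning of its nearest neighbor; a real system would presumably need richer signatures and a more careful similarity measure.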
Hadoop On Demand (HOD) is a system for provisioning virtual Hadoop clusters over a large physical cluster. It uses the Torque resource manager to allocate nodes. On the allocated nodes, it can start the Hadoop Map/Reduce and HDFS daemons. It automatically generates the appropriate configuration files (hadoop-site.xml) for the Hadoop daemons and client. HOD can also distribute Hadoop to the nodes in the virtual cluster that it allocates. In short, HOD makes it easy for administrators and users to quickly set up and use Hadoop. It is also a very useful tool for Hadoop developers and testers who need to share a physical cluster for testing their own Hadoop versions.
P. Pantel, E. Crestan, A. Borkovsky, A. Popescu, and V. Vyas. EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 938--947. Morristown, NJ, USA, Association for Computational Linguistics, (2009)