Fast, robust, easy to use, scalable, broadly applicable, and with monitoring included: that is the promise of MapReduce, a framework from Google for the concurrent processing of very large data volumes on compute clusters. A bold promise. This article examines whether MapReduce lives up to it.
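To make the programming model concrete, here is a minimal sketch of the two phases that give MapReduce its name, shown as plain Python for the classic word-count task. This is an illustration of the map/reduce idea only, not Google's implementation; in the real framework the grouping between the phases (the "shuffle") and the distribution across machines are handled by the runtime.

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce step: group pairs by key and sum the counts per word.
    In a real cluster, grouping happens in the shuffle between phases."""
    grouped = defaultdict(int)
    for word, count in pairs:
        grouped[word] += count
    return dict(grouped)

docs = ["map reduce map", "reduce scales out"]
counts = reduce_phase(map_phase(docs))
# counts["map"] == 2 and counts["reduce"] == 2
```

Because each map call touches only one document and each reduce group only one key, both phases can be spread across many machines independently, which is the source of the framework's scalability.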
Data analytics is becoming increasingly prominent in a variety of application areas, ranging from extracting business intelligence to processing data from scientific studies. The MapReduce programming paradigm lends itself well to these data-intensive analytics jobs, given its ability to scale out and leverage several machines to process data in parallel. In this work we argue that such MapReduce-based analytics are particularly synergistic with the pay-as-you-go model of a cloud platform. However, a key challenge facing end-users in this environment is provisioning MapReduce applications so as to minimize the incurred cost while obtaining the best performance. This paper first motivates the importance of optimally provisioning a MapReduce job and demonstrates that existing approaches can result in far from optimal provisioning. We then present a preliminary approach that improves MapReduce provisioning by analyzing the resource consumption of the application at hand and comparing it against a database of resource consumption signatures of similar applications.
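The core of the signature-based approach described above can be sketched as a nearest-neighbor lookup: characterize a job by a vector of per-unit resource consumption, then adopt the provisioning of the most similar job already in the database. The dimensions, job names, and numbers below are hypothetical placeholders, and the distance metric is a simple Euclidean choice for illustration; the paper's actual signature representation and matching procedure may differ.

```python
import math

# Hypothetical signatures: (cpu_sec, shuffle_mb, output_mb) per unit of input,
# each paired with the cluster size that was found cost-effective for that job.
signature_db = {
    "sort":       ((0.2, 1.0, 1.0), 16),
    "word_count": ((0.8, 0.1, 0.3), 8),
    "join":       ((0.5, 0.9, 0.6), 12),
}

def closest_provisioning(signature, db):
    """Return the cluster size stored with the most similar signature,
    using Euclidean distance over the resource dimensions."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, size = min(db.values(), key=lambda entry: dist(signature, entry[0]))
    return size

# A new, I/O-heavy job whose profile resembles "sort":
recommended = closest_provisioning((0.25, 0.95, 0.9), signature_db)
# recommended == 16
```

The design choice here is that past jobs with similar resource profiles are assumed to benefit from similar provisioning, which is exactly the premise the abstract states.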