MapReduce: Simplified Data Processing on Large Clusters
J. Dean and S. Ghemawat. Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6, pages 137--149. Berkeley, CA, USA, USENIX Association, (2004)
Abstract
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.

Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.

Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
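The model described in the abstract can be illustrated with a minimal single-machine sketch (word count, the paper's canonical example). The function names `map_fn`, `reduce_fn`, and `run_mapreduce` are illustrative, not part of the paper's API; the real system executes the same three phases (map, shuffle/group-by-key, reduce) distributed across thousands of machines.

```python
from collections import defaultdict

def map_fn(doc_id, text):
    # Map: process one input key/value pair (document id, contents)
    # and emit intermediate (word, 1) pairs.
    for word in text.split():
        yield (word.lower(), 1)

def reduce_fn(key, values):
    # Reduce: merge all intermediate values sharing the same key.
    return sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Shuffle: group intermediate values by intermediate key,
    # standing in for the partitioning/sorting the runtime performs.
    groups = defaultdict(list)
    for key_in, value_in in inputs:
        for key, value in map_fn(key_in, value_in):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog")]
print(run_mapreduce(docs, map_fn, reduce_fn))
# → {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

Because map and reduce are pure functions over independent key groups, the runtime is free to run map tasks and reduce tasks in parallel and to re-execute them on machine failure, which is what makes the automatic parallelization described above possible.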
%0 Conference Paper
%1 Dean:2004:MSD:1251254.1251264
%A Dean, Jeffrey
%A Ghemawat, Sanjay
%B Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
%C Berkeley, CA, USA
%D 2004
%I USENIX Association
%K Distributed FaultTolerance Functional MapReduce Programming
%P 137--149
%T MapReduce: Simplified Data Processing on Large Clusters
%U http://static.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf
%X MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
@inproceedings{Dean:2004:MSD:1251254.1251264,
abstract = {MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.},
acmid = {1251264},
added-at = {2012-07-05T15:26:24.000+0200},
address = {Berkeley, CA, USA},
author = {Dean, Jeffrey and Ghemawat, Sanjay},
biburl = {https://www.bibsonomy.org/bibtex/24c9b449cf04c48568fe470986a117e68/gron},
booktitle = {Proceedings of the 6th conference on Symposium on Operating Systems Design \& Implementation - Volume 6},
description = {MapReduce},
interhash = {c853fc61c156362ffecdf9302fe7c33f},
intrahash = {4c9b449cf04c48568fe470986a117e68},
keywords = {Distributed FaultTolerance Functional MapReduce Programming},
location = {San Francisco, CA},
numpages = {13},
pages = {137--149},
publisher = {USENIX Association},
series = {OSDI'04},
timestamp = {2012-07-05T15:27:48.000+0200},
title = {MapReduce: Simplified Data Processing on Large Clusters},
url = {http://static.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf},
year = 2004
}