A comparison of approaches to large-scale data analysis

A. Pavlo, E. Paulson, A. Rasin, D. Abadi, D. DeWitt, S. Madden, and M. Stonebraker. Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pages 165--178. New York, NY, USA, ACM (2009)
DOI: 10.1145/1559845.1559865

Abstract

There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model [8, 17]. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each system's performance for various degrees of parallelism on a cluster of 100 nodes. Our results reveal some interesting trade-offs. Although the process to load data into and tune the execution of parallel DBMSs took much longer than the MR system, the observed performance of these DBMSs was strikingly better. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.

Description

A comparison of approaches to large-scale data analysis

Links and resources

BibTeX key: pavlo09largescaledata
entry type: inproceedings
address: New York, NY, USA
booktitle: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
year: 2009
pages: 165--178
publisher: ACM
series: SIGMOD '09
location: Providence, Rhode Island, USA
acmid: 1559865
isbn: 978-1-60558-551-2
numpages: 14
DOI: 10.1145/1559845.1559865
url: http://doi.acm.org/10.1145/1559845.1559865

Cite this publication

@inproceedings{pavlo09largescaledata,
  abstract = {There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model [8, 17]. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each system's performance for various degrees of parallelism on a cluster of 100 nodes. Our results reveal some interesting trade-offs. Although the process to load data into and tune the execution of parallel DBMSs took much longer than the MR system, the observed performance of these DBMSs was strikingly better. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.},
  acmid = {1559865},
  added-at = {2013-06-10T07:19:54.000+0200},
  address = {New York, NY, USA},
  author = {Pavlo, Andrew and Paulson, Erik and Rasin, Alexander and Abadi, Daniel J. and DeWitt, David J. and Madden, Samuel and Stonebraker, Michael},
  biburl = {https://www.bibsonomy.org/bibtex/280f679a6d3e0972899b2543c945e8dab/sb3000},
  booktitle = {Proceedings of the 2009 ACM SIGMOD International Conference on Management of data},
  description = {A comparison of approaches to large-scale data analysis},
  doi = {10.1145/1559845.1559865},
  interhash = {9da048cc41164d9f902fe0d897890608},
  intrahash = {80f679a6d3e0972899b2543c945e8dab},
  isbn = {978-1-60558-551-2},
  keywords = {bigdata database dbms mapreduce},
  location = {Providence, Rhode Island, USA},
  numpages = {14},
  pages = {165--178},
  publisher = {ACM},
  series = {SIGMOD '09},
  timestamp = {2013-06-10T07:19:54.000+0200},
  title = {A comparison of approaches to large-scale data analysis},
  url = {http://doi.acm.org/10.1145/1559845.1559865},
  year = 2009
}

BibSonomy
