On the duality of data-intensive file system design: reconciling HDFS and PVFS

W. Tantisiriroj, S. Son, S. Patil, S. Lang, G. Gibson, and R. Ross. Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, pages 67:1--67:12. New York, NY, USA, ACM, (2011)
DOI: 10.1145/2063384.2063474

Abstract

Data-intensive applications fall into two computing styles: Internet services (cloud computing) or high-performance computing (HPC). In both categories, the underlying file system is a key component for scalable application performance. In this paper, we explore the similarities and differences between PVFS, a parallel file system used in HPC at large scale, and HDFS, the primary storage system used in cloud computing with Hadoop. We integrate PVFS into Hadoop and compare its performance to HDFS using a set of data-intensive computing benchmarks. We study how HDFS-specific optimizations can be matched using PVFS and how consistency, durability, and persistence tradeoffs made by these file systems affect application performance. We show how to embed multiple replicas into a PVFS file, including a mapping with a complete copy local to the writing client, to emulate HDFS's file layout policies. We also highlight implementation issues with HDFS's dependence on disk bandwidth and benefits from pipelined replication.
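The abstract's idea of embedding several replicas in one PVFS file, with one complete copy local to the writing client, can be illustrated with a small Python sketch. It assumes a file split into fixed-size chunks, a replication factor of three (the HDFS default), and plain round-robin placement for the non-local replicas. The function name `replica_layout` and the placement rule are hypothetical and are not taken from the paper; the layout the authors actually implement in PVFS may differ.

```python
from typing import List


def replica_layout(num_chunks: int, servers: List[str], writer: str,
                   replication: int = 3) -> List[List[str]]:
    """Return, for each chunk, the servers that hold its replicas.

    Hypothetical mapping (not the paper's code): replica 0 of every chunk
    goes to the writer's local server, so that server ends up with a
    complete copy of the file; the remaining replicas are striped
    round-robin over the other servers.
    """
    if writer not in servers:
        raise ValueError("writer must be one of the storage servers")
    others = [s for s in servers if s != writer]
    # Each chunk needs replication - 1 distinct non-local servers.
    if replication - 1 > len(others):
        raise ValueError("not enough distinct servers for the replication factor")
    layout = []
    for chunk in range(num_chunks):
        replicas = [writer]  # local copy: the writer holds every chunk
        for r in range(1, replication):
            # Consecutive offsets modulo len(others) never repeat a server
            # for the same chunk, because replication - 1 <= len(others).
            replicas.append(others[(chunk + r - 1) % len(others)])
        layout.append(replicas)
    return layout


if __name__ == "__main__":
    for i, replicas in enumerate(replica_layout(4, ["s0", "s1", "s2", "s3"], "s1")):
        print(i, replicas)
```

In this sketch the writer (`s1` above) appears in every chunk's replica list, which is the property that would let a reader on that node find the whole file locally; round-robin for the other copies is only one simple way to spread them across servers.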

Description

On the duality of data-intensive file system design

Links and resources

BibTeX key: tantisiriroj2011duality
entry type: inproceedings
address: New York, NY, USA
booktitle: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
year: 2011
pages: 67:1--67:12
publisher: ACM
series: SC '11
location: Seattle, Washington
acmid: 2063474
isbn: 978-1-4503-0771-0
numpages: 12
articleno: 67
DOI: 10.1145/2063384.2063474
url: http://doi.acm.org/10.1145/2063384.2063474


Cite this publication

@inproceedings{tantisiriroj2011duality,
  abstract = {Data-intensive applications fall into two computing styles: Internet services (cloud computing) or high-performance computing (HPC). In both categories, the underlying file system is a key component for scalable application performance. In this paper, we explore the similarities and differences between PVFS, a parallel file system used in HPC at large scale, and HDFS, the primary storage system used in cloud computing with Hadoop. We integrate PVFS into Hadoop and compare its performance to HDFS using a set of data-intensive computing benchmarks. We study how HDFS-specific optimizations can be matched using PVFS and how consistency, durability, and persistence tradeoffs made by these file systems affect application performance. We show how to embed multiple replicas into a PVFS file, including a mapping with a complete copy local to the writing client, to emulate HDFS's file layout policies. We also highlight implementation issues with HDFS's dependence on disk bandwidth and benefits from pipelined replication.},
  acmid = {2063474},
  added-at = {2012-10-30T16:39:40.000+0100},
  address = {New York, NY, USA},
  articleno = {67},
  author = {Tantisiriroj, Wittawat and Son, Seung Woo and Patil, Swapnil and Lang, Samuel J. and Gibson, Garth and Ross, Robert B.},
  biburl = {https://www.bibsonomy.org/bibtex/21102ab3cc19f7e1c4e025bd0af2a0842/telekoma},
  booktitle = {Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis},
  description = {On the duality of data-intensive file system design},
  doi = {10.1145/2063384.2063474},
  interhash = {9a174f889754f9bc295fa46b75eeb6e3},
  intrahash = {1102ab3cc19f7e1c4e025bd0af2a0842},
  isbn = {978-1-4503-0771-0},
  keywords = {cloud distributed file hdfs hpc master performance pvfs seminar seminar:dfs system uni ws1213},
  location = {Seattle, Washington},
  numpages = {12},
  pages = {67:1--67:12},
  publisher = {ACM},
  series = {SC '11},
  timestamp = {2012-10-30T16:39:40.000+0100},
  title = {On the duality of data-intensive file system design: reconciling HDFS and PVFS},
  url = {http://doi.acm.org/10.1145/2063384.2063474},
  year = 2011
}

BibSonomy
