S. Harizopoulos, V. Liang, D. Abadi, and S. Madden. VLDB '06: Proceedings of the 32nd international conference on Very large data bases, pages 487--498. VLDB Endowment, (2006)
Abstract
Database systems have traditionally optimized performance for write-intensive workloads. Recently, there has been renewed interest in architectures that optimize read performance by using column-oriented data representation and light-weight compression. This previous work has shown that under certain broad classes of workloads, column-based systems can outperform row-based systems. Previous work, however, has not characterized the precise conditions under which a particular query workload can be expected to perform better on a column-oriented database. In this paper we first identify the distinctive components of a read-optimized DBMS and describe our implementation of a high-performance query engine that can operate on both row and column-oriented data. We then use our prototype to perform an in-depth analysis of the tradeoffs between column and row-oriented architectures. We explore these tradeoffs in terms of disk bandwidth, CPU cache latency, and CPU cycles. We show that for most database workloads, a carefully designed column system can outperform a carefully designed row system, sometimes by an order of magnitude. We also present an analytical model to predict whether a given workload on a particular hardware configuration is likely to perform better on a row or column-based system.
%0 Conference Paper
%1 1164170
%A Harizopoulos, Stavros
%A Liang, Velen
%A Abadi, Daniel J.
%A Madden, Samuel
%B VLDB '06: Proceedings of the 32nd international conference on Very large data bases
%D 2006
%I VLDB Endowment
%K column database vertica warehouse
%P 487--498
%T Performance tradeoffs in read-optimized databases
%U http://portal.acm.org/citation.cfm?id=1164127.1164170
%X Database systems have traditionally optimized performance for write-intensive workloads. Recently, there has been renewed interest in architectures that optimize read performance by using column-oriented data representation and light-weight compression. This previous work has shown that under certain broad classes of workloads, column-based systems can outperform row-based systems. Previous work, however, has not characterized the precise conditions under which a particular query workload can be expected to perform better on a column-oriented database. In this paper we first identify the distinctive components of a read-optimized DBMS and describe our implementation of a high-performance query engine that can operate on both row and column-oriented data. We then use our prototype to perform an in-depth analysis of the tradeoffs between column and row-oriented architectures. We explore these tradeoffs in terms of disk bandwidth, CPU cache latency, and CPU cycles. We show that for most database workloads, a carefully designed column system can outperform a carefully designed row system, sometimes by an order of magnitude. We also present an analytical model to predict whether a given workload on a particular hardware configuration is likely to perform better on a row or column-based system.
@inproceedings{1164170,
abstract = {Database systems have traditionally optimized performance for write-intensive workloads. Recently, there has been renewed interest in architectures that optimize read performance by using column-oriented data representation and light-weight compression. This previous work has shown that under certain broad classes of workloads, column-based systems can outperform row-based systems. Previous work, however, has not characterized the precise conditions under which a particular query workload can be expected to perform better on a column-oriented database. In this paper we first identify the distinctive components of a read-optimized DBMS and describe our implementation of a high-performance query engine that can operate on both row and column-oriented data. We then use our prototype to perform an in-depth analysis of the tradeoffs between column and row-oriented architectures. We explore these tradeoffs in terms of disk bandwidth, CPU cache latency, and CPU cycles. We show that for most database workloads, a carefully designed column system can outperform a carefully designed row system, sometimes by an order of magnitude. We also present an analytical model to predict whether a given workload on a particular hardware configuration is likely to perform better on a row or column-based system.},
added-at = {2007-12-06T04:23:15.000+0100},
author = {Harizopoulos, Stavros and Liang, Velen and Abadi, Daniel J. and Madden, Samuel},
biburl = {https://www.bibsonomy.org/bibtex/2a03c718c615e4e0716eb31c5a482cb87/jhammerb},
booktitle = {VLDB '06: Proceedings of the 32nd international conference on Very large data bases},
description = {Performance tradeoffs in read-optimized databases},
interhash = {234f2ea3bf1d9e115f437919e646563c},
intrahash = {a03c718c615e4e0716eb31c5a482cb87},
keywords = {column database vertica warehouse},
location = {Seoul, Korea},
pages = {487--498},
publisher = {VLDB Endowment},
timestamp = {2007-12-06T04:23:15.000+0100},
title = {Performance tradeoffs in read-optimized databases},
url = {http://portal.acm.org/citation.cfm?id=1164127.1164170},
year = 2006
}