An integrated approach to recovery and high availability in an updatable, distributed data warehouse
E. Lau and S. Madden. VLDB '06: Proceedings of the 32nd international conference on Very large data bases, pages 703--714. VLDB Endowment, (2006)
Abstract
Any highly available data warehouse will use some form of data replication to tolerate machine failures. In this paper, we demonstrate that we can leverage this data redundancy to build an integrated approach to recovery and high availability. Our approach, called HARBOR, revives a crashed site by querying remote, online sites for missing updates and uses timestamps to determine which tuples need to be copied or updated. HARBOR does not require a stable log, recovers without quiescing the system, allows replicated data to be stored non-identically, and is simpler than a log-based recovery algorithm. We compare the runtime overhead and recovery performance of HARBOR to those of two-phase commit and ARIES, the gold standard for log-based recovery, on a three-node distributed database system. Our experiments demonstrate that HARBOR suffers lower runtime overhead, has recovery performance comparable to ARIES's, and can tolerate the fault of a worker and efficiently bring it back online.
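The timestamp-based recovery idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Row` type, the `recover` function, the integer timestamps, and the notion of a single `checkpoint` time up to which the crashed site's data is trusted are all assumptions made for illustration. It also assumes one remote replica that stores the same rows, whereas HARBOR allows replicas to be stored non-identically.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Row:
    """Hypothetical tuple carrying insertion and deletion timestamps."""
    key: int
    value: str
    inserted_at: int
    deleted_at: Optional[int] = None  # None means the row is still live


def recover(local: Dict[int, Row], remote: Dict[int, Row],
            checkpoint: int) -> Dict[int, Row]:
    """Rebuild a crashed site's table from an online replica.

    Assumption: `local` is trustworthy only up to `checkpoint`.
    """
    restored: Dict[int, Row] = {}

    # Step 1: roll the local table back to its state at the checkpoint.
    for key, row in local.items():
        if row.inserted_at > checkpoint:
            continue  # inserted after the checkpoint: discard
        if row.deleted_at is not None and row.deleted_at > checkpoint:
            row = replace(row, deleted_at=None)  # undo a late deletion
        restored[key] = row

    # Step 2: ask the replica for everything that changed since then.
    for key, row in remote.items():
        if row.inserted_at > checkpoint:
            restored[key] = row  # missing tuple: copy it
        elif row.deleted_at is not None and row.deleted_at > checkpoint:
            if key in restored:
                # Existing tuple: update only its deletion timestamp.
                restored[key] = replace(restored[key],
                                        deleted_at=row.deleted_at)
            else:
                restored[key] = row

    return restored


if __name__ == "__main__":
    crashed = {1: Row(1, "a", 5), 2: Row(2, "b", 6), 3: Row(3, "x", 12)}
    online = {1: Row(1, "a", 5, deleted_at=15), 2: Row(2, "b", 6),
              4: Row(4, "d", 14)}
    for row in sorted(recover(crashed, online, checkpoint=10).values(),
                      key=lambda r: r.key):
        print(row)
```

In this sketch no log is replayed: the timestamps alone decide which tuples are discarded, copied, or updated. Recovering while the system stays online, which the abstract also claims, is outside its scope.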
Description
An integrated approach to recovery and high availability in an updatable, distributed data warehouse
%0 Conference Paper
%1 1164188
%A Lau, Edmond
%A Madden, Samuel
%B VLDB '06: Proceedings of the 32nd international conference on Very large data bases
%D 2006
%I VLDB Endowment
%K database distributed high_availability parallel recovery replication vertica vldb warehouse
%P 703--714
%T An integrated approach to recovery and high availability in an updatable, distributed data warehouse
%U http://portal.acm.org/citation.cfm?id=1164127.1164188
%X Any highly available data warehouse will use some form of data replication to tolerate machine failures. In this paper, we demonstrate that we can leverage this data redundancy to build an integrated approach to recovery and high availability. Our approach, called HARBOR, revives a crashed site by querying remote, online sites for missing updates and uses timestamps to determine which tuples need to be copied or updated. HARBOR does not require a stable log, recovers without quiescing the system, allows replicated data to be stored non-identically, and is simpler than a log-based recovery algorithm. We compare the runtime overhead and recovery performance of HARBOR to those of two-phase commit and ARIES, the gold standard for log-based recovery, on a three-node distributed database system. Our experiments demonstrate that HARBOR suffers lower runtime overhead, has recovery performance comparable to ARIES's, and can tolerate the fault of a worker and efficiently bring it back online.
@inproceedings{1164188,
abstract = {Any highly available data warehouse will use some form of data replication to tolerate machine failures. In this paper, we demonstrate that we can leverage this data redundancy to build an integrated approach to recovery and high availability. Our approach, called HARBOR, revives a crashed site by querying remote, online sites for missing updates and uses timestamps to determine which tuples need to be copied or updated. HARBOR does not require a stable log, recovers without quiescing the system, allows replicated data to be stored non-identically, and is simpler than a log-based recovery algorithm. We compare the runtime overhead and recovery performance of HARBOR to those of two-phase commit and ARIES, the gold standard for log-based recovery, on a three-node distributed database system. Our experiments demonstrate that HARBOR suffers lower runtime overhead, has recovery performance comparable to ARIES's, and can tolerate the fault of a worker and efficiently bring it back online.},
added-at = {2007-12-06T05:00:33.000+0100},
author = {Lau, Edmond and Madden, Samuel},
biburl = {https://www.bibsonomy.org/bibtex/23747a2f87cf3d378046a8564159b83a5/jhammerb},
booktitle = {VLDB '06: Proceedings of the 32nd international conference on Very large data bases},
description = {An integrated approach to recovery and high availability in an updatable, distributed data warehouse},
interhash = {eaf1fcbbd3e65f853a186bb1df063f71},
intrahash = {3747a2f87cf3d378046a8564159b83a5},
keywords = {database distributed high_availability parallel recovery replication vertica vldb warehouse},
location = {Seoul, Korea},
pages = {703--714},
publisher = {VLDB Endowment},
timestamp = {2007-12-06T05:00:34.000+0100},
title = {An integrated approach to recovery and high availability in an updatable, distributed data warehouse},
url = {http://portal.acm.org/citation.cfm?id=1164127.1164188},
year = 2006
}