Inproceedings,

Replay debugging for distributed applications

Proceedings of the annual conference on USENIX '06 Annual Technical Conference, pages 289--300. Berkeley, CA, USA, USENIX Association, (2006)

Abstract

We have developed a new replay debugging tool, liblog, for distributed C/C++ applications. It logs the execution of deployed application processes and replays them deterministically, faithfully reproducing race conditions and non-deterministic failures, enabling careful offline analysis.

To our knowledge, liblog is the first replay tool to address the requirements of large distributed systems: lightweight support for long-running programs, consistent replay of arbitrary subsets of application nodes, and operation in a mixed environment of logging and nonlogging processes. In addition, it requires no special hardware or kernel patches, supports unmodified application executables, and integrates GDB into the replay mechanism for simultaneous source-level debugging of multiple processes.

This paper presents liblog's design, an evaluation of its runtime overhead, and a discussion of our experience with the tool to date.
