DMTCP: Transparent Checkpointing for Cluster Computations and the Desktop
Authors: Jason Ansel, Kapil Arya, Gene Cooperman
(Submitted on 6 Jan 2007 (v1), last revised 24 Feb 2009 (this version, v3))
Abstract: DMTCP (Distributed MultiThreaded CheckPointing) is a transparent user-level checkpointing package for distributed applications. Checkpointing and restart is demonstrated for a wide range of over 20 well-known applications, including MATLAB, Python, TightVNC, MPICH2, OpenMPI, and runCMS. RunCMS runs as a 680 MB image in memory that includes 540 dynamic libraries, and is used for the CMS experiment of the Large Hadron Collider at CERN. DMTCP transparently checkpoints general cluster computations consisting of many nodes, processes, and threads