Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world. It has been used to link clusters across university campuses and across the globe, and it can scale to handle clusters with 2000 nodes.
L. Gao, S. Kraemer, R. Leupers, G. Ascheid, and H. Meyr. CASES '07: Proceedings of the 2007 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, pages 3-12. New York, NY, USA, ACM, (2007)
M. Galanis, G. Dimitroulakos, and C. Goutis. GLSVLSI '07: Proceedings of the 17th Great Lakes Symposium on VLSI, pages 2-7. New York, NY, USA, ACM, (2007)