SmartFrog is a powerful and flexible Java-based software framework for configuring, deploying and managing distributed software systems.
SmartFrog helps you to encapsulate and manage systems so they are easy to configure and reconfigure, and so that they can be automatically installed, started and shut down. It provides orchestration capabilities so that subsystems can be started (and stopped) in the right order. It also helps you to detect and recover from failures.
Such systems typically have multiple software components running across a network of computing resources, where the components must work together to deliver the functionality of the system as a whole. It's critical that the right components are running in the right places, that the components are individually and collectively correctly configured, and that they are correctly combined to create the complete system. This profile fits many of the services and applications that run on today's computing infrastructures.
SmartFrog consists of:
A Language for defining configurations, providing powerful system modelling capabilities and an expressive notation for describing system configurations
A secure, distributed Runtime System for deploying software components and managing running software systems
A Library of SmartFrog Components that implement the SmartFrog component model and provide a wide range of services and functionality
The goal of the Condor® Project is to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Guided by both the technological and sociological challenges of such a computing environment, the Condor Team has been building software tools that enable scientists and engineers to increase their computing throughput.
Y. Lin, S. Han, H. Mao, Y. Wang, and W. Dally. (2017). arXiv:1712.01887. Comment: we find 99.9% of the gradient exchange in distributed SGD is redundant; we reduce the communication bandwidth by two orders of magnitude without losing accuracy.
M. Becker, H. Mewes, A. Hotho, D. Dimitrov, F. Lemmerich, and M. Strohmaier. International Conference Companion on World Wide Web, page 17--18. Republic and Canton of Geneva, Switzerland, International World Wide Web Conferences Steering Committee, (2016)
S. Iacob, K. Nieuwenhuis, N. Wijngaards, G. Pavlin, and B. Veelen. Intelligent Distributed Computing III, Proceedings of the 3rd International Symposium on Intelligent Distributed Computing - IDC 2009, volume 237 of Studies in Computational Intelligence, page 237--242. Springer, (October 2009)
A. Cheik Ahamed and F. Magoulès. High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014 IEEE Intl Conf on, page 121--128. (August 2014)
L. Lai, C. Lai, A. Cheik Ahamed, and F. Magoulès. High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014 IEEE Intl Conf on, page 137--144. (August 2014)
S. Urbanek. Workshop publication, 3rd International Workshop on Distributed Statistical Computing (DSC 2003), Vienna, Austria, ISSN 1609-395X, (March 2003)
T. Beran and T. Macek. Machine Learning and Data Mining in Pattern Recognition. First International Workshop, MLDM'99. Proceedings. (Lecture Notes in Artificial Intelligence Vol. 1715), (1999)