L. Dinh, R. Pascanu, S. Bengio, and Y. Bengio. (2017). arXiv:1703.04933. Comment: 8.5 pages of main content, 2.5 of bibliography and 1 page of appendix.
Y. Lin, S. Han, H. Mao, Y. Wang, and W. Dally. (2017). arXiv:1712.01887. Comment: we find 99.9% of the gradient exchange in distributed SGD is redundant; we reduce the communication bandwidth by two orders of magnitude without losing accuracy.