Y. Lin, S. Han, H. Mao, Y. Wang, and W. Dally. (2017). arXiv:1712.01887. Comment: We find that 99.9% of the gradient exchange in distributed SGD is redundant; we reduce the communication bandwidth by two orders of magnitude without losing accuracy.
Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang, and G. Cottrell. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence Organization (August 2017).