Y. Lin, S. Han, H. Mao, Y. Wang, and W. Dally. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training. (2017). arXiv:1712.01887. Note: the authors find that 99.9% of the gradient exchange in distributed SGD is redundant and reduce the communication bandwidth by two orders of magnitude without losing accuracy.
J. Behrmann, W. Grathwohl, R. Chen, D. Duvenaud, and J. Jacobsen. Invertible Residual Networks. Proceedings of the 36th International Conference on Machine Learning, Volume 97 of Proceedings of Machine Learning Research, pages 573--582. Long Beach, California, USA, PMLR, (09--15 Jun 2019).
Q. Wang, L. Huang, Z. Jiang, K. Knight, H. Ji, M. Bansal, and Y. Luan. PaperRobot: Incremental Draft Generation of Scientific Ideas. (2019). arXiv:1905.07870. Note: 12 pages; accepted by ACL 2019. Code and resources available at https://github.com/EagleW/PaperRobot.