D. Soudry, E. Hoffer, M. Nacson, S. Gunasekar, and N. Srebro. (2017). arXiv:1710.10345. Comment: Journal version (previous version appeared as conference paper in ICLR ). Main improvements: we proved the measure-zero case for the main theorem (with implications for the rates) and the multi-class case. Neither was covered in the previous version.
G. Philipp, D. Song, and J. Carbonell. (2017). arXiv:1712.05577. Comment: An earlier version of this paper was named "Gradients explode - Deep Networks are shallow - ResNet explained" and presented at the ICLR 2018 workshop (https://openreview.net/forum?id=rJjcdFkPM).
R. Kidambi, P. Netrapalli, P. Jain, and S. Kakade. (2018). arXiv:1803.05591. Comment: 28 pages, 10 figures. Appears as an oral presentation at the International Conference on Learning Representations (ICLR), 2018. Code implementing the ASGD method can be found at https://github.com/rahulkidambi/AccSGD.