D. Soudry, E. Hoffer, M. Nacson, S. Gunasekar, and N. Srebro. (2017). arXiv:1710.10345. Comment: Journal version (previous version appeared as a conference paper at ICLR). Main improvements: we proved the measure-zero case for the main theorem (with implications for the rates) and the multi-class case. Neither was covered in the previous version.
R. Kidambi, P. Netrapalli, P. Jain, and S. Kakade. (2018). arXiv:1803.05591. Comment: 28 pages, 10 figures. Appears as an oral presentation at the International Conference on Learning Representations (ICLR), 2018. Code implementing the ASGD method can be found at https://github.com/rahulkidambi/AccSGD.