A. Achille, and S. Soatto. (2017)cite arxiv:1706.01350Comment: Deep learning, neural network, representation, flat minima, information bottleneck, overfitting, generalization, sufficiency, minimality, sensitivity, information complexity, stochastic gradient descent, regularization, total correlation, PAC-Bayes.
C. Wei, J. Lee, Q. Liu, and T. Ma. (2018)cite arxiv:1810.05369Comment: version 2: title changed from originally Ön the Margin Theory of Feedforward Neural Networks". Substantial changes from old version of paper, including a new lower bound on NTK sample complexity version 3: reorganized NTK lower bound proof.