A. Achille and S. Soatto. (2017). arXiv:1706.01350. Comment: Deep learning, neural network, representation, flat minima, information bottleneck, overfitting, generalization, sufficiency, minimality, sensitivity, information complexity, stochastic gradient descent, regularization, total correlation, PAC-Bayes.
M. Aldridge, O. Johnson, and J. Scarlett. (2019). arXiv:1902.06002. Comment: Survey paper, 140 pages, 19 figures. To be published in Foundations and Trends in Communications and Information Theory.