Abstract
The local geometry of high-dimensional neural network loss landscapes can
both challenge our cherished theoretical intuitions and dramatically
impact the practical success of neural network training. Indeed, recent works
have observed four striking local properties of neural loss landscapes on
classification tasks: (1) the landscape exhibits exactly $C$ directions of high
positive curvature, where $C$ is the number of classes; (2) gradient directions
are largely confined to this extremely low-dimensional subspace of positive
Hessian curvature, leaving the vast majority of directions in weight space
unexplored; (3) gradient descent transiently explores intermediate regions of
higher positive curvature before eventually finding flatter minima; (4)
training can be successful even when confined to low-dimensional random
affine hyperplanes, as long as these hyperplanes intersect a Goldilocks zone of
higher-than-average curvature. We develop a simple theoretical model of
gradients and Hessians, justified by numerical experiments on architectures and
datasets used in practice, that simultaneously accounts for all four of
these surprising and seemingly unrelated properties. Our unified model provides
conceptual insights into the emergence of these properties and makes
connections with diverse topics in neural networks, random matrix theory, and
spin glasses, including the neural tangent kernel, BBP phase transitions, and
Derrida's random energy model.