Topology and Geometry of Half-Rectified Network Optimization
C. Daniel Freeman and Joan Bruna (2016). arXiv:1611.01540. 22 pages (10 main + appendices), 4 figures, 1 table. Published as a conference paper at ICLR 2017.
Abstract
The loss surface of deep neural networks has recently attracted interest in
the optimization and machine learning communities as a prime example of a
high-dimensional non-convex problem. Some insights were recently gained using
spin-glass models and mean-field approximations, but at the expense of strongly
simplifying the nonlinear nature of the model.
In this work, we make no such assumptions and study conditions on the
data distribution and model architecture that prevent the existence of bad
local minima. Our theoretical work quantifies and formalizes two important
folklore facts: (i) that the landscape of deep linear networks has a
radically different topology from that of deep half-rectified ones, and (ii)
that the energy landscape in the non-linear case is fundamentally controlled by
the interplay between the smoothness of the data distribution and model
over-parametrization. Our main theoretical contribution is to prove that
half-rectified single-layer networks are asymptotically connected, and we
provide explicit bounds that reveal the aforementioned interplay.
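
The connectedness statement can be made concrete through the sublevel sets of the loss. The following formulation is our own paraphrase of the standard setup (the notation is ours and may differ from the paper's): writing $F$ for the loss over parameters $\theta \in \mathbb{R}^P$,

\[
\Omega_F(\lambda) = \{\, \theta \in \mathbb{R}^P : F(\theta) \le \lambda \,\}.
\]

If $\Omega_F(\lambda)$ is connected for every energy level $\lambda$, then any local minimum can be joined to a global minimum by a path along which the loss never rises above its starting value, which rules out strictly poor local minima.
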
The conditioning of gradient descent is the next challenge we address. We
study this question through the geometry of the level sets, and we introduce an
algorithm to efficiently estimate the regularity of such sets on large-scale
networks. Our empirical results show that these level sets remain connected
throughout the entire learning phase, suggesting near-convex behavior, but that
they become exponentially more curvy as the energy level decays, in accordance
with what is observed in practice with very-low-curvature attractors.
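
The regularity-estimation algorithm is only summarized in the abstract. As an illustration of the general idea, the following is a minimal, self-contained Python sketch of a bisection-style path search between two trained models: if the straight segment between two parameter vectors rises above the target loss level, insert a midpoint, re-optimize it, and recurse on both halves. All names, the optimizer, and the toy loss are our own illustrative stand-ins, not the authors' algorithm or code.

import numpy as np

def interp(a, b, t):
    """Linear interpolation between two parameter vectors."""
    return (1.0 - t) * a + t * b

def descend(theta, grad, lr=1e-2, steps=500):
    """Plain gradient descent; stands in for training a network."""
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta

def connect(a, b, loss, grad, level, depth=0, max_depth=8):
    """Try to build a piecewise-linear path from a to b whose loss stays
    below `level`, inserting and re-optimizing midpoints as needed.
    Returns the path vertices, or None if no such path is found."""
    ts = np.linspace(0.0, 1.0, 25)
    if max(loss(interp(a, b, t)) for t in ts) <= level:
        return [a, b]                      # straight segment already stays below level
    if depth >= max_depth:
        return None                        # level set looks disconnected, or too curvy
    mid = descend(interp(a, b, 0.5), grad)             # re-optimized midpoint "bead"
    left = connect(a, mid, loss, grad, level, depth + 1, max_depth)
    right = connect(mid, b, loss, grad, level, depth + 1, max_depth)
    return None if left is None or right is None else left[:-1] + right

# Toy demo: a 2-D non-convex loss with two symmetric minima at (-1, 0) and (+1, 0).
loss = lambda th: (th[0] ** 2 - 1.0) ** 2 + th[1] ** 2
grad = lambda th: np.array([4.0 * th[0] * (th[0] ** 2 - 1.0), 2.0 * th[1]])
a = descend(np.array([-1.2, 0.3]), grad)   # converges near (-1, 0)
b = descend(np.array([1.1, -0.2]), grad)   # converges near (+1, 0)
for level in (1.5, 0.5):
    path = connect(a, b, loss, grad, level)
    print(level, "->", "path with %d vertices" % len(path) if path else "no path found")

On this toy loss the sublevel set at level 0.5 splits into two components around the two minima, so the search correctly reports failure there, while at level 1.5 the straight segment already suffices. This mirrors the abstract's observation that level sets remain connected at higher energies and become harder to traverse as the energy decays.
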
@article{freeman2016topology,
  author   = {Freeman, C. Daniel and Bruna, Joan},
  title    = {Topology and Geometry of Half-Rectified Network Optimization},
  year     = {2016},
  url      = {http://arxiv.org/abs/1611.01540},
  keywords = {approximate generalization optimization topology},
  note     = {arXiv:1611.01540. 22 pages (10 main + appendices), 4 figures, 1 table. Published as a conference paper at ICLR 2017}
}