Abstract
Whereas it is believed that techniques such as Adam, batch normalization and,
more recently, SELU nonlinearities "solve" the exploding gradient problem, we
show that this is not the case in general: in a range of popular MLP
architectures, exploding gradients exist, and they limit the depth to which
networks can be effectively trained, both in theory and in practice. We explain
why exploding gradients occur and highlight the *collapsing domain problem*,
which can arise in architectures that avoid exploding gradients.
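To make the first claim concrete, here is a minimal sketch (not taken from the paper; the width, depth, and the slightly inflated He initialization are assumptions chosen so the effect is visible) that tracks gradient norms across depth in a plain ReLU MLP:

```python
import torch

torch.manual_seed(0)
width, depth = 256, 50

# Plain MLP blocks; He init scaled up by 10% so each layer's Jacobian
# inflates gradient norms slightly, which compounds with depth.
blocks = []
for _ in range(depth):
    linear = torch.nn.Linear(width, width, bias=False)
    torch.nn.init.normal_(linear.weight, std=1.1 * (2.0 / width) ** 0.5)
    blocks.append(torch.nn.Sequential(linear, torch.nn.ReLU()))

x = torch.randn(32, width)
hiddens, h = [], x
for block in blocks:
    h = block(h)
    h.retain_grad()              # keep .grad on intermediate activations
    hiddens.append(h)

h.sum().backward()

# Gradient norms grow roughly geometrically toward the early layers.
for i in (0, 9, 24, 49):
    print(f"layer {i:2d}: grad norm {hiddens[i].grad.norm().item():.3e}")
```

Running this shows gradient norms growing roughly geometrically toward the early layers; that geometric growth is the exploding gradient pattern the abstract refers to.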
ResNets exhibit significantly smaller gradients and can thus circumvent the
exploding gradient problem, enabling the effective training of much deeper
networks. We show that this is a direct consequence of the Pythagorean
equation. By noticing that *any neural network is a residual network*, we
devise the *residual trick*, which reveals that introducing skip connections
simplifies the network mathematically, and that this simplicity may be the
major cause of their success.
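To illustrate the residual trick and the Pythagorean point, here is a hedged sketch (my construction, not the paper's analysis; the width, the block, and the use of the Frobenius norm as a proxy for gradient scale are assumptions): any layer f can be rewritten exactly as f(x) = x + r(x) with r(x) = f(x) - x, and for a block with an explicit skip connection the squared Jacobian norm adds to that of the identity rather than multiplying it.

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)
n = 64
g = torch.nn.Sequential(torch.nn.Linear(n, n), torch.nn.Tanh())

# The residual trick: ANY layer f is a residual block with r(x) = f(x) - x,
# so formally every network is already a residual network.
f = g                                      # treat g as an ordinary layer
x = torch.randn(n)
r = lambda v: f(v) - v
assert torch.allclose(f(x), x + r(x))

# For an explicit skip connection the block Jacobian is I + J_g. When J_g
# is random (near-zero trace), squared Frobenius norms add rather than
# multiply: ||I + J_g||_F^2 ~= ||I||_F^2 + ||J_g||_F^2 = n + ||J_g||_F^2.
J_g = jacobian(g, x)
J_block = torch.eye(n) + J_g
print(J_block.pow(2).sum().item())         # left-hand side
print(n + J_g.pow(2).sum().item())         # right-hand side, nearly equal
```

Under this view, adding skip connections keeps each block close to the identity, so per-block Jacobian norms combine additively (the Pythagorean behavior) instead of compounding multiplicatively.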
Description
The exploding gradient problem demystified - definition, prevalence,
impact, origin, tradeoffs, and solutions