Understanding the difficulty of training deep feedforward neural networks

X. Glorot and Y. Bengio. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9 of Proceedings of Machine Learning Research, pages 249--256. Chia Laguna Resort, Sardinia, Italy, PMLR, (13--15 May 2010)

Abstract

Whereas before 2006 it appears that deep multi-layer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs. less deep architectures. All these experimental results were obtained with new initialization or training mechanisms. Our objective here is to understand better why standard gradient descent from random initialization does so poorly with deep neural networks, in order to better understand these recent relative successes and help design better algorithms in the future. We first observe the influence of the non-linear activation functions. We find that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive the top hidden layer in particular into saturation. Surprisingly, we find that saturated units can move out of saturation by themselves, albeit slowly, explaining the plateaus sometimes seen when training neural networks. We find that a new non-linearity that saturates less can often be beneficial. Finally, we study how activations and gradients vary across layers and during training, with the idea that training may be more difficult when the singular values of the Jacobian associated with each layer are far from 1. Based on these considerations, we propose a new initialization scheme that brings substantially faster convergence.
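In the paper, the proposed scheme is the "normalized initialization" W ~ U[-sqrt(6)/sqrt(n_j + n_{j+1}), sqrt(6)/sqrt(n_j + n_{j+1})], which gives Var(W) = 2/(n_j + n_{j+1}) as a compromise between keeping forward activation variance (n_j Var(W) = 1) and back-propagated gradient variance (n_{j+1} Var(W) = 1) constant across layers. The paper contrasts it with the common heuristic U[-1/sqrt(n), 1/sqrt(n)], where n is the size of the previous layer. The sketch below is a minimal pure-Python illustration, not the authors' code: the function names, layer width (50), depth (5), sample count, Gaussian inputs, zero biases and seed are all illustrative assumptions, and only the forward pass through a tanh network is measured.

```python
import math
import random
import statistics


def glorot_bound(fan_in, fan_out):
    # Normalized initialization: W ~ U[-b, b] with b = sqrt(6) / sqrt(fan_in + fan_out),
    # so Var(W) = b**2 / 3 = 2 / (fan_in + fan_out).
    return math.sqrt(6.0) / math.sqrt(fan_in + fan_out)


def standard_bound(fan_in, fan_out):
    # Common heuristic W ~ U[-1/sqrt(fan_in), 1/sqrt(fan_in)], which gives
    # fan_in * Var(W) = 1/3; fan_out is accepted only to share the signature.
    return 1.0 / math.sqrt(fan_in)


def init_layer(fan_in, fan_out, bound_fn, rng):
    # Hypothetical helper: row j holds the fan_in incoming weights of output unit j.
    b = bound_fn(fan_in, fan_out)
    return [[rng.uniform(-b, b) for _ in range(fan_in)] for _ in range(fan_out)]


def tanh_layer(weights, x):
    # Biases are assumed to be initialized to zero, so they are omitted.
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def activation_stds(bound_fn, width=50, depth=5, n_samples=50, seed=0):
    """Return the standard deviation of the tanh activations at each layer."""
    rng = random.Random(seed)
    layers = [init_layer(width, width, bound_fn, rng) for _ in range(depth)]
    batch = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(n_samples)]
    stds = []
    for weights in layers:
        batch = [tanh_layer(weights, x) for x in batch]
        stds.append(statistics.pstdev([v for x in batch for v in x]))
    return stds


if __name__ == "__main__":
    for name, fn in [("standard", standard_bound), ("normalized", glorot_bound)]:
        print(name, [round(s, 3) for s in activation_stds(fn)])
```

Under these assumptions, the activation spread under the standard heuristic should shrink by roughly a factor of sqrt(3) per layer while the units stay in tanh's near-linear regime, whereas the normalized scheme keeps it much closer to constant. The exact values depend on the seed and the chosen sizes.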

Links and resources

BibTeX key: pmlr-v9-glorot10a
entry type: inproceedings
address: Chia Laguna Resort, Sardinia, Italy
booktitle: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
year: 2010
month: 13--15 May
pages: 249--256
publisher: PMLR
series: Proceedings of Machine Learning Research
volume: 9
pdf: http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
Document: http://proceedings.mlr.press/v9/glorot10a.html

Cite this publication

@inproceedings{pmlr-v9-glorot10a,
  abstract = {Whereas before 2006 it appears that deep multi-layer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs less deep architectures. All these experimental results were obtained with new initialization or training mechanisms. Our objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future. We first observe the influence of the non-linear activations functions. We find that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive especially the top hidden layer into saturation. Surprisingly, we find that saturated units can move out of saturation by themselves, albeit slowly, and explaining the plateaus sometimes seen when training neural networks. We find that a new non-linearity that saturates less can often be beneficial. Finally, we study how activations and gradients vary across layers and during training, with the idea that training may be more difficult when the singular values of the Jacobian associated with each layer are far from 1. Based on these considerations, we propose a new initialization scheme that brings substantially faster convergence.},
  added-at = {2017-12-09T13:35:34.000+0100},
  address = {Chia Laguna Resort, Sardinia, Italy},
  author = {Glorot, Xavier and Bengio, Yoshua},
  biburl = {https://www.bibsonomy.org/bibtex/2c0a4691d82570be264c5f392d8b71496/spdrnl},
  booktitle = {Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics},
  description = {Understanding the difficulty of training deep feedforward neural networks},
  editor = {Teh, Yee Whye and Titterington, Mike},
  interhash = {4f45a520bb65b6045bd237963ffee0ed},
  intrahash = {c0a4691d82570be264c5f392d8b71496},
  keywords = {deep tech},
  month = {13--15 May},
  pages = {249--256},
  pdf = {http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf},
  publisher = {PMLR},
  series = {Proceedings of Machine Learning Research},
  timestamp = {2017-12-09T13:35:34.000+0100},
  title = {Understanding the difficulty of training deep feedforward neural networks},
  url = {http://proceedings.mlr.press/v9/glorot10a.html},
  volume = 9,
  year = 2010
}

BibSonomy