Dropout is a special case of the stochastic delta rule: faster and more
accurate deep learning
N. Frazier-Logue and S. Hanson (2018). arXiv:1808.03578. Comment: 6 pages, 7 figures; submitted to ICML.
Abstract
Multi-layer neural networks have led to remarkable performance on many kinds
of benchmark tasks in text, speech and image processing. Nonlinear parameter
estimation in hierarchical models is known to be subject to overfitting and
misspecification. One approach to these estimation and related problems (local
minima, collinearity, feature discovery, etc.) is called Dropout (Hinton et al.,
2012; Baldi et al., 2016). The Dropout algorithm removes hidden units according
to a Bernoulli random variable with probability $p$ prior to each update,
creating random "shocks" to the network that are averaged over updates. In this
paper we show that Dropout is a special case of a more general model
published originally in 1990 called the Stochastic Delta Rule, or SDR (Hanson,
1990). SDR redefines each weight in the network as a random variable with mean
$\mu_{w_{ij}}$ and standard deviation $\sigma_{w_{ij}}$. Each weight random
variable is sampled on each forward activation, consequently creating an
exponential number of potential networks with shared weights. Both parameters
are updated according to prediction error, resulting in weight noise
injections that reflect a local history of prediction error and local model
averaging. SDR therefore implements a more sensitive, local, gradient-dependent
simulated annealing per weight, converging in the limit to a Bayes-optimal
network. Tests on standard benchmarks (CIFAR) using a modified version of
DenseNet show that SDR outperforms standard Dropout in test error by approximately
$17\%$ with DenseNet-BC 250 on CIFAR-100 and by approximately $12$-$14\%$ in smaller
networks. We also show that SDR reaches the accuracy that Dropout attains
in 100 epochs in as few as 35 epochs.
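The abstract describes the SDR mechanics only in outline: each weight is a Gaussian random variable, one realization is drawn per forward pass, and both the mean and the standard deviation are adapted from the prediction error. The sketch below is a minimal, hypothetical illustration of that scheme for a single linear layer in NumPy; the specific update rules (a plain gradient step on the mean, standard-deviation growth with the gradient magnitude followed by multiplicative decay) and the hyperparameter names alpha, beta, zeta are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

# Minimal sketch of the Stochastic Delta Rule (SDR) for one linear layer.
# Assumptions (illustrative, not taken verbatim from the paper): the mean
# follows the usual gradient step, the standard deviation grows with the
# gradient magnitude and decays by a factor zeta < 1, and alpha, beta, zeta
# are made-up hyperparameter names.

rng = np.random.default_rng(0)

# Toy regression data: y = X @ w_true + noise
X = rng.normal(size=(256, 8))
w_true = rng.normal(size=(8, 1))
y = X @ w_true + 0.1 * rng.normal(size=(256, 1))

mu = np.zeros((8, 1))            # per-weight mean  mu_{w_ij}
sigma = 0.05 * np.ones((8, 1))   # per-weight std   sigma_{w_ij}
alpha, beta, zeta = 0.01, 0.02, 0.99

for epoch in range(200):
    # Sample one realization of every weight on the forward pass,
    # implicitly selecting one member of an exponential family of
    # weight-shared networks.
    w = mu + sigma * rng.normal(size=mu.shape)
    err = X @ w - y                      # prediction error
    grad = X.T @ err / len(X)            # dE/dw for squared loss

    mu -= alpha * grad                              # mean follows the gradient
    sigma = zeta * (sigma + beta * np.abs(grad))    # noise tracks local error, then anneals

# At prediction time the mean weights are used.
print("test-time weights:\n", mu)
```

In this toy form, standard Dropout corresponds, roughly, to replacing the adaptive per-weight Gaussian sampling with a fixed Bernoulli mask on the hidden units, which is the sense in which the paper treats it as a special case of SDR.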
Description
Dropout is a special case of the stochastic delta rule: faster and more accurate deep learning
%0 Generic
%1 frazierlogue2018dropout
%A Frazier-Logue, Noah
%A Hanson, Stephen José
%D 2018
%K dl dropout
%T Dropout is a special case of the stochastic delta rule: faster and more
accurate deep learning
%U http://arxiv.org/abs/1808.03578
%X Multi-layer neural networks have led to remarkable performance on many kinds
of benchmark tasks in text, speech and image processing. Nonlinear parameter
estimation in hierarchical models is known to be subject to overfitting and
misspecification. One approach to these estimation and related problems (local
minima, collinearity, feature discovery, etc.) is called Dropout (Hinton et al.,
2012; Baldi et al., 2016). The Dropout algorithm removes hidden units according
to a Bernoulli random variable with probability $p$ prior to each update,
creating random "shocks" to the network that are averaged over updates. In this
paper we show that Dropout is a special case of a more general model
published originally in 1990 called the Stochastic Delta Rule, or SDR (Hanson,
1990). SDR redefines each weight in the network as a random variable with mean
$\mu_{w_{ij}}$ and standard deviation $\sigma_{w_{ij}}$. Each weight random
variable is sampled on each forward activation, consequently creating an
exponential number of potential networks with shared weights. Both parameters
are updated according to prediction error, resulting in weight noise
injections that reflect a local history of prediction error and local model
averaging. SDR therefore implements a more sensitive, local, gradient-dependent
simulated annealing per weight, converging in the limit to a Bayes-optimal
network. Tests on standard benchmarks (CIFAR) using a modified version of
DenseNet show that SDR outperforms standard Dropout in test error by approximately
$17\%$ with DenseNet-BC 250 on CIFAR-100 and by approximately $12$-$14\%$ in smaller
networks. We also show that SDR reaches the accuracy that Dropout attains
in 100 epochs in as few as 35 epochs.
@misc{frazierlogue2018dropout,
abstract = {Multi-layer neural networks have led to remarkable performance on many kinds
of benchmark tasks in text, speech and image processing. Nonlinear parameter
estimation in hierarchical models is known to be subject to overfitting and
misspecification. One approach to these estimation and related problems (local
minima, collinearity, feature discovery, etc.) is called Dropout (Hinton et al.,
2012; Baldi et al., 2016). The Dropout algorithm removes hidden units according
to a Bernoulli random variable with probability $p$ prior to each update,
creating random "shocks" to the network that are averaged over updates. In this
paper we show that Dropout is a special case of a more general model
published originally in 1990 called the Stochastic Delta Rule, or SDR (Hanson,
1990). SDR redefines each weight in the network as a random variable with mean
$\mu_{w_{ij}}$ and standard deviation $\sigma_{w_{ij}}$. Each weight random
variable is sampled on each forward activation, consequently creating an
exponential number of potential networks with shared weights. Both parameters
are updated according to prediction error, resulting in weight noise
injections that reflect a local history of prediction error and local model
averaging. SDR therefore implements a more sensitive, local, gradient-dependent
simulated annealing per weight, converging in the limit to a Bayes-optimal
network. Tests on standard benchmarks (CIFAR) using a modified version of
DenseNet show that SDR outperforms standard Dropout in test error by approximately
$17\%$ with DenseNet-BC 250 on CIFAR-100 and by approximately $12$-$14\%$ in smaller
networks. We also show that SDR reaches the accuracy that Dropout attains
in 100 epochs in as few as 35 epochs.},
added-at = {2019-02-11T11:10:32.000+0100},
author = {Frazier-Logue, Noah and Hanson, Stephen José},
biburl = {https://www.bibsonomy.org/bibtex/200853901983ad41bc7e5e371e0c39644/bechr7},
description = {Dropout is a special case of the stochastic delta rule: faster and more accurate deep learning},
interhash = {74431432c91b350a930a384c2aeaab0a},
intrahash = {00853901983ad41bc7e5e371e0c39644},
keywords = {dl dropout},
note = {cite arxiv:1808.03578. Comment: 6 pages, 7 figures; submitted to ICML},
timestamp = {2019-02-11T11:10:32.000+0100},
title = {Dropout is a special case of the stochastic delta rule: faster and more
accurate deep learning},
url = {http://arxiv.org/abs/1808.03578},
year = 2018
}