Article

The general inefficiency of batch training for gradient descent learning

D. Wilson and T. Martinez.
Neural Networks, 16 (10): 1429 - 1451 (2003)
DOI: http://dx.doi.org/10.1016/S0893-6080(03)00138-2

Abstract

Gradient descent training of neural networks can be done in either a batch or an on-line manner. A widely held myth in the neural network community is that batch training is as fast as or faster than on-line training, and/or more ‘correct’, because it supposedly uses a better approximation of the true gradient for its weight updates. This paper explains why batch training is almost always slower than on-line training (often orders of magnitude slower), especially on large training sets. The main reason is the ability of on-line training to follow curves in the error surface throughout each epoch, which allows it to safely use a larger learning rate and thus converge with fewer iterations through the training data. Empirical results on a large (20,000-instance) speech recognition task and on 26 other learning tasks demonstrate that convergence can be reached significantly faster using on-line training than batch training, with no apparent difference in accuracy.
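
The difference between the two update schemes can be illustrated with a minimal sketch. It is not taken from the paper: the one-weight linear model, the noise-free toy data, the learning rates, and all function names (make_data, online_epoch, batch_epoch, epochs_to_converge) are assumptions chosen for illustration. Batch training here sums the per-instance gradients over the whole training set and applies one update per epoch, while on-line training updates the weight after every instance.

```python
# Toy comparison of on-line vs. batch gradient descent.
# Assumption: a single-weight model y = w * x fitted to noise-free data
# generated with w = 2.0. This is an illustration, not the paper's setup.

TRUE_W = 2.0


def make_data(n_repeats=25):
    """Build a small deterministic training set of (x, y) pairs."""
    xs = [0.5, 1.0, 1.5, 2.0] * n_repeats
    return [(x, TRUE_W * x) for x in xs]


def online_epoch(w, data, lr):
    """On-line training: update the weight after every instance."""
    for x, y in data:
        w -= lr * (w * x - y) * x  # gradient of 0.5 * (w*x - y)**2
    return w


def batch_epoch(w, data, lr):
    """Batch training: sum all gradients, then apply one update."""
    grad = sum((w * x - y) * x for x, y in data)
    return w - lr * grad


def epochs_to_converge(step, data, lr, tol=1e-6, max_epochs=1000):
    """Return the number of epochs needed, or None if training diverges
    or does not reach the tolerance within max_epochs."""
    w = 0.0
    for epoch in range(1, max_epochs + 1):
        w = step(w, data, lr)
        if abs(w - TRUE_W) > 1e6:  # treat a huge error as divergence
            return None
        if abs(w - TRUE_W) < tol:
            return epoch
    return None


if __name__ == "__main__":
    data = make_data()
    print("on-line, lr=0.1:  ", epochs_to_converge(online_epoch, data, 0.1))
    print("batch,   lr=0.1:  ", epochs_to_converge(batch_epoch, data, 0.1))
    print("batch,   lr=0.001:", epochs_to_converge(batch_epoch, data, 0.001))
```

On this toy problem the summed batch gradient grows with the size of the training set, so a learning rate that is stable for on-line updates makes the batch update overshoot and diverge, and the batch run needs a much smaller rate. A one-dimensional quadratic has no curves to follow, so the sketch shows only the learning-rate effect and not the paper's full argument or its empirical results.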

BibTeX key: wilson2003general
Entry type: article
Year: 2003
Journal: Neural Networks
Number: 10
Pages: 1429 - 1451
Volume: 16
issn: 0893-6080
DOI: http://dx.doi.org/10.1016/S0893-6080(03)00138-2
URL: http://www.sciencedirect.com/science/article/pii/S0893608003001382
