
Rethinking Bias-Variance Trade-off for Generalization of Neural Networks

Z. Yang, Y. Yu, C. You, J. Steinhardt, and Y. Ma. Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 10767--10777. PMLR, (13--18 Jul 2020)

Abstract

The classical bias-variance trade-off predicts that bias decreases and variance increases with model complexity, leading to a U-shaped risk curve. Recent work calls this into question for neural networks and other over-parameterized models, for which it is often observed that larger models generalize better. We provide a simple explanation of this by measuring the bias and variance of neural networks: while the bias is monotonically decreasing as in the classical theory, the variance is unimodal or bell-shaped: it increases then decreases with the width of the network. We vary the network architecture, loss function, and choice of dataset and confirm that variance unimodality occurs robustly for all models we considered. The risk curve is the sum of the bias and variance curves and displays different qualitative shapes depending on the relative scale of bias and variance, with the double descent in the recent literature as a special case. We corroborate these empirical results with a theoretical analysis of two-layer linear networks with random first layer. Finally, evaluation on out-of-distribution data shows that most of the drop in accuracy comes from increased bias while variance increases by a relatively small amount. Moreover, we find that deeper models decrease bias and increase variance for both in-distribution and out-of-distribution data.
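
The abstract's statement that the risk curve is the sum of the bias and variance curves can be illustrated with the textbook squared-error decomposition. The following is a minimal sketch, not the authors' estimator: it assumes noiseless targets, squared loss, and a set of models trained on independent training sets, and the function name `bias_variance` and its arguments are hypothetical.

```python
def bias_variance(preds, targets):
    """Estimate squared bias, variance, and risk under squared loss.

    preds[m][i] is the prediction of model m (each assumed to be trained on
    an independent training set) at test point i; targets[i] is the
    noiseless label. These names are illustrative assumptions.
    """
    n_models = len(preds)
    n_points = len(targets)

    # Average prediction over models at each test point.
    mean_pred = [sum(p[i] for p in preds) / n_models for i in range(n_points)]

    # Squared bias: gap between the average prediction and the target.
    bias_sq = sum((mean_pred[i] - targets[i]) ** 2
                  for i in range(n_points)) / n_points

    # Variance: spread of individual models around the average prediction.
    variance = sum((p[i] - mean_pred[i]) ** 2
                   for p in preds for i in range(n_points)) / (n_models * n_points)

    # Risk: mean squared error over all models and test points.
    risk = sum((p[i] - targets[i]) ** 2
               for p in preds for i in range(n_points)) / (n_models * n_points)

    return bias_sq, variance, risk


# Two toy "models" evaluated on two test points.
bias_sq, variance, risk = bias_variance([[1.0, 2.0], [3.0, 2.0]], [2.0, 1.0])
print(bias_sq, variance, risk)
```

With these definitions the risk equals the squared bias plus the variance, so plotting the three quantities against network width would show how the shape of the risk curve depends on the relative scale of the two terms.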

Links and resources

BibTeX key: 2020-yang
entry type: inproceedings
booktitle: Proceedings of the 37th International Conference on Machine Learning
year: 2020
month: 13--18 Jul
pages: 10767--10777
publisher: PMLR
series: Proceedings of Machine Learning Research
volume: 119
pdf: http://proceedings.mlr.press/v119/yang20j/yang20j.pdf
Document: http://proceedings.mlr.press/v119/yang20j.html


Cite this publication

@inproceedings{2020-yang,
  abstract = {The classical bias-variance trade-off predicts that bias decreases and variance increases with model complexity, leading to a U-shaped risk curve. Recent work calls this into question for neural networks and other over-parameterized models, for which it is often observed that larger models generalize better. We provide a simple explanation of this by measuring the bias and variance of neural networks: while the bias is \emph{monotonically decreasing} as in the classical theory, the variance is \emph{unimodal} or bell-shaped: it increases then decreases with the width of the network. We vary the network architecture, loss function, and choice of dataset and confirm that variance unimodality occurs robustly for all models we considered. The risk curve is the sum of the bias and variance curves and displays different qualitative shapes depending on the relative scale of bias and variance, with the double descent in the recent literature as a special case. We corroborate these empirical results with a theoretical analysis of two-layer linear networks with random first layer. Finally, evaluation on out-of-distribution data shows that most of the drop in accuracy comes from increased bias while variance increases by a relatively small amount. Moreover, we find that deeper models decrease bias and increase variance for both in-distribution and out-of-distribution data.},
  added-at = {2021-07-09T12:36:11.000+0200},
  author = {Yang, Zitong and Yu, Yaodong and You, Chong and Steinhardt, Jacob and Ma, Yi},
  biburl = {https://www.bibsonomy.org/bibtex/283d5a3f2553c6c7603419cd3cab996a9/pkoch},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  editor = {Daumé III, Hal and Singh, Aarti},
  interhash = {84280a2e4c558612d0bbd98146d75651},
  intrahash = {83d5a3f2553c6c7603419cd3cab996a9},
  keywords = {bias generalizability overparametrization parameters trade-off variance},
  month = {13--18 Jul},
  pages = {10767--10777},
  pdf = {http://proceedings.mlr.press/v119/yang20j/yang20j.pdf},
  publisher = {PMLR},
  series = {Proceedings of Machine Learning Research},
  timestamp = {2021-07-09T12:37:03.000+0200},
  title = {Rethinking Bias-Variance Trade-off for Generalization of Neural Networks},
  url = {http://proceedings.mlr.press/v119/yang20j.html},
  volume = 119,
  year = 2020
}

BibSonomy
