Article

Reconciling modern machine learning and the bias-variance trade-off

M. Belkin, D. Hsu, S. Ma, and S. Mandal.
(2018). arXiv:1812.11118.

Abstract

The question of generalization in machine learning---how algorithms are able to learn predictors from a training sample to make accurate predictions out-of-sample---is revisited in light of the recent breakthroughs in modern machine learning technology. The classical approach to understanding generalization is based on bias-variance trade-offs, where model complexity is carefully calibrated so that the fit on the training sample reflects performance out-of-sample. However, it is now common practice to fit highly complex models like deep neural networks to data with (nearly) zero training error, and yet these interpolating predictors are observed to have good out-of-sample accuracy even for noisy data. How can the classical understanding of generalization be reconciled with these observations from modern machine learning practice? In this paper, we bridge the two regimes by exhibiting a new "double descent" risk curve that extends the traditional U-shaped bias-variance curve beyond the point of interpolation. Specifically, the curve shows that as soon as the model complexity is high enough to achieve interpolation on the training sample---a point that we call the "interpolation threshold"---the risk of suitably chosen interpolating predictors from these models can, in fact, be decreasing as the model complexity increases, often below the risk achieved using non-interpolating models. The double descent risk curve is demonstrated for a broad range of models, including neural networks and random forests, and a mechanism for producing this behavior is posited.

BibTeX key: belkin2018reconciling
entry type: article
year: 2018
url: http://arxiv.org/abs/1812.11118
note: cite arxiv:1812.11118

Cite this publication

@article{belkin2018reconciling,
  abstract = {The question of generalization in machine learning---how algorithms are able to learn predictors from a training sample to make accurate predictions out-of-sample---is revisited in light of the recent breakthroughs in modern machine learning technology. The classical approach to understanding generalization is based on bias-variance trade-offs, where model complexity is carefully calibrated so that the fit on the training sample reflects performance out-of-sample. However, it is now common practice to fit highly complex models like deep neural networks to data with (nearly) zero training error, and yet these interpolating predictors are observed to have good out-of-sample accuracy even for noisy data. How can the classical understanding of generalization be reconciled with these observations from modern machine learning practice? In this paper, we bridge the two regimes by exhibiting a new "double descent" risk curve that extends the traditional U-shaped bias-variance curve beyond the point of interpolation. Specifically, the curve shows that as soon as the model complexity is high enough to achieve interpolation on the training sample---a point that we call the "interpolation threshold"---the risk of suitably chosen interpolating predictors from these models can, in fact, be decreasing as the model complexity increases, often below the risk achieved using non-interpolating models. The double descent risk curve is demonstrated for a broad range of models, including neural networks and random forests, and a mechanism for producing this behavior is posited.},
  added-at = {2019-08-24T12:43:42.000+0200},
  author = {Belkin, Mikhail and Hsu, Daniel and Ma, Siyuan and Mandal, Soumik},
  biburl = {https://www.bibsonomy.org/bibtex/2bdcfeac0c7cc40387abd19bf07e3de39/kirk86},
  description = {[1812.11118] Reconciling modern machine learning and the bias-variance trade-off},
  interhash = {3b1f4c031bb88eff73ea2ede971e766a},
  intrahash = {bdcfeac0c7cc40387abd19bf07e3de39},
  keywords = {deep-learning machine-learning theory understan regularisation},
  note = {cite arxiv:1812.11118},
  timestamp = {2019-09-26T16:00:39.000+0200},
  title = {Reconciling modern machine learning and the bias-variance trade-off},
  url = {http://arxiv.org/abs/1812.11118},
  year = 2018
}
