Abstract
The noise in stochastic gradient descent (SGD) provides a crucial implicit
regularization effect for training overparameterized models. Prior theoretical
work largely focuses on spherical Gaussian noise, whereas empirical studies
demonstrate the phenomenon that parameter-dependent noise -- induced by
mini-batches or label perturbation -- is far more effective than Gaussian
noise. This paper theoretically characterizes this phenomenon on a
quadratically-parameterized model introduced by Vaskevicius et al. and
Woodworth et al. We show that in an over-parameterized setting, SGD with label
noise recovers the sparse ground-truth with an arbitrary initialization,
whereas SGD with Gaussian noise or gradient descent overfits to dense solutions
with large norms. Our analysis reveals that parameter-dependent noise
introduces a bias towards local minima with smaller noise variance, whereas
spherical Gaussian noise does not. Code for our project is publicly available.
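
To make the setting concrete, below is a minimal sketch (not the authors' code) of the kind of experiment the abstract describes: a quadratically-parameterized linear model theta = u * u trained by SGD where fresh noise is added to each label at every step. All dimensions, step sizes, and noise levels here are illustrative assumptions.

```python
import numpy as np

# Sketch of label-noise SGD on a quadratically-parameterized sparse regression
# problem (assumed setup, illustrative hyperparameters).
rng = np.random.default_rng(0)

n, d, k = 50, 200, 3                     # samples, dimension, sparsity (over-parameterized: d > n)
theta_star = np.zeros(d)
theta_star[rng.choice(d, k, replace=False)] = 1.0
X = rng.standard_normal((n, d))
y = X @ theta_star                       # clean labels; noise is injected during training

u = np.full(d, 0.5)                      # arbitrary (non-vanishing) initialization
lr, sigma, steps = 1e-3, 0.5, 200_000
for t in range(steps):
    i = rng.integers(n)
    noisy_label = y[i] + sigma * rng.standard_normal()  # label-noise perturbation, resampled each step
    residual = X[i] @ (u * u) - noisy_label
    grad_u = 2 * residual * X[i] * u                    # chain rule through theta = u * u
    u -= lr * grad_u

theta_hat = u * u
print("recovery error:", np.linalg.norm(theta_hat - theta_star))
```

Replacing the label perturbation with spherical Gaussian noise added directly to the gradient of u (or dropping the noise entirely) gives the contrasting baselines the abstract compares against.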