Poly-time universality and limitations of deep learning
E. Abbe and C. Sandon (2020). arXiv:2001.02992. Comment: arXiv admin note: substantial text overlap with arXiv:1812.06369.
Abstract
The goal of this paper is to characterize function distributions that deep
learning can or cannot learn in poly-time. A universality result is proved for
SGD-based deep learning and a non-universality result is proved for GD-based
deep learning; this also gives a separation between SGD-based deep learning and
statistical query algorithms:
(1) Deep learning with SGD is efficiently universal. Any function
distribution that can be learned from samples in poly-time can also be learned
by a poly-size neural net trained with SGD on a poly-time initialization with
poly-steps, poly-rate and possibly poly-noise.
Therefore deep learning provides a universal learning paradigm: it was known
that the approximation and estimation errors could be controlled with poly-size
neural nets using ERM (which is NP-hard); this new result shows that the
optimization error can also be controlled with SGD in poly-time. The picture
changes for GD with large enough batches:
(2) Result (1) does not hold for GD: Neural nets of poly-size trained
with GD (full gradients or large enough batches) on any initialization with
poly-steps, poly-range and at least poly-noise cannot learn any function
distribution that has super-polynomially small cross-predictability, where the
cross-predictability gives a measure of "average" function correlation --
relations to and distinctions from the statistical dimension are discussed. In
particular, GD with these constraints can efficiently learn monomials of degree
$k$ if and only if $k$ is constant.
Thus (1) and (2) point to an interesting contrast: SGD is universal even with
some poly-noise, while full GD and SQ algorithms are not (e.g., they fail on parities).
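To make the cross-predictability in result (2) concrete, here is a minimal sketch (plain Python, not taken from the paper; the function names and Monte Carlo settings are illustrative assumptions). It takes the cross-predictability of a function distribution to be the expected squared correlation between two i.i.d. draws from it, following our reading of the definition in the companion work arXiv:1812.06369, and estimates it for uniformly random degree-k monomials (parities on k of n bits) under uniform inputs in {-1,+1}^n. Since distinct parities are uncorrelated, the quantity reduces to the collision probability 1/C(n,k), which is inverse-polynomial exactly when k is constant; parities drawn uniformly over all 2^n subsets give 2^{-n}, the canonical hard case mentioned at the end of the abstract.

import itertools
import math
import random

def parity(subset, x):
    # Monomial / parity chi_S(x) = prod_{i in S} x_i, for x in {-1,+1}^n.
    p = 1
    for i in subset:
        p *= x[i]
    return p

def cross_pred_monomials(n, k, n_pairs=3000, n_inputs=200, seed=0):
    # Monte Carlo estimate of Pred = E_{S,T}[ (E_X[chi_S(X) * chi_T(X)])^2 ],
    # with S, T independent and uniform over the C(n, k) degree-k monomials
    # and X uniform over {-1,+1}^n.
    rng = random.Random(seed)
    subsets = list(itertools.combinations(range(n), k))
    total = 0.0
    for _ in range(n_pairs):
        s, t = rng.choice(subsets), rng.choice(subsets)
        corr = 0.0
        for _ in range(n_inputs):
            x = [rng.choice((-1, 1)) for _ in range(n)]
            corr += parity(s, x) * parity(t, x)
        corr /= n_inputs
        # Debias the square: E[corr^2] = true_corr^2 + (1 - true_corr^2)/n_inputs,
        # so (n_inputs * corr^2 - 1) / (n_inputs - 1) is an unbiased estimate.
        total += (n_inputs * corr ** 2 - 1.0) / (n_inputs - 1.0)
    return total / n_pairs

if __name__ == "__main__":
    n, k = 10, 2
    est = cross_pred_monomials(n, k)
    exact = 1.0 / math.comb(n, k)  # collision probability of two random k-subsets
    print(f"Monte Carlo Pred ~ {est:.4f}; exact 1/C({n},{k}) = {exact:.4f}")

On this reading, result (2) says that once this quantity decays faster than any inverse polynomial, poly-size nets trained with noisy full-batch GD cannot learn the distribution, whereas constant-degree monomials keep it at 1/C(n,k) and remain learnable.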
Description
[2001.02992] Poly-time universality and limitations of deep learning
@article{abbe2020polytime,
abstract = {The goal of this paper is to characterize function distributions that deep
learning can or cannot learn in poly-time. A universality result is proved for
SGD-based deep learning and a non-universality result is proved for GD-based
deep learning; this also gives a separation between SGD-based deep learning and
statistical query algorithms:
(1) {\it Deep learning with SGD is efficiently universal.} Any function
distribution that can be learned from samples in poly-time can also be learned
by a poly-size neural net trained with SGD on a poly-time initialization with
poly-steps, poly-rate and possibly poly-noise.
Therefore deep learning provides a universal learning paradigm: it was known
that the approximation and estimation errors could be controlled with poly-size
neural nets, using ERM that is NP-hard; this new result shows that the
optimization error can also be controlled with SGD in poly-time. The picture
changes for GD with large enough batches:
(2) {\it Result (1) does not hold for GD:} Neural nets of poly-size trained
with GD (full gradients or large enough batches) on any initialization with
poly-steps, poly-range and at least poly-noise cannot learn any function
distribution that has super-polynomial {\it cross-predictability,} where the
cross-predictability gives a measure of ``average'' function correlation --
relations and distinctions to the statistical dimension are discussed. In
particular, GD with these constraints can learn efficiently monomials of degree
$k$ if and only if $k$ is constant.
Thus (1) and (2) point to an interesting contrast: SGD is universal even with
some poly-noise while full GD or SQ algorithms are not (e.g., parities).},
added-at = {2020-01-13T14:11:50.000+0100},
author = {Abbe, Emmanuel and Sandon, Colin},
biburl = {https://www.bibsonomy.org/bibtex/2f201989295242944d130a210140e58b2/kirk86},
description = {[2001.02992] Poly-time universality and limitations of deep learning},
interhash = {fc3400216b82ba9aede663083b374af6},
intrahash = {f201989295242944d130a210140e58b2},
keywords = {approximate generalization optimization},
note = {arXiv:2001.02992. Comment: arXiv admin note: substantial text overlap with arXiv:1812.06369},
timestamp = {2020-01-13T14:11:50.000+0100},
title = {Poly-time universality and limitations of deep learning},
url = {http://arxiv.org/abs/2001.02992},
year = 2020
}