
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

Lénaïc Chizat and Francis Bach (2018). arXiv:1805.09545. Advances in Neural Information Processing Systems (NIPS), Dec 2018, Montréal, Canada.

Abstract

Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. This includes sparse spikes deconvolution or training a neural network with a single hidden layer. For these problems, we study a simple minimization method: the unknown measure is discretized into a mixture of particles and a continuous-time gradient descent is performed on their weights and positions. This is an idealization of the usual way to train neural networks with a large hidden layer. We show that, when initialized correctly and in the many-particle limit, this gradient flow, although non-convex, converges to global minimizers. The proof involves Wasserstein gradient flows, a by-product of optimal transport theory. Numerical experiments show that this asymptotic behavior is already at play for a reasonable number of particles, even in high dimension.
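The method described in the abstract — discretize the unknown measure into a mixture of particles and run gradient descent on their weights and positions — can be illustrated with a small sketch. The following Python/NumPy snippet is not the authors' code; it is a toy single-hidden-layer ReLU network with squared loss on synthetic data, with all sizes and hyperparameters chosen as illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's code) of the particle view:
# the network f(x) = (1/m) * sum_i w_i * relu(theta_i . x) is a mixture of
# particles (w_i, theta_i), and plain gradient descent is run on weights and
# positions, a discrete-time stand-in for the continuous-time flow studied.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (assumed): d-dimensional inputs, scalar targets.
n, d, m = 200, 5, 500            # samples, input dimension, number of particles
X = rng.normal(size=(n, d))
y = np.sin(X @ rng.normal(size=d))

# Particles: output weights w_i and positions theta_i; the empirical measure
# (1/m) sum_i delta_{(w_i, theta_i)} discretizes the unknown measure.
w = rng.normal(size=m)
theta = rng.normal(size=(m, d))

lr, steps = 0.5, 2000
for t in range(steps):
    pre = X @ theta.T             # (n, m) pre-activations
    act = np.maximum(pre, 0.0)    # ReLU features
    pred = act @ w / m            # mean-field 1/m scaling
    resid = pred - y              # residual of the squared loss

    # Gradients of L = 0.5 * mean((f(x) - y)^2) w.r.t. weights and positions.
    grad_w = act.T @ resid / (n * m)
    grad_theta = ((resid[:, None] * (pre > 0) * w[None, :]).T @ X) / (n * m)

    # Rescaling the step by m keeps the per-particle step size O(1)
    # under the 1/m mean-field normalization.
    w -= lr * m * grad_w
    theta -= lr * m * grad_theta

print("final training MSE:",
      np.mean((np.maximum(X @ theta.T, 0.0) @ w / m - y) ** 2))
```

In this idealization, letting the step size go to zero recovers the continuous-time gradient flow on particles, and the paper's result concerns its behavior in the many-particle limit under suitable initialization.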
