Article,

Fast Convergence for Langevin Diffusion with Matrix Manifold Structure

A. Moitra, and A. Risteski.
(2020). Cite as arXiv:2002.05576. Comment: 52 pages.

Abstract

In this paper, we study the problem of sampling from distributions of the form p(x) \propto e^{-\beta f(x)} for some function f whose values and gradients we can query. This mode of access to f is natural in the scenarios in which such problems arise, for instance sampling from posteriors in parametric Bayesian models. Classical results show that a natural random walk, Langevin diffusion, mixes rapidly when f is convex. Unfortunately, even in simple examples, the applications listed above will entail working with functions f that are nonconvex -- for which sampling from p may in general require an exponential number of queries. In this paper, we study one aspect of nonconvexity relevant for modern machine learning applications: existence of invariances (symmetries) in the function f, as a result of which the distribution p will have manifolds of points with equal probability. We give a recipe for proving mixing time bounds of Langevin dynamics in order to sample from manifolds of local optima of the function f in settings where the distribution is well-concentrated around them. We specialize our arguments to classic matrix factorization-like Bayesian inference problems where we get noisy measurements A(XX^T), X \in R^{d \times k} of a low-rank matrix, i.e. f(X) = \|A(XX^T) - b\|^2_2, X \in R^{d \times k}, and \beta the inverse of the variance of the noise. Such functions f are invariant under orthogonal transformations, and include problems like matrix factorization, sensing, completion. Beyond sampling, Langevin dynamics is a popular toy model for studying stochastic gradient descent. Along these lines, we believe that our work is an important first step towards understanding how SGD behaves when there is a high degree of symmetry in the space of parameters that produce the same output.
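The objective and its symmetry can be illustrated with a small numerical sketch. Nothing below is taken from the paper itself: the measurement operator is assumed to act as A(M)_i = <A_i, M> with random Gaussian matrices A_i, the dimensions and step size are arbitrary, and the update is a plain Euler discretization of Langevin diffusion (often called the unadjusted Langevin algorithm), chosen only for illustration.

```python
import numpy as np

def make_f(measurement_mats, b):
    """Build f(X) = ||A(X X^T) - b||_2^2 and its gradient.

    Assumption: A(M)_i = <A_i, M> for the matrices A_i stacked in
    `measurement_mats` (shape m x d x d). This form is illustrative only.
    """
    def residual(X):
        return np.einsum("ijk,jk->i", measurement_mats, X @ X.T) - b

    def f(X):
        r = residual(X)
        return float(r @ r)

    def grad(X):
        r = residual(X)
        # S = sum_i r_i A_i; the gradient of f is 2 (S + S^T) X.
        S = np.einsum("i,ijk->jk", r, measurement_mats)
        return 2.0 * (S + S.T) @ X

    return f, grad

def langevin_step(X, grad, beta, eta, rng):
    """One Euler step of dX = -beta * grad f(X) dt + sqrt(2) dB,
    whose stationary density is proportional to exp(-beta * f)."""
    noise = rng.standard_normal(X.shape)
    return X - eta * beta * grad(X) + np.sqrt(2.0 * eta) * noise

# Hypothetical problem sizes, chosen only for the demonstration.
rng = np.random.default_rng(0)
d, k, m = 5, 2, 20
A = rng.standard_normal((m, d, d))
X_true = rng.standard_normal((d, k))
b = np.einsum("ijk,jk->i", A, X_true @ X_true.T)  # noiseless measurements
f, grad = make_f(A, b)

# Orthogonal invariance: f(X Q) = f(X) for any orthogonal k x k matrix Q,
# because (X Q)(X Q)^T = X X^T.
X0 = rng.standard_normal((d, k))
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
invariance_gap = abs(f(X0) - f(X0 @ Q))

# A single discretized Langevin update from X0.
X1 = langevin_step(X0, grad, beta=1.0, eta=1e-4, rng=rng)
```

Because f depends on X only through XX^T, every point X has a whole orbit {XQ} of equally likely points, which is the manifold structure the abstract refers to.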

BibTeX key: moitra2020convergence
entry type: article
year: 2020
url: http://arxiv.org/abs/2002.05576
note: cite arxiv:2002.05576. Comment: 52 pages

Cite this publication

%0 Journal Article
%1 moitra2020convergence
%A Moitra, Ankur
%A Risteski, Andrej
%D 2020
%K bayesian dynamic optimization readings
%T Fast Convergence for Langevin Diffusion with Matrix Manifold Structure
%U http://arxiv.org/abs/2002.05576
%X In this paper, we study the problem of sampling from distributions of the form p(x) \propto e^{-\beta f(x)} for some function f whose values and gradients we can query. This mode of access to f is natural in the scenarios in which such problems arise, for instance sampling from posteriors in parametric Bayesian models. Classical results show that a natural random walk, Langevin diffusion, mixes rapidly when f is convex. Unfortunately, even in simple examples, the applications listed above will entail working with functions f that are nonconvex -- for which sampling from p may in general require an exponential number of queries. In this paper, we study one aspect of nonconvexity relevant for modern machine learning applications: existence of invariances (symmetries) in the function f, as a result of which the distribution p will have manifolds of points with equal probability. We give a recipe for proving mixing time bounds of Langevin dynamics in order to sample from manifolds of local optima of the function f in settings where the distribution is well-concentrated around them. We specialize our arguments to classic matrix factorization-like Bayesian inference problems where we get noisy measurements A(XX^T), X \in R^{d \times k} of a low-rank matrix, i.e. f(X) = \|A(XX^T) - b\|^2_2, X \in R^{d \times k}, and \beta the inverse of the variance of the noise. Such functions f are invariant under orthogonal transformations, and include problems like matrix factorization, sensing, completion. Beyond sampling, Langevin dynamics is a popular toy model for studying stochastic gradient descent. Along these lines, we believe that our work is an important first step towards understanding how SGD behaves when there is a high degree of symmetry in the space of parameters that produce the same output.

@article{moitra2020convergence,
  abstract = {In this paper, we study the problem of sampling from distributions of the form p(x) \propto e^{-\beta f(x)} for some function f whose values and gradients we can query. This mode of access to f is natural in the scenarios in which such problems arise, for instance sampling from posteriors in parametric Bayesian models. Classical results show that a natural random walk, Langevin diffusion, mixes rapidly when f is convex. Unfortunately, even in simple examples, the applications listed above will entail working with functions f that are nonconvex -- for which sampling from p may in general require an exponential number of queries. In this paper, we study one aspect of nonconvexity relevant for modern machine learning applications: existence of invariances (symmetries) in the function f, as a result of which the distribution p will have manifolds of points with equal probability. We give a recipe for proving mixing time bounds of Langevin dynamics in order to sample from manifolds of local optima of the function f in settings where the distribution is well-concentrated around them. We specialize our arguments to classic matrix factorization-like Bayesian inference problems where we get noisy measurements A(XX^T), X \in R^{d \times k} of a low-rank matrix, i.e. f(X) = \|A(XX^T) - b\|^2_2, X \in R^{d \times k}, and \beta the inverse of the variance of the noise. Such functions f are invariant under orthogonal transformations, and include problems like matrix factorization, sensing, completion. Beyond sampling, Langevin dynamics is a popular toy model for studying stochastic gradient descent. Along these lines, we believe that our work is an important first step towards understanding how SGD behaves when there is a high degree of symmetry in the space of parameters that produce the same output.},
  added-at = {2020-02-20T13:04:02.000+0100},
  author = {Moitra, Ankur and Risteski, Andrej},
  biburl = {https://www.bibsonomy.org/bibtex/2b40c3b465c555b3c9c1fbb9197d06cda/kirk86},
  description = {[2002.05576] Fast Convergence for Langevin Diffusion with Matrix Manifold Structure},
  interhash = {925fb52a75d4b671e6008b9f06df42bf},
  intrahash = {b40c3b465c555b3c9c1fbb9197d06cda},
  keywords = {bayesian dynamic optimization readings},
  note = {cite arxiv:2002.05576. Comment: 52 pages},
  timestamp = {2020-02-20T13:04:02.000+0100},
  title = {Fast Convergence for Langevin Diffusion with Matrix Manifold Structure},
  url = {http://arxiv.org/abs/2002.05576},
  year = 2020
}
