
Distribution-Dependent Analysis of Gibbs-ERM Principle

COLT (April 2019)

Abstract

Gibbs-ERM is a natural idealized model of learning with stochastic optimization algorithms (such as Stochastic Gradient Langevin Dynamics and, to some extent, Stochastic Gradient Descent), and it also appears in other contexts, including PAC-Bayesian theory and sampling mechanisms. In this work we study the excess risk suffered by a Gibbs-ERM learner with a non-convex, regularized empirical risk. Our goal is to understand the interplay between the data-generating distribution and the problem of learning in large hypothesis spaces. Our main results are distribution-dependent upper bounds on several notions of excess risk. We show that, in all cases, the distribution-dependent excess risk is essentially controlled by the "local" effective dimension of the problem, a well-established notion of effective dimension that appears in the analyses of several previous algorithms, including SGD and ridge regression. Ours is the first work to bring this notion of dimension to the analysis of learning via Gibbs densities. The distribution-dependent view we advocate here improves upon earlier results of Raginsky et al. (2017) and can yield much tighter bounds, depending on the interplay between the data-generating distribution and the loss function. The first part of our analysis focuses on the localized excess risk in the vicinity of a fixed local minimizer. We then extend this result to bounds on the global excess risk by characterizing the probabilities of local minima (and their complement) under Gibbs densities, a result that may be of independent interest.
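For intuition, the Gibbs-ERM learner draws its hypothesis from the Gibbs density p(w) ∝ exp(-β R̂(w)), where R̂ is the (regularized) empirical risk and β is an inverse-temperature parameter; SGLD, mentioned in the abstract, is a stochastic optimizer whose stationary distribution approximates this density. Below is a minimal illustrative sketch of this connection — the function name, the quadratic example risk, and all hyperparameter values are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def sgld_sample(grad_risk, w0, beta=10.0, step=1e-2, n_steps=2000, rng=None):
    """Illustrative sketch: approximately sample from the Gibbs density
    p(w) ∝ exp(-beta * R(w)) by running Stochastic Gradient Langevin
    Dynamics, i.e. gradient descent on beta * R plus Gaussian noise.

    grad_risk: callable returning the gradient of the empirical risk R at w.
    (Hyperparameters here are arbitrary choices for the demo.)
    """
    rng = np.random.default_rng(0) if rng is None else rng
    w = np.array(w0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(w.shape)
        # Langevin update: drift toward low risk, diffuse with noise whose
        # scale sqrt(2 * step) matches the target density exp(-beta * R).
        w = w - step * beta * grad_risk(w) + np.sqrt(2.0 * step) * noise
    return w

# Toy example (an assumption, not from the paper): R(w) = 0.5 * ||w||^2,
# so the Gibbs density is a Gaussian with variance 1 / beta and the chain
# should drift from the starting point toward a neighborhood of 0.
w_final = sgld_sample(lambda w: w, np.array([3.0]))
```

With a non-convex R̂, the same chain instead spends its time near the various local minimizers, with probabilities governed by the Gibbs density — the object the paper's global excess-risk bounds characterize.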
