Misc,

Implicit stochastic gradient descent

P. Toulis, and E. Airoldi.
(2014)cite arxiv:1408.2923.

Abstract

Stochastic optimization procedures, such as stochastic gradient descent, have gained popularity for parameter estimation from large data sets. However, standard stochastic optimization procedures cannot effectively combine numerical stability with statistical and computational efficiency. Here, we introduce an implicit stochastic gradient descent procedure, the iterates of which are implicitly defined. Intuitively, implicit iterates shrink the standard iterates. The amount of shrinkage depends on the observed Fisher information matrix, which does not need to be explicitly computed in practice, thus increasing stability without increasing the computational burden. When combined with averaging, the proposed procedure achieves statistical efficiency as well. We derive non-asymptotic bounds and characterize the asymptotic distribution of implicit procedures. Our analysis also reveals the asymptotic variance of a number of existing procedures. We demonstrate implicit stochastic gradient descent by further developing theory for generalized linear models, Cox proportional hazards, and M-estimation problems, and by carrying out extensive experiments. Our results suggest that the implicit stochastic gradient descent procedure is poised to become the workhorse of estimation with large data sets.

BibTeX key: toulis2014implicit
entry type: misc
year: 2014
url: http://arxiv.org/abs/1408.2923
note: cite arxiv:1408.2923

BibSonomy

Implicit stochastic gradient descent

Abstract

Tags

Users

Comments and Reviewsshow / hide

Cite this publication

More citation styles

search on