Abstract
Recently, stochastic momentum methods have been widely adopted for training
deep neural networks. However, their convergence analysis remains
underexplored, in particular for non-convex optimization. This paper fills the
gap between practice and theory by developing a basic convergence analysis of
two stochastic momentum methods, namely the stochastic heavy-ball method and
the stochastic variant of Nesterov's accelerated gradient method. We hope that
the basic convergence results developed in this paper can serve as a reference
for the convergence of stochastic momentum methods and as baselines for
comparison in future developments. The novelty of the convergence analysis
presented in this paper is a unified framework that reveals more insights
about the similarities and differences between the stochastic momentum methods
and the stochastic gradient method. The unified framework also exhibits a
continuous transition, governed by a free parameter, from the stochastic
gradient method to Nesterov's accelerated gradient method and the heavy-ball
method. The theoretical and empirical results show that, among the three
stochastic methods, the stochastic variant of Nesterov's accelerated gradient
method achieves a good tradeoff for optimizing deep neural networks.
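For illustration only, the sketch below shows one common way a single free
parameter can interpolate between the momentum methods mentioned in the
abstract; the exact update form, the parameter names `lr`, `beta`, `s`, and
the function `unified_momentum_step` are assumptions of this sketch, not the
paper's stated formulation. Setting s = 0 yields a heavy-ball-style update,
s = 1 yields a Nesterov-style update, and beta = 0 reduces to plain SGD.

```python
import numpy as np

def unified_momentum_step(x, y_s_prev, grad, lr=0.01, beta=0.9, s=1.0):
    """One step of a unified stochastic momentum update (illustrative sketch).

    s = 0:    heavy-ball-style update  x_{k+1} = x_k - lr*g_k + beta*(x_k - x_{k-1})
    s = 1:    Nesterov-style update driven by the gradient-step sequence
    beta = 0: plain stochastic gradient descent
    """
    y = x - lr * grad             # plain stochastic gradient step
    y_s = x - s * lr * grad       # auxiliary step scaled by the free parameter s
    x_next = y + beta * (y_s - y_s_prev)  # momentum correction
    return x_next, y_s

# Toy usage on f(x) = 0.5 * ||x||^2 with noisy gradients (hypothetical setup).
x = np.ones(3)
y_s_prev = x.copy()
for _ in range(100):
    grad = x + 0.01 * np.random.randn(3)  # stochastic gradient of the toy objective
    x, y_s_prev = unified_momentum_step(x, y_s_prev, grad, s=1.0)
```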