An overview of gradient descent optimization algorithms
S. Ruder. (2016). arXiv:1609.04747. Comment: 12 pages, 6 figures.
Abstract
Gradient descent optimization algorithms, while increasingly popular, are
often used as black-box optimizers, as practical explanations of their
strengths and weaknesses are hard to come by. This article aims to provide the
reader with intuitions with regard to the behaviour of different algorithms
that will allow her to put them to use. In the course of this overview, we look
at different variants of gradient descent, summarize challenges, introduce the
most common optimization algorithms, review architectures in a parallel and
distributed setting, and investigate additional strategies for optimizing
gradient descent.
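The surveyed article centers on the basic gradient descent update, theta ← theta − lr · ∇J(theta). As a minimal illustration (the objective function and learning rate below are illustrative choices, not taken from the paper):

```python
# Hedged sketch: batch gradient descent on a 1-D quadratic,
# illustrating the update rule theta <- theta - lr * grad J(theta).
# J(theta) = (theta - 3)^2 and lr = 0.1 are illustrative assumptions.

def grad(theta):
    # Gradient of J(theta) = (theta - 3)^2 is J'(theta) = 2 * (theta - 3)
    return 2.0 * (theta - 3.0)

def gradient_descent(theta0, lr=0.1, steps=100):
    theta = theta0
    for _ in range(steps):
        theta -= lr * grad(theta)  # step opposite the gradient
    return theta

print(gradient_descent(0.0))  # converges toward the minimum at theta = 3
```

Each step shrinks the distance to the minimum by a constant factor (here 0.8), so the iterate approaches 3 geometrically; the article's later sections cover variants (SGD, momentum, NAG, Adam) that modify this base rule.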
Description
[1609.04747] An overview of gradient descent optimization algorithms
%0 Generic
%1 ruder2016overview
%A Ruder, Sebastian
%D 2016
%K Adam NAG SGD gradient_descent stochastic_optimization
%T An overview of gradient descent optimization algorithms
%U http://arxiv.org/abs/1609.04747
%X Gradient descent optimization algorithms, while increasingly popular, are
often used as black-box optimizers, as practical explanations of their
strengths and weaknesses are hard to come by. This article aims to provide the
reader with intuitions with regard to the behaviour of different algorithms
that will allow her to put them to use. In the course of this overview, we look
at different variants of gradient descent, summarize challenges, introduce the
most common optimization algorithms, review architectures in a parallel and
distributed setting, and investigate additional strategies for optimizing
gradient descent.
@misc{ruder2016overview,
abstract = {Gradient descent optimization algorithms, while increasingly popular, are
often used as black-box optimizers, as practical explanations of their
strengths and weaknesses are hard to come by. This article aims to provide the
reader with intuitions with regard to the behaviour of different algorithms
that will allow her to put them to use. In the course of this overview, we look
at different variants of gradient descent, summarize challenges, introduce the
most common optimization algorithms, review architectures in a parallel and
distributed setting, and investigate additional strategies for optimizing
gradient descent.},
added-at = {2017-06-09T13:45:11.000+0200},
author = {Ruder, Sebastian},
biburl = {https://www.bibsonomy.org/bibtex/24d1336b721e9154546ba8e1d87046316/suqbar},
description = {[1609.04747] An overview of gradient descent optimization algorithms},
interhash = {6e9f951ec79eba6cb7eb27db1e6d4ad6},
intrahash = {4d1336b721e9154546ba8e1d87046316},
keywords = {Adam NAG SGD gradient_descent stochastic_optimization},
note = {cite arxiv:1609.04747. Comment: 12 pages, 6 figures},
timestamp = {2017-06-09T13:45:11.000+0200},
title = {An overview of gradient descent optimization algorithms},
url = {http://arxiv.org/abs/1609.04747},
year = 2016
}