An overview of gradient descent optimization algorithms.

Abstract

Gradient descent optimization algorithms, while increasingly popular, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by. This article aims to provide the reader with intuitions with regard to the behaviour of different algorithms that will allow her to put them to use. In the course of this overview, we look at different variants of gradient descent, summarize challenges, introduce the most common optimization algorithms, review architectures in a parallel and distributed setting, and investigate additional strategies for optimizing gradient descent.

BibTeX key: ruder2016overview
entry type: misc
year: 2016
url: http://arxiv.org/abs/1609.04747
note: cite arxiv:1609.04747. Comment: Added derivations of AdaMax and Nadam