New insights and perspectives on the natural gradient method
J. Martens (2014). arXiv:1412.1193. Comment: New title and abstract. Added multiple sections, including a proper introduction/outline and one on convergence speed. Many other revisions throughout.
Abstract
Natural gradient descent is an optimization method traditionally motivated
from the perspective of information geometry, and works well for many
applications as an alternative to stochastic gradient descent. In this paper we
critically analyze this method and its properties, and show how it can be
viewed as a type of approximate 2nd-order optimization method, where the Fisher
information matrix used to compute the natural gradient direction can be viewed
as an approximation of the Hessian. This perspective turns out to have
significant implications for how to design a practical and robust version of
the method. Among our various other contributions is a thorough analysis of the
convergence speed of natural gradient descent and more general stochastic
methods, a critical examination of the oft-used "empirical" approximation of
the Fisher matrix, and an analysis of the (approximate) parameterization
invariance property possessed by the method, which we show still holds for
certain other choices of the curvature matrix, but notably not the Hessian.
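Since the record itself contains no formulas, here is a brief sketch (standard definitions, not quoted from the paper) of what the abstract refers to. Natural gradient descent preconditions the ordinary gradient of the objective h with the inverse Fisher information matrix:

    \theta_{k+1} = \theta_k - \alpha_k \, F(\theta_k)^{-1} \nabla h(\theta_k),
    \quad
    F(\theta) = \mathbb{E}_{x \sim q(x),\, y \sim p(y \mid x, \theta)}
    \left[ \nabla_\theta \log p(y \mid x, \theta) \, \nabla_\theta \log p(y \mid x, \theta)^\top \right],

where q(x) is the data distribution and p(y | x, θ) is the model's predictive distribution. The "empirical" Fisher approximation that the paper critically examines instead takes the expectation over the training labels y rather than over samples drawn from p(y | x, θ); the two matrices coincide only when the model's conditional distribution matches the data.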
@misc{martens2014insights,
abstract = {Natural gradient descent is an optimization method traditionally motivated
from the perspective of information geometry, and works well for many
applications as an alternative to stochastic gradient descent. In this paper we
critically analyze this method and its properties, and show how it can be
viewed as a type of approximate 2nd-order optimization method, where the Fisher
information matrix used to compute the natural gradient direction can be viewed
as an approximation of the Hessian. This perspective turns out to have
significant implications for how to design a practical and robust version of
the method. Among our various other contributions is a thorough analysis of the
convergence speed of natural gradient descent and more general stochastic
methods, a critical examination of the oft-used "empirical" approximation of
the Fisher matrix, and an analysis of the (approximate) parameterization
invariance property possessed by the method, which we show still holds for
certain other choices of the curvature matrix, but notably not the Hessian.},
author = {Martens, James},
keywords = {gradient-descent neural-network},
note = {arXiv:1412.1193. Comment: New title and abstract. Added multiple sections, including a proper introduction/outline and one on convergence speed. Many other revisions throughout},
title = {New insights and perspectives on the natural gradient method},
url = {http://arxiv.org/abs/1412.1193},
year = 2014
}