Policy Gradient Methods for Reinforcement Learning with Function Approximation
R. Sutton, D. McAllester, S. Singh, and Y. Mansour. Proceedings of the 12th International Conference on Neural Information Processing Systems, pages 1057--1063, Cambridge, MA, USA. MIT Press, 1999.
Abstract
Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
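The main new result referred to in the abstract is the policy gradient theorem. As a rough sketch in standard notation (the symbols below follow common usage rather than being quoted from the paper): for a stochastic policy \pi(a \mid s; \theta) with performance measure \rho(\theta) (average or discounted reward from a designated start state), the gradient can be written as

\[
\nabla_\theta \rho(\theta) \;=\; \sum_{s} d^{\pi}(s) \sum_{a} \nabla_\theta \pi(a \mid s; \theta)\, Q^{\pi}(s, a),
\]

where d^{\pi} is the state distribution induced by \pi and Q^{\pi} is the action-value function. Because the state distribution enters only as a weighting and its own gradient does not appear, the expression can be estimated from sampled experience. The paper further shows that Q^{\pi} may be replaced by a learned approximation f_w(s,a) whose gradient is compatible with the policy parameterization, i.e. \nabla_w f_w(s,a) = \nabla_\theta \log \pi(a \mid s; \theta), without biasing the gradient; this compatibility condition underlies the convergence result for policy iteration with function approximation.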
Description
Policy gradient methods for reinforcement learning with function approximation
%0 Conference Paper
%1 Sutton:1999:PGM:3009657.3009806
%A Sutton, Richard S.
%A McAllester, David
%A Singh, Satinder
%A Mansour, Yishay
%B Proceedings of the 12th International Conference on Neural Information Processing Systems
%C Cambridge, MA, USA
%D 1999
%I MIT Press
%K policy_gradient reinforcement_learning
%P 1057--1063
%T Policy Gradient Methods for Reinforcement Learning with Function Approximation
%U http://dl.acm.org/citation.cfm?id=3009657.3009806
%X Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
@inproceedings{Sutton:1999:PGM:3009657.3009806,
abstract = {Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.},
acmid = {3009806},
added-at = {2019-09-16T00:13:50.000+0200},
address = {Cambridge, MA, USA},
author = {Sutton, Richard S. and McAllester, David and Singh, Satinder and Mansour, Yishay},
biburl = {https://www.bibsonomy.org/bibtex/24ca8cc04d8982aea21e8fd5ed719e89f/e.fischer},
booktitle = {Proceedings of the 12th International Conference on Neural Information Processing Systems},
description = {Policy gradient methods for reinforcement learning with function approximation},
interhash = {7db746ffbdad9f59d8382c7d5314ec4f},
intrahash = {4ca8cc04d8982aea21e8fd5ed719e89f},
keywords = {policy_gradient reinforcement_learning},
location = {Denver, CO},
numpages = {7},
pages = {1057--1063},
publisher = {MIT Press},
series = {NIPS'99},
timestamp = {2020-04-14T12:11:11.000+0200},
title = {Policy Gradient Methods for Reinforcement Learning with Function Approximation},
url = {http://dl.acm.org/citation.cfm?id=3009657.3009806},
year = 1999
}