Inproceedings,

Regularized Policy Iteration

A. Farahmand, M. Ghavamzadeh, C. Szepesvári, and S. Mannor.
NIPS, pages 441--448. (2008)

Abstract

In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. In order to implement a flexible function approximation scheme, we propose the use of non-parametric methods with regularization, providing a convenient way to control the complexity of the function approximator. We propose two novel regularized policy iteration algorithms by adding L2-regularization to two widely used policy evaluation methods: Bellman residual minimization (BRM) and least-squares temporal difference learning (LSTD). We derive efficient implementations of our algorithms for the case where the approximate value functions belong to a reproducing kernel Hilbert space. We also provide finite-sample performance bounds for our algorithms and show that they are able to achieve optimal rates of convergence under the studied conditions.
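
To make the idea of adding L2-regularization to LSTD concrete, here is a minimal sketch. It is not the paper's kernel-based algorithm: it assumes a finite linear feature map and a simple ridge-style penalty on the weight vector, which is a simplified variant. The function name ridge_lstd, the parameter lam, and the two-state toy chain are all hypothetical and chosen only for illustration.

```python
import numpy as np


def ridge_lstd(phi, phi_next, rewards, gamma=0.9, lam=1e-3):
    """Ridge-style LSTD for policy evaluation with linear features.

    Simplified illustration only (assumption): finite feature vectors and
    a plain lam * I penalty, not the RKHS formulation derived in the paper.

    phi      : (n, d) features of the sampled states
    phi_next : (n, d) features of the sampled successor states
    rewards  : (n,) observed rewards
    """
    n, d = phi.shape
    # Empirical LSTD system A w = b.
    A = phi.T @ (phi - gamma * phi_next) / n
    b = phi.T @ rewards / n
    # The L2 penalty shifts A by lam * I, which shrinks the weights.
    return np.linalg.solve(A + lam * np.eye(d), b)


# Toy chain (hypothetical): state 0 -> state 1 -> state 1, reward 1 per step.
phi = np.array([[1.0, 0.0], [0.0, 1.0]])        # one-hot features
phi_next = np.array([[0.0, 1.0], [0.0, 1.0]])   # both transitions land in state 1
rewards = np.array([1.0, 1.0])

w_unregularized = ridge_lstd(phi, phi_next, rewards, gamma=0.9, lam=0.0)
w_regularized = ridge_lstd(phi, phi_next, rewards, gamma=0.9, lam=1.0)
```

With one-hot features the weights are the estimated state values, so setting lam to zero recovers the exact values of the toy chain, while a larger lam shrinks them toward zero. In this simplified setting, that shrinkage is how the penalty controls the complexity of the function approximator.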

BibTeX key: farahmand2008a
entry type: inproceedings
booktitle: NIPS
year: 2008
pages: 441--448
crossref: NIPS21
ee: http://books.nips.cc/papers/files/nips21/NIPS2008_0871.pdf
date-added: 2010-08-28 17:38:14 -0600
pdf: papers/nips08-regrl.pdf
bibsource: DBLP, http://dblp.uni-trier.de
date-modified: 2010-11-25 00:51:02 -0700
