Least Squares Temporal Difference Learning and Galerkin's Method

Abstract

The problem of estimating the value function underlying a Markovian reward process is considered. As is well known, this value function satisfies a linear fixed point equation. One approach to learning the value function from finite data is to find a good approximation to it in a given (linear) subspace of the space of value functions. We review some of the issues that arise when following this approach, as well as some results that characterize the finite-sample performance of some of the algorithms.
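
To make the subspace approach concrete, here is a minimal sketch of least squares temporal difference (LSTD) estimation. It is an illustration only, not code from the report. The three-state chain, its rewards, the discount factor, the tabular features, and the names `lstd`, `P`, `r`, and `features` are all assumptions chosen for the example. The estimate is chosen so that the sample Bellman residual is orthogonal to the features, which is the Galerkin-type condition referred to in the title.

```python
import numpy as np


def lstd(phi, phi_next, rewards, gamma, ridge=0.0):
    """Return weights theta such that phi @ theta approximates the value function.

    phi, phi_next: (n, d) feature matrices of the current and next states.
    rewards: (n,) rewards observed at the current states.
    ridge: optional regularization (an assumption, useful if A is ill-conditioned).
    """
    n, d = phi.shape
    # Orthogonality condition: phi^T (phi theta - rewards - gamma phi_next theta) = 0
    A = phi.T @ (phi - gamma * phi_next) / n
    b = phi.T @ rewards / n
    return np.linalg.solve(A + ridge * np.eye(d), b)


# Hypothetical 3-state Markov reward process (not taken from the report).
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
r = np.array([0.0, 1.0, 2.0])
gamma = 0.9

# Exact solution of the linear fixed point equation v = r + gamma * P v.
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)

# Simulate a single trajectory of the chain.
rng = np.random.default_rng(0)
n = 20000
states = np.empty(n + 1, dtype=int)
states[0] = 0
for t in range(n):
    states[t + 1] = rng.choice(3, p=P[states[t]])

# Tabular (one-hot) features: the subspace is the whole space of value functions.
features = np.eye(3)
theta = lstd(features[states[:-1]], features[states[1:]], r[states[:-1]], gamma)

print("LSTD estimate:", theta)
print("exact values: ", v_true)
```

With tabular features the subspace contains the true value function, so the only error left is the one due to the finite sample. Replacing `features` with a matrix that has fewer columns than states restricts the estimate to a proper subspace and adds an approximation error on top of the estimation error.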

BibTeX key: Sze12
entry type: incollection
address: Oberwolfach
booktitle: Mini-Workshop: Mathematics of Machine Learning
year: 2011
number: 3
pages: 2385--2388
publisher: European Mathematical Society
series: Oberwolfach Reports
volume: 8
date-added: 2012-06-03 14:57:46 -0600
pdf: papers/Oberwolfach-Report.pdf
date-modified: 2013-03-11 21:17:16 -0600
