Incollection,

Least Squares Temporal Difference Learning and Galerkin's Method

Mini-Workshop: Mathematics of Machine Learning, volume 8 of Oberwolfach Reports, European Mathematical Society, Oberwolfach, (2011)

Abstract

The problem of estimating the value function underlying a Markovian reward process is considered. As is well known, this value function satisfies a linear fixed-point equation. One approach to learning the value function from finite data is to find a good approximation to it in a given (linear) subspace of the space of value functions. We review some of the issues that arise when following this approach, as well as some results that characterize the finite-sample performance of some of the algorithms.
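
The fixed-point equation and the subspace approximation described in the abstract can be sketched as follows. The notation is assumed for illustration and is not taken from the paper: a discount factor \gamma, transition operator P, expected reward r, feature matrix \Phi with rows \phi(x)^\top, a diagonal weighting D given by the stationary distribution, and observed transitions (X_t, R_t, X_{t+1}). The last line is the standard least-squares temporal difference estimate, which presumes that \hat A is invertible.

```latex
% Linear fixed-point equation satisfied by the value function
V = r + \gamma P V

% Galerkin condition: the residual of V_\theta = \Phi\theta is
% orthogonal (under D) to the subspace spanned by the features
\Phi^\top D \bigl( \Phi\theta - r - \gamma P \Phi\theta \bigr) = 0

% Empirical version from n observed transitions (LSTD)
\hat\theta = \hat A^{-1} \hat b, \qquad
\hat A = \frac{1}{n} \sum_{t=1}^{n} \phi(X_t)
         \bigl( \phi(X_t) - \gamma\, \phi(X_{t+1}) \bigr)^\top, \qquad
\hat b = \frac{1}{n} \sum_{t=1}^{n} \phi(X_t)\, R_t
```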
