Approximate Policy Iteration with Linear Action Models

Abstract

We consider the problem of finding a good policy from batch data. We propose a new approach, LAM-API, that first builds a so-called linear action model (LAM) from the data and then uses the learned model together with the collected data in approximate policy iteration (API) to find a good policy. A natural choice for the policy evaluation step in this algorithm is the least-squares temporal difference (LSTD) learning algorithm. Empirical results on three benchmark problems show that this particular instance of LAM-API performs competitively with LSPI in terms of both data efficiency and computational efficiency.
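
To make the policy evaluation step concrete, here is a minimal sketch of generic LSTD on a batch of transitions. It is not the paper's LAM-API implementation: how the linear action model is built and used is not described in the abstract, so that part is omitted. The names `lstd`, `solve_linear`, `one_hot`, and the two-state batch are hypothetical and chosen only for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are not modified.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system; more data may be needed")
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lstd(transitions, features, gamma):
    """Generic LSTD: solve A theta = b with
    A = sum phi(s) (phi(s) - gamma * phi(s'))^T and b = sum phi(s) * r,
    where each transition is (s, r, s') collected under a fixed policy."""
    d = len(features(transitions[0][0]))
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for s, r, s_next in transitions:
        phi = features(s)
        phi_next = features(s_next)
        for i in range(d):
            b[i] += phi[i] * r
            for j in range(d):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve_linear(A, b)


def one_hot(state, n_states=2):
    """Tabular (one-hot) features; an assumption made for this toy example."""
    return [1.0 if i == state else 0.0 for i in range(n_states)]


# Hypothetical batch: state 0 moves to state 1 with reward 0,
# state 1 stays in state 1 with reward 1.
batch = [(0, 0.0, 1), (1, 1.0, 1)]
theta = lstd(batch, one_hot, gamma=0.5)
# theta is approximately [1.0, 2.0], matching V(1) = 1 / (1 - 0.5) and V(0) = 0.5 * V(1)
```

With one-hot features the weights coincide with the state values, which makes the result easy to check by hand. In an approximate policy iteration loop, an evaluation step of this kind would alternate with a policy improvement step.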

BibTeX key: YaoSze12
entry type: inproceedings
booktitle: AAAI-2012
year: 2012
month: July
pages: 1212--1217
pdf: papers/lamapi.pdf
date-modified: 2013-07-16 12:05:29 -0600
date-added: 2012-06-06 14:33:03 -0600
