Inproceedings

Model-based Reinforcement Learning with Nearly Tight Exploration Complexity Bounds

István Szita and Csaba Szepesvári.
ICML, pages 1031--1038. Omnipress, (June 2010)

Abstract

A strong selling point of using a model in reinforcement learning is that model-based algorithms can propagate the obtained experience more quickly, and are able to direct exploration better. As a consequence, fewer exploratory actions are enough to learn a good policy. Strangely enough, current theoretical results for model-based algorithms do not support this claim: In a Markov decision process with N states, the best bounds on the number of exploratory steps necessary are of order $O(N^2 \log N)$, in contrast to the $O(N \log N)$ bound available for the model-free, delayed Q-learning algorithm. In this paper we show that a modified version of the Rmax algorithm needs to make at most $O(N \log N)$ exploratory steps. This matches the lower bound up to logarithmic factors, as well as the upper bound of the state-of-the-art model-free algorithm, while our new bound improves the dependence on the discount factor $\gamma$.
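
The abstract refers to the Rmax algorithm; as a rough illustration only, the sketch below shows the classic Rmax exploration scheme that the paper modifies. The class name `RmaxAgent`, the `m_known` threshold, and the per-step replanning are illustrative assumptions, not the paper's modified algorithm or its actual parameter choices.

```python
import numpy as np

class RmaxAgent:
    """Minimal Rmax-style agent for a finite MDP (illustrative sketch).

    State-action pairs visited fewer than `m_known` times are treated as
    "unknown" and assigned the optimistic value r_max / (1 - gamma), which
    drives exploration toward them; known pairs use the empirical model.
    """

    def __init__(self, n_states, n_actions, gamma=0.95, r_max=1.0, m_known=10):
        self.nS, self.nA = n_states, n_actions
        self.gamma, self.r_max, self.m_known = gamma, r_max, m_known
        self.counts = np.zeros((n_states, n_actions))
        self.reward_sums = np.zeros((n_states, n_actions))
        self.trans_counts = np.zeros((n_states, n_actions, n_states))

    def update(self, s, a, r, s_next):
        # Record one observed transition; the model for (s, a) is frozen
        # once the pair has been visited m_known times.
        if self.counts[s, a] < self.m_known:
            self.counts[s, a] += 1
            self.reward_sums[s, a] += r
            self.trans_counts[s, a, s_next] += 1

    def plan(self, n_iters=200):
        # Value iteration on the optimistic empirical model.
        v_opt = self.r_max / (1.0 - self.gamma)
        q = np.full((self.nS, self.nA), v_opt)
        for _ in range(n_iters):
            v = q.max(axis=1)
            for s in range(self.nS):
                for a in range(self.nA):
                    n = self.counts[s, a]
                    if n < self.m_known:
                        q[s, a] = v_opt  # still "unknown": stay optimistic
                    else:
                        r_hat = self.reward_sums[s, a] / n
                        p_hat = self.trans_counts[s, a] / n
                        q[s, a] = r_hat + self.gamma * (p_hat @ v)
        return q

    def act(self, s):
        # Act greedily with respect to the optimistic planned values.
        return int(np.argmax(self.plan()[s]))
```

The key mechanism is that under-visited state-action pairs keep the optimistic value $r_{\max}/(1-\gamma)$, so acting greedily with respect to the planned values pushes the agent toward them until they become known; the paper's contribution is a modification of this scheme whose number of exploratory steps is bounded by $O(N \log N)$.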


Users

  • @csaba
