Inproceedings

Model-based Reinforcement Learning with Nearly Tight Exploration Complexity Bounds

István Szita and Csaba Szepesvári.
ICML, pages 1031--1038. Omnipress, (June 2010)

Abstract

A strong selling point of using a model in reinforcement learning is that model-based algorithms can propagate the obtained experience more quickly, and are able to direct exploration better. As a consequence, fewer exploratory actions are enough to learn a good policy. Strangely enough, current theoretical results for model-based algorithms do not support this claim: In a Markov decision process with N states, the best bounds on the number of exploratory steps necessary are of order $O(N^2 \log N)$, in contrast to the $O(N \log N)$ bound available for the model-free, delayed Q-learning algorithm. In this paper we show that a modified version of the Rmax algorithm needs to make at most $O(N \log N)$ exploratory steps. This matches the lower bound up to logarithmic factors, as well as the upper bound of the state-of-the-art model-free algorithm, while our new bound improves the dependence on the discount factor $\gamma$.
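
The abstract refers to the Rmax algorithm; as a rough illustration only, the sketch below shows the classic Rmax exploration scheme that the paper modifies. The class name `RmaxAgent`, the `m_known` threshold, and the per-step replanning are illustrative assumptions, not the paper's modified algorithm or its actual parameter choices.

```python
import numpy as np

class RmaxAgent:
    """Minimal Rmax-style agent for a finite MDP (illustrative sketch).

    State-action pairs visited fewer than `m_known` times are treated as
    "unknown" and assigned the optimistic value r_max / (1 - gamma), which
    drives exploration toward them; known pairs use the empirical model.
    """

    def __init__(self, n_states, n_actions, gamma=0.95, r_max=1.0, m_known=10):
        self.nS, self.nA = n_states, n_actions
        self.gamma, self.r_max, self.m_known = gamma, r_max, m_known
        self.counts = np.zeros((n_states, n_actions))
        self.reward_sums = np.zeros((n_states, n_actions))
        self.trans_counts = np.zeros((n_states, n_actions, n_states))

    def update(self, s, a, r, s_next):
        # Record one observed transition; the model for (s, a) is frozen
        # once the pair has been visited m_known times.
        if self.counts[s, a] < self.m_known:
            self.counts[s, a] += 1
            self.reward_sums[s, a] += r
            self.trans_counts[s, a, s_next] += 1

    def plan(self, n_iters=200):
        # Value iteration on the optimistic empirical model.
        v_opt = self.r_max / (1.0 - self.gamma)
        q = np.full((self.nS, self.nA), v_opt)
        for _ in range(n_iters):
            v = q.max(axis=1)
            for s in range(self.nS):
                for a in range(self.nA):
                    n = self.counts[s, a]
                    if n < self.m_known:
                        q[s, a] = v_opt  # still "unknown": stay optimistic
                    else:
                        r_hat = self.reward_sums[s, a] / n
                        p_hat = self.trans_counts[s, a] / n
                        q[s, a] = r_hat + self.gamma * (p_hat @ v)
        return q

    def act(self, s):
        # Act greedily with respect to the optimistic planned values.
        return int(np.argmax(self.plan()[s]))
```

The key mechanism is that under-visited state-action pairs keep the optimistic value $r_{\max}/(1-\gamma)$, so acting greedily with respect to the planned values pushes the agent toward them until they become known; the paper's contribution is a modification of this scheme whose number of exploratory steps is bounded by $O(N \log N)$.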


Users

  • @csaba
