The Asymptotic Convergence-Rate of Q-learning

Abstract

In this paper we show that for discounted MDPs with discount factor $\gamma>1/2$ the asymptotic rate of convergence of Q-learning is $O(1/t^{R(1-\gamma)})$ if $R(1-\gamma)<1/2$ and $O(\sqrt{\log\log t/t})$ otherwise, provided that the state-action pairs are sampled from a fixed probability distribution. Here $R=p_{\min}/p_{\max}$ is the ratio of the minimum and maximum state-action occupation frequencies. The results extend to convergent on-line learning provided that $p_{\min}>0$, where $p_{\min}$ and $p_{\max}$ now become the minimum and maximum state-action occupation frequencies corresponding to the stationary distribution.
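
As a rough illustration of how the quantities in the abstract fit together, the following Python sketch computes the ratio $R$ and the exponent $R(1-\gamma)$ from a sampling distribution over state-action pairs. The function names `rate_exponent` and `envelope`, and the example distribution, are hypothetical and not taken from the paper. The sketch evaluates only the polynomial branch $t^{-R(1-\gamma)}$ and ignores the constant hidden in the $O(\cdot)$ notation.

```python
import math


def rate_exponent(p, gamma):
    """Return (R, R * (1 - gamma)) for a sampling distribution p.

    p     : sampling probabilities of the state-action pairs (illustrative input)
    gamma : discount factor; the abstract states its result for gamma > 1/2
    """
    p = list(p)
    if not p or min(p) <= 0:
        raise ValueError("every state-action pair needs positive probability")
    if not 0.5 < gamma < 1:
        raise ValueError("this sketch assumes 1/2 < gamma < 1")
    R = min(p) / max(p)  # ratio p_min / p_max of occupation frequencies
    return R, R * (1 - gamma)


def envelope(t, exponent):
    """Evaluate t ** (-exponent), the polynomial rate without its constant."""
    return t ** (-exponent)


if __name__ == "__main__":
    # Hypothetical distribution over four state-action pairs.
    R, e = rate_exponent([0.1, 0.2, 0.3, 0.4], gamma=0.9)
    print(f"R = {R:.3f}, exponent = {e:.4f}")
    for t in (10, 1_000, 100_000):
        print(t, envelope(t, e))
```

Under these assumptions, a more uneven sampling distribution (smaller $R$) or a discount factor closer to 1 gives a smaller exponent, so the envelope decays more slowly in $t$.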

BibTeX key: szepesvari1997b
entry type: inproceedings
booktitle: NIPS
year: 1997
pages: 1064--1070
crossref: NIPS10
pdf: papers/nips97.ps.pdf
date-modified: 2010-11-25 00:50:27 -0700
date-added: 2010-08-28 17:38:14 -0600
