Comparing Value-Function Estimation Algorithms in Undiscounted Problems

Abstract

We compare the scaling properties of several value-function estimation algorithms. In particular, we prove that Q-learning can converge exponentially slowly in the number of states. We identify the reasons for this slow convergence and show that both TD($\lambda$) and learning with a fixed learning rate enjoy rather fast convergence, just like the model-based method.
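The abstract contrasts a decaying learning rate with a fixed one. The sketch below is a toy illustration only, assuming a deterministic, undiscounted chain with reward 1 on the final transition (so every state's true value is 1); it is not the construction used in the report. With a single action per state, Q-learning reduces to the tabular TD(0) update shown here, so the chain isolates the effect of the learning-rate schedule. The names `td0_chain` and `max_error` are hypothetical. Note that this toy only shows the $1/n$ schedule lagging behind; it does not reproduce the exponential lower bound proved in the report.

```python
def td0_chain(num_states, episodes, step_size):
    """Tabular TD(0) on a deterministic, undiscounted chain (toy example).

    Each episode visits states 0..num_states-1 in order. The last state
    moves to a terminal state with reward 1; all other transitions give
    reward 0, so the true value of every state is 1.
    step_size(n) returns the learning rate used on the n-th visit (n >= 1).
    """
    values = [0.0] * num_states
    visits = [0] * num_states
    for _ in range(episodes):
        for s in range(num_states):
            visits[s] += 1
            if s == num_states - 1:
                target = 1.0              # reward 1, terminal value 0
            else:
                target = values[s + 1]    # reward 0, bootstrap on successor
            alpha = step_size(visits[s])
            values[s] += alpha * (target - values[s])
    return values


def max_error(values):
    """Largest deviation from the true value 1 over all states."""
    return max(abs(1.0 - v) for v in values)


harmonic = td0_chain(5, 200, lambda n: 1.0 / n)   # decaying 1/n schedule
fixed = td0_chain(5, 200, lambda n: 0.5)          # fixed learning rate
print(f"1/n learning rate:   max error {max_error(harmonic):.4f}")
print(f"fixed rate 0.5:      max error {max_error(fixed):.2e}")
```

With the $1/n$ schedule each state's estimate is the running average of all past targets, so early, still-wrong bootstrapped targets are forgotten only polynomially, and the lag compounds along the chain. A fixed rate forgets them geometrically. Because this chain is deterministic the fixed rate converges exactly; with noisy rewards a fixed rate would only settle into a neighbourhood of the true values.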

BibTeX key: beleznay1999
entry type: techreport
address: Budapest 1121, Konkoly Th. M. u. 29-33, Hungary
year: 1999
institution: Mindmaker Ltd.
number: TR-99-02
pdf: papers/slowql-tr99-02.ps.pdf
date-modified: 2010-09-02 13:09:15 -0600
date-added: 2010-08-28 17:38:14 -0600
