Article in conference proceedings

Reduced-Variance Payoff Estimation in Adversarial Bandit Problems

, and .
Proceedings of the ECML-2005 Workshop on Reinforcement Learning in Non-Stationary Environments, (2005)

Abstract

A natural way to compare learning methods in non-stationary environments is to compare their regret. In this paper we consider the regret of algorithms in adversarial multi-armed bandit problems. We propose several methods to improve the performance of the baseline exponentially weighted average forecaster by changing the payoff-estimation methods. We argue that improved performance can be achieved by constructing payoff estimation methods that produce estimates with low variance. Our arguments are backed up by both theoretical and empirical results. In fact, our empirical results show that significant performance gains are possible over the baseline algorithm.
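The baseline the abstract refers to is the exponentially weighted average forecaster for adversarial bandits (Exp3-style), whose standard importance-weighted payoff estimator is unbiased but has variance inversely proportional to the probability of the chosen arm. A minimal sketch of that baseline follows; the function name `exp3`, the payoff interface, and all parameter values are illustrative assumptions, not the paper's own implementation or its proposed reduced-variance estimators.

```python
import math
import random

def exp3(payoffs, K, T, gamma=0.1, rng=None):
    """Baseline exponentially weighted forecaster (Exp3-style sketch).

    payoffs(t, arm) -> reward in [0, 1]; K arms; T rounds.
    Uses the standard importance-weighted payoff estimate, whose
    variance scales with 1 / p[arm] -- the quantity the paper's
    methods aim to reduce.
    """
    rng = rng or random.Random(0)
    w = [1.0] * K                       # exponential weights
    total = 0.0
    for t in range(T):
        s = sum(w)
        # mix the weight distribution with uniform exploration
        p = [(1 - gamma) * wi / s + gamma / K for wi in w]
        arm = rng.choices(range(K), weights=p)[0]
        x = payoffs(t, arm)
        total += x
        # importance-weighted estimate: unbiased, high variance when p[arm] is small
        xhat = x / p[arm]
        w[arm] *= math.exp(gamma * xhat / K)
    return total, w
```

On a toy instance where one arm pays 0.9 and the rest 0.1, the weight of the best arm quickly dominates, illustrating the baseline against which the reduced-variance estimators are compared.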

Tags

Users

  • @csaba

Comments and Reviews