Article

Exploration-exploitation Tradeoff using Variance Estimates in Multi-armed Bandits

J.-Y. Audibert, R. Munos, and Cs. Szepesvári.
Theoretical Computer Science, 410 (19): 1876--1902 (2009)

Abstract

Algorithms based on upper confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. This paper considers a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found to outperform the competing algorithms. We provide the first analysis of the expected regret for such algorithms. As expected, our results show that the algorithm that uses the variance estimates has a major advantage over its alternatives that do not use such estimates provided that the variances of the payoffs of the suboptimal arms are low. We also prove that the regret concentrates only at a polynomial rate. This holds for all the upper confidence bound based algorithms and for all bandit problems except those special ones where with probability one the payoff obtained by pulling the optimal arm is larger than the expected payoff for the second best arm. Hence, although upper confidence bound bandit algorithms achieve logarithmic expected regret rates, they might not be suitable for a risk-averse decision maker. We illustrate some of the results by computer simulations.
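The abstract does not state the index itself. The Python below is a minimal sketch, assuming the UCB-V index mean + sqrt(2 * V * E / s) + 3 * c * b * E / s with exploration function E = zeta * log t. Here mean and V are the empirical mean and (1/s-normalised) empirical variance of an arm after s pulls, and payoffs lie in [0, b]. This form is our reading of the paper, and the defaults zeta=1.2, c=1.0, b=1.0 are assumptions to check against the published version. The helper names `ucbv_index`, `ucb1_index` and `run_bandit`, the Bernoulli arms and the UCB1 baseline are hypothetical illustration choices, not code from the paper. The only difference from UCB1 is that the confidence width scales with the empirical variance, so a low-variance suboptimal arm should be pulled less often. This is the advantage the abstract describes.

```python
import math
import random


def ucbv_index(mean, var, pulls, t, b=1.0, zeta=1.2, c=1.0):
    """Hypothetical helper: UCB-V index of one arm (assumed form, see lead-in).

    mean, var: empirical mean and 1/s-normalised empirical variance of the arm
    pulls: number of times the arm has been pulled (s)
    t: current round, t >= 1
    b: assumed upper bound on payoffs, which lie in [0, b]
    zeta, c: exploration constants (assumed defaults)
    """
    if pulls == 0:
        return math.inf  # force every arm to be tried once
    explore = zeta * math.log(t)  # exploration function E = zeta * log t
    return mean + math.sqrt(2.0 * var * explore / pulls) + c * 3.0 * b * explore / pulls


def ucb1_index(mean, pulls, t):
    """Hypothetical baseline: UCB1 index, which ignores the variance."""
    if pulls == 0:
        return math.inf
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_bandit(arms, horizon, use_variance, seed=0):
    """Simulate Bernoulli arms with success probabilities `arms`.

    Returns the pseudo-regret: the sum over rounds of (best mean - mean of the pulled arm).
    """
    rng = random.Random(seed)
    k = len(arms)
    pulls = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k
    best = max(arms)
    regret = 0.0
    for t in range(1, horizon + 1):
        indices = []
        for i in range(k):
            s = pulls[i]
            mean = sums[i] / s if s else 0.0
            # Clamp tiny negative values caused by floating-point cancellation.
            var = max(sq_sums[i] / s - mean * mean, 0.0) if s else 0.0
            if use_variance:
                indices.append(ucbv_index(mean, var, s, t))
            else:
                indices.append(ucb1_index(mean, s, t))
        arm = max(range(k), key=lambda i: indices[i])  # ties go to the lowest index
        reward = 1.0 if rng.random() < arms[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward * reward
        regret += best - arms[arm]
    return regret


if __name__ == "__main__":
    # Suboptimal arms near 0 have low Bernoulli variance p * (1 - p).
    arms = [0.55, 0.05, 0.05]
    print("UCB1 :", run_bandit(arms, 10_000, use_variance=False))
    print("UCB-V:", run_bandit(arms, 10_000, use_variance=True))
```

Running the script prints the cumulative pseudo-regret of both rules on the same seeded stream of uniform draws. No particular numbers are claimed here, and a single run says nothing about the concentration behaviour the abstract discusses.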

BibTeX key: audibert2009
entry type: article
year: 2009
journal: Theoretical Computer Science
number: 19
pages: 1876--1902
volume: 410
ee: http://dx.doi.org/10.1016/j.tcs.2009.01.016
date-added: 2010-08-28 17:38:14 -0600
pdf: papers/ucbtuned-journal.pdf
bibsource: DBLP, http://dblp.uni-trier.de
date-modified: 2010-09-02 13:09:16 -0600

Cite this publication

@article{audibert2009,
  abstract = {Algorithms based on upper confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. This paper considers a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found to outperform the competing algorithms. We provide the first analysis of the expected regret for such algorithms. As expected, our results show that the algorithm that uses the variance estimates has a major advantage over its alternatives that do not use such estimates provided that the variances of the payoffs of the suboptimal arms are low. We also prove that the regret concentrates only at a polynomial rate. This holds for all the upper confidence bound based algorithms and for all bandit problems except those special ones where with probability one the payoff obtained by pulling the optimal arm is larger than the expected payoff for the second best arm. Hence, although upper confidence bound bandit algorithms achieve logarithmic expected regret rates, they might not be suitable for a risk-averse decision maker. We illustrate some of the results by computer simulations.},
  added-at = {2020-03-17T03:03:01.000+0100},
  author = {Audibert, J.-Y. and Munos, R. and Szepesv{\'a}ri, {Cs}.},
  bibsource = {DBLP, http://dblp.uni-trier.de},
  biburl = {https://www.bibsonomy.org/bibtex/267631ef74805a7b28f0c2d56cfb068ad/csaba},
  date-added = {2010-08-28 17:38:14 -0600},
  date-modified = {2010-09-02 13:09:16 -0600},
  ee = {http://dx.doi.org/10.1016/j.tcs.2009.01.016},
  interhash = {e566cc5f1038bf6e94b67c331718c590},
  intrahash = {67631ef74805a7b28f0c2d56cfb068ad},
  journal = {Theoretical Computer Science},
  keywords = {Bernstein's algorithms, bandits, inequality, multi-armed sequential stochastic theory},
  number = 19,
  pages = {1876--1902},
  pdf = {papers/ucbtuned-journal.pdf},
  timestamp = {2020-03-17T03:03:01.000+0100},
  title = {Exploration-exploitation Tradeoff using Variance Estimates in Multi-armed Bandits},
  volume = 410,
  year = 2009
}