Tuning Bandit Algorithms in Stochastic Environments

J.-Y. Audibert, R. Munos, and Cs. Szepesvári. ALT, pages 150--165. Springer, (2007). See audibert2009 for a longer, updated version.

Abstract

Algorithms based on upper-confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. In this paper we consider a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found to outperform the competing algorithms. The purpose of this paper is to provide a theoretical explanation of these findings and provide theoretical guidelines for the tuning of the parameters of these algorithms. For this we analyze the expected regret and for the first time the concentration of the regret. The analysis of the expected regret shows that variance estimates can be especially advantageous when the payoffs of suboptimal arms have low variance. The risk analysis, rather unexpectedly, reveals that except some very special bandit problems, for upper confidence bound based algorithms with standard bias sequences, the regret concentrates only at a polynomial rate. Hence, although these algorithms achieve logarithmic expected regret rates, they seem less attractive when the risk of achieving much worse than logarithmic cumulative regret is also taken into account.
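The abstract describes an upper-confidence-bound index that uses each arm's empirical variance. A minimal sketch of such an index follows. It is not taken from this page: the exact form (mean plus a variance term plus a range-dependent correction), the names `ucbv_index` and `select_arm`, the assumption that rewards lie in [0, b], and the default values of `b`, `zeta` and `c` are illustrative assumptions, not the authors' recommended tuning.

```python
import math


def ucbv_index(mean, variance, pulls, t, b=1.0, zeta=1.2, c=1.0):
    """Variance-aware upper-confidence index for one arm (illustrative sketch).

    mean, variance: empirical mean and variance of the arm's observed rewards.
    pulls: number of times the arm has been played so far.
    t: current round, t >= 1.
    b: assumed upper bound on rewards (rewards assumed to lie in [0, b]).
    zeta, c: exploration constants; the defaults here are assumptions.
    """
    if pulls == 0:
        # An arm that has never been played is always tried first.
        return math.inf
    exploration = zeta * math.log(t)
    variance_term = math.sqrt(2.0 * variance * exploration / pulls)
    range_term = c * 3.0 * b * exploration / pulls
    return mean + variance_term + range_term


def select_arm(means, variances, pulls, t, **kwargs):
    """Return the position of the arm with the largest index."""
    indices = [
        ucbv_index(m, v, n, t, **kwargs)
        for m, v, n in zip(means, variances, pulls)
    ]
    return max(range(len(indices)), key=indices.__getitem__)
```

With equal means and pull counts, the arm with the smaller empirical variance gets the smaller exploration bonus, which matches the abstract's remark that variance estimates help most when suboptimal arms have low variance.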

Cite this publication

@inproceedings{audibert2007,
  abstract = {Algorithms based on upper-confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. In this paper we consider a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found to outperform the competing algorithms. The purpose of this paper is to provide a theoretical explanation of these findings and provide theoretical guidelines for the tuning of the parameters of these algorithms. For this we analyze the expected regret and for the first time the concentration of the regret. The analysis of the expected regret shows that variance estimates can be especially advantageous when the payoffs of suboptimal arms have low variance. The risk analysis, rather unexpectedly, reveals that except some very special bandit problems, for upper confidence bound based algorithms with standard bias sequences, the regret concentrates only at a polynomial rate. Hence, although these algorithms achieve logarithmic expected regret rates, they seem less attractive when the risk of achieving much worse than logarithmic cumulative regret is also taken into account.},
  added-at = {2020-03-17T03:03:01.000+0100},
  author = {Audibert, J.-Y. and Munos, R. and Szepesv{\'a}ri, {Cs}.},
  biburl = {https://www.bibsonomy.org/bibtex/234e68f37273fe28e5a1c3a981693d36d/csaba},
  booktitle = {ALT},
  date-added = {2010-08-28 17:38:14 -0600},
  date-modified = {2012-06-06 21:39:14 -0600},
  interhash = {5fc7822412dde458a2448c619080826b},
  intrahash = {34e68f37273fe28e5a1c3a981693d36d},
  keywords = {Bernstein's algorithms, bandits, inequality, multi-armed sequential stochastic theory},
  note = {See \cite{audibert2009} for a longer, updated version},
  pages = {150--165},
  pdf = {papers/ucb_alt.pdf},
  ppt = {talks/ALT07-UCBTuned-Talk.ppt},
  publisher = {Springer},
  timestamp = {2020-03-17T03:03:01.000+0100},
  title = {Tuning Bandit Algorithms in Stochastic Environments},
  year = 2007
}
