Online Optimization in X-armed Bandits

S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári.
NIPS, pages 201--208. MIT Press, (2008)

Abstract

We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space and the mean-payoff function is "locally Lipschitz" with respect to a dissimilarity function that is known to the decision maker. Under this condition we construct an arm selection policy whose regret improves upon previous results for a large class of problems. In particular, our results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally Hölder with a known exponent, then the expected regret is bounded up to a logarithmic factor by sqrt(n), i.e., the rate of the growth of the regret is independent of the dimension of the space. We also prove the minimax optimality of our algorithm for the class of problems considered.

BibTeX key: bubeck2008
entry type: inproceedings
booktitle: NIPS
year: 2008
pages: 201--208
publisher: MIT Press
crossref: NIPS21
ee: http://books.nips.cc/papers/files/nips21/NIPS2008_0553.pdf
date-added: 2010-08-28 17:38:14 -0600
pdf: papers/HOO-NIPS08.pdf
bibsource: DBLP, http://dblp.uni-trier.de
date-modified: 2012-01-21 16:46:10 -0700
