Article in conference proceedings,

The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits (long version)

, and .
AISTATS, volume 54 of PMLR, pages 728--737 (April 2017)

Abstract

Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications. Current approaches focus on generalising existing techniques for finite-armed bandits, notably the optimism principle and Thompson sampling. Prior analysis has mostly focused on the worst-case setting. We analyse the asymptotic regret and show matching upper and lower bounds on what is achievable. Surprisingly, our results show that no algorithm based on optimism or Thompson sampling will ever achieve the optimal rate. In fact, they can be arbitrarily far from optimal, even in very simple cases. This is a disturbing result because these techniques are standard tools that are widely used for sequential optimisation, for example, generalised linear bandits and reinforcement learning.
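To make the object of study concrete, the following is a minimal sketch of an optimism-based algorithm (LinUCB-style) on a finite-armed stochastic linear bandit, the class of strategies whose asymptotic suboptimality the paper establishes. The toy arm set, the fixed confidence scaling `beta`, and the Gaussian noise level are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def linucb(arms, theta, T=2000, beta=2.0, noise=0.1, seed=0):
    """Optimism (LinUCB-style) on a finite-armed stochastic linear bandit.

    arms:  list of d-dimensional feature vectors (the finite arm set)
    theta: unknown parameter; expected reward of arm x is <x, theta>
    Returns cumulative pseudo-regret over T rounds.
    NOTE: a hypothetical sketch; beta is a fixed, hand-picked width.
    """
    rng = np.random.default_rng(seed)
    arms = np.asarray(arms, dtype=float)
    d = arms.shape[1]
    V = np.eye(d)           # regularised design matrix
    b = np.zeros(d)         # running sum of x_t * reward_t
    means = arms @ theta
    opt = means.max()
    regret = 0.0
    for _ in range(T):
        Vinv = np.linalg.inv(V)
        theta_hat = Vinv @ b  # ridge estimate of theta
        # Optimistic index: estimated mean plus confidence width.
        widths = np.sqrt(np.einsum('ij,jk,ik->i', arms, Vinv, arms))
        a = int(np.argmax(arms @ theta_hat + beta * widths))
        x = arms[a]
        reward = x @ theta + noise * rng.standard_normal()
        V += np.outer(x, x)
        b += reward * x
        regret += opt - x @ theta
    return regret
```

On an easy instance (three arms in the plane, true parameter aligned with the first arm), the pseudo-regret grows sublinearly; the paper's point is that on carefully chosen instances the asymptotic rate of any such optimistic index policy can be arbitrarily far from the instance-optimal rate.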


Users

  • @csaba
