
The adversarial stochastic shortest path problem with unknown transition probabilities

Gergely Neu, András György, and Csaba Szepesvári.
AISTATS, pages 805–813. (April 2012)

Abstract

We consider online learning in a special class of episodic Markovian decision processes, namely, loop-free stochastic shortest path problems. In this problem, an agent has to traverse a finite directed acyclic graph with random transitions while maximizing the rewards obtained along the way. We assume that the reward function can change arbitrarily between consecutive episodes, and is entirely revealed to the agent at the end of each episode. Previous work was concerned with the case when the stochastic dynamics are known ahead of time, whereas the main novelty of this paper is that this assumption is lifted. We propose an algorithm called "follow the perturbed optimistic policy" that combines ideas from the "follow the perturbed leader" method for online learning of arbitrary sequences and "upper confidence reinforcement learning", an algorithm for regret minimization in Markovian decision processes (with a fixed reward function). We prove that the expected cumulative regret of our algorithm is of order L|X||A|√T up to logarithmic factors, where L is the length of the longest path in the graph, X is the state space, A is the action space, and T is the number of episodes. To our knowledge, this is the first algorithm that learns and controls stochastic and adversarial components in an online fashion at the same time.
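The abstract suggests a concrete per-episode recipe: perturb the cumulative reward estimates (the "follow the perturbed leader" step), plan with empirical transition estimates inflated by an optimism bonus (the "upper confidence reinforcement learning" step), and follow the resulting policy through the loop-free graph. Below is a minimal, hypothetical Python sketch of that combination, not the paper's exact algorithm; the layered-DAG encoding, the perturbation scale eta, and the count-based bonus are illustrative assumptions.

```python
# Illustrative sketch of a "follow the perturbed optimistic policy"-style
# learner in a loop-free (layered) stochastic shortest path problem.
# All constants and the bonus form are assumptions for demonstration.

import numpy as np

rng = np.random.default_rng(0)

L, S, A = 4, 3, 2          # layers (path length), states per layer, actions
T = 500                    # number of episodes
eta = np.sqrt(T)           # FPL perturbation scale (illustrative choice)

# Unknown true dynamics: P[l, x, a] is a distribution over next-layer states.
P = rng.dirichlet(np.ones(S), size=(L - 1, S, A))

# Learner's statistics.
counts = np.ones((L - 1, S, A, S))   # transition counts (+1 smoothing)
cum_reward = np.zeros((L, S, A))     # cumulative observed rewards

total = 0.0
for t in range(1, T + 1):
    # FPL step: perturb cumulative rewards with fresh exponential noise.
    perturbed = cum_reward + rng.exponential(eta, size=(L, S, A))

    # Optimism step: empirical transitions plus a count-based bonus
    # (UCRL-flavoured; the exact confidence radius is an assumption).
    n = counts.sum(axis=-1)
    P_hat = counts / n[..., None]
    bonus = np.sqrt(np.log(t + 1) / n)

    # Backward dynamic programming over the layered DAG.
    V = np.zeros(S)
    policy = np.zeros((L, S), dtype=int)
    for l in range(L - 1, -1, -1):
        if l == L - 1:
            Q = perturbed[l]                          # last layer: no successor
        else:
            Q = perturbed[l] + bonus[l] + P_hat[l] @ V
        policy[l] = Q.argmax(axis=1)
        V = Q.max(axis=1)

    # Execute the policy for one episode. Rewards are adversarial in the
    # paper; here they are drawn at random purely to keep the sketch runnable.
    r = rng.uniform(size=(L, S, A))
    x = 0
    for l in range(L):
        a = policy[l, x]
        total += r[l, x, a]
        if l < L - 1:
            x_next = rng.choice(S, p=P[l, x, a])
            counts[l, x, a, x_next] += 1
            x = x_next

    # Full-information feedback: the entire reward function is revealed.
    cum_reward += r

print(f"total reward over {T} episodes: {total:.1f}")
```

The perturbation makes the selected policy a randomized function of past rewards, which is what protects against the adversarial reward sequence, while the optimism bonus drives exploration of the unknown transitions.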
