
Generalized Markov Decision Processes: Dynamic-programming and reinforcement-learning algorithms

Technical Report CS-96-11, Brown University, Department of Computer Science, Providence, RI, November 1996.

Abstract

Reinforcement learning is the process by which an autonomous agent uses its experience interacting with an environment to improve its behavior. The Markov decision process (MDP) model is a popular way of formalizing the reinforcement-learning problem, but it is by no means the only way. In this paper, we show how many of the important theoretical results concerning reinforcement learning in MDPs extend to a generalized MDP model that includes MDPs, two-player games, and MDPs under a worst-case optimality criterion as special cases. The basis of this extension is a stochastic-approximation theorem that reduces asynchronous convergence to synchronous convergence.
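To make the generalized-MDP idea concrete, here is a minimal sketch (an illustration, not code from the report) of synchronous value iteration in which the two fixed operations of an ordinary MDP, the expectation over next states and the maximization over actions, become pluggable summary functions. The tabular encoding and the names `generalized_value_iteration`, `expected`, and `worst_case` are hypothetical. The worst-case operator assumes the criterion takes the minimum over next states reachable with nonzero probability. Max, min, and expectation are all non-expansions in the max norm, so with a discount factor below 1 each sweep is a contraction and the iteration converges. Only the synchronous dynamic-programming side is shown; the asynchronous and reinforcement-learning results are not sketched here.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical tabular encoding chosen for this sketch:
#   P[x][a] -> list of (next_state, probability) pairs
#   R[x][a] -> immediate reward for taking action a in state x
Successors = Sequence[Tuple[int, float]]


def expected(successors: Successors, targets: Sequence[float]) -> float:
    """Ordinary MDP: probability-weighted average over next states."""
    return sum(p * t for (_, p), t in zip(successors, targets))


def worst_case(successors: Successors, targets: Sequence[float]) -> float:
    """Assumed worst-case criterion: minimum over next states with p > 0."""
    return min(t for (_, p), t in zip(successors, targets) if p > 0)


def maximize(x: int, q: Sequence[float]) -> float:
    """Single-agent choice: the best action value in state x."""
    return max(q)


def generalized_value_iteration(
    P: List[List[Successors]],
    R: List[List[float]],
    gamma: float,
    summarize_next: Callable[[Successors, Sequence[float]], float],
    summarize_actions: Callable[[int, Sequence[float]], float],
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> List[float]:
    """Synchronous value iteration with pluggable summary operators."""
    V = [0.0] * len(P)
    for _ in range(max_iter):
        new_V = []
        for x, actions in enumerate(P):
            # Q(x, a): summarize the one-step backups over next states.
            q = [
                summarize_next(succ, [R[x][a] + gamma * V[y] for y, _ in succ])
                for a, succ in enumerate(actions)
            ]
            # V(x): summarize Q(x, .) over actions.
            new_V.append(summarize_actions(x, q))
        delta = max(abs(u - v) for u, v in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    return V


# Two-state example: in state 0, action 0 stays put (reward 0) and action 1
# (reward 1) reaches state 1 with probability 0.5; state 1 is absorbing
# with reward 2 per step.
P = [
    [[(0, 1.0)], [(0, 0.5), (1, 0.5)]],
    [[(1, 1.0)]],
]
R = [[0.0, 1.0], [2.0]]

V_mdp = generalized_value_iteration(P, R, 0.5, expected, maximize)
V_worst = generalized_value_iteration(P, R, 0.5, worst_case, maximize)
print(V_mdp)    # approximately [2.6667, 4.0]
print(V_worst)  # approximately [2.0, 4.0]
```

Swapping only `summarize_next` changes the optimality criterion while the iteration itself stays the same. For an alternating two-player game, `summarize_actions` could instead return `max(q)` at states where the maximizing player moves and `min(q)` at the others.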
