Abstract
The framework of reinforcement learning or optimal control provides a
mathematical formalization of intelligent decision making that is powerful and
broadly applicable. While the general form of the reinforcement learning
problem enables effective reasoning about uncertainty, the connection between
reinforcement learning and inference in probabilistic models is not immediately
obvious. However, such a connection has considerable value when it comes to
algorithm design: formalizing a problem as probabilistic inference in principle
allows us to bring to bear a wide array of approximate inference tools, extend
the model in flexible and powerful ways, and reason about compositionality and
partial observability. In this article, we will discuss how a generalization of
the reinforcement learning or optimal control problem, which is sometimes
termed maximum entropy reinforcement learning, is equivalent to exact
probabilistic inference in the case of deterministic dynamics, and variational
inference in the case of stochastic dynamics. We will present a detailed
derivation of this framework, overview prior work that has drawn on this and
related ideas to propose new reinforcement learning and control algorithms, and
describe perspectives on future research.
Users
Please
log in to take part in the discussion (add own reviews or comments).