Abstract
Traditional Reinforcement Learning (RL) algorithms either predict rewards
with value functions or maximize them using policy search. We study an
alternative: Upside-Down Reinforcement Learning (Upside-Down RL or UDRL), which
solves RL problems primarily using supervised learning techniques. Many of its
main principles are outlined in a companion report [34]. Here we present the
first concrete implementation of UDRL and demonstrate its feasibility on
certain episodic learning problems. Experimental results show that its
performance can be surprisingly competitive with, and even exceed, that of
traditional baseline algorithms developed over decades of research.