Abstract
We introduce an exploration bonus for deep reinforcement learning methods
that is easy to implement and adds minimal overhead to the computation
performed. The bonus is the error of a neural network predicting features of
the observations given by a fixed randomly initialized neural network. We also
introduce a method to flexibly combine intrinsic and extrinsic rewards. We find
that the random network distillation (RND) bonus combined with this increased
flexibility enables significant progress on several hard exploration Atari
games. In particular, we establish state-of-the-art performance on Montezuma's
Revenge, a game famously difficult for deep reinforcement learning methods. To
the best of our knowledge, this is the first method that achieves better than
average human performance on this game without using demonstrations or having
access to the underlying state of the game, and occasionally completes the
first level.
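
As a rough illustration of the bonus described above, the following sketch computes a prediction-error bonus against a fixed, randomly initialized target network. Everything concrete here is an assumption made for illustration and is not taken from the paper: the single-layer target with a tanh nonlinearity, the linear predictor, the dimensions, the learning rate, and the names `RNDBonus`, `bonus`, and `update`. The actual method uses deep networks on Atari frames.

```python
import numpy as np


class RNDBonus:
    """Toy random-network-distillation bonus (illustrative, not the paper's code)."""

    def __init__(self, obs_dim, feat_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Target network: fixed at its random initialization and never trained.
        self.w_target = rng.normal(size=(obs_dim, feat_dim))
        # Predictor network: trained to reproduce the target's features.
        self.w_pred = np.zeros((obs_dim, feat_dim))

    def _error(self, obs):
        target = np.tanh(obs @ self.w_target)
        return obs @ self.w_pred - target

    def bonus(self, obs):
        # Exploration bonus: mean squared prediction error per observation.
        return np.mean(self._error(obs) ** 2, axis=-1)

    def update(self, obs, lr=0.05):
        # One gradient-descent step on the squared error of the predictor.
        err = self._error(obs)
        self.w_pred -= lr * (obs.T @ err) / len(obs)


rng = np.random.default_rng(1)
familiar = rng.normal(size=(4, 8))   # observations the agent keeps revisiting
novel = rng.normal(size=(16, 8))     # observations it has never trained on

rnd = RNDBonus(obs_dim=8, feat_dim=4)
bonus_before = rnd.bonus(familiar).mean()
for _ in range(2000):
    rnd.update(familiar)
bonus_after = rnd.bonus(familiar).mean()
bonus_novel = rnd.bonus(novel).mean()

print("familiar, before training:", bonus_before)
print("familiar, after training: ", bonus_after)
print("novel, after training:    ", bonus_novel)
```

In this toy setting the bonus for repeatedly visited observations shrinks as the predictor fits them, while unseen observations keep a larger error, which is the property that makes the prediction error usable as a novelty signal.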