The ability of policies to generalize to new environments is key to the
broad application of RL agents. A promising approach to prevent an agent's
policy from overfitting to a limited set of training environments is to apply
regularization techniques originally developed for supervised learning.
However, there are stark differences between supervised learning and RL. We
discuss those differences and propose modifications to existing regularization
techniques in order to better adapt them to RL. In particular, we focus on
regularization techniques relying on the injection of noise into the learned
function, a family that includes some of the most widely used approaches such
as Dropout and Batch Normalization. To adapt them to RL, we propose Selective
Noise Injection (SNI), which maintains the regularizing effect of the injected
noise while mitigating its adverse effect on gradient quality.
Furthermore, we demonstrate that the Information Bottleneck (IB) is a
particularly well-suited regularization technique for RL, as it is effective in
the low-data regime encountered early in the training of RL agents. Combining
the IB with SNI, we significantly outperform current state-of-the-art results,
including on the recently proposed generalization benchmark Coinrun.