Abstract
Deep Q-learning is known to suffer from overestimation of action values, caused by the maximization operation used when computing target values. Such overestimation can substantially degrade reward performance. In this work, we introduce a simple method based on DQN, named Deep Value Q-learning (DVQN), which regulates the estimation of action values and effectively tackles both over- and underestimation. We evaluate our method on the Atari-100k benchmark and demonstrate that DVQN consistently outperforms Deep Q-learning, Deep Double Q-learning, and Clipped Deep Double Q-learning in terms of reward performance. Moreover, our experimental results show that DVQN serves as a better backbone network than DQN when combined with an additional representation learning objective.
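The overestimation mechanism the abstract refers to can be demonstrated in isolation. The sketch below is not the authors' DVQN method; it is a minimal simulation showing that taking a max over noisy action-value estimates is biased upward even when all true values are equal, while a double estimator (select the action with one estimate, evaluate it with an independent one, as in Double Q-learning) removes that bias.

```python
import numpy as np

# All true action values are 0; each estimate carries zero-mean noise.
rng = np.random.default_rng(0)
n_actions, n_trials = 5, 100_000

noise_a = rng.normal(0.0, 1.0, size=(n_trials, n_actions))
noise_b = rng.normal(0.0, 1.0, size=(n_trials, n_actions))

# Single estimator: max over one noisy set -> positive bias,
# because the max picks up favorable noise.
single_bias = noise_a.max(axis=1).mean()

# Double estimator: argmax on one noisy set, value from an
# independent set -> the evaluation noise averages out to zero.
greedy = noise_a.argmax(axis=1)
double_bias = noise_b[np.arange(n_trials), greedy].mean()

print(f"single-estimator bias: {single_bias:.3f}")
print(f"double-estimator bias: {double_bias:.3f}")
```

With five actions and unit Gaussian noise, the single-estimator bias converges to roughly the expected maximum of five standard normals (about 1.16), while the double-estimator bias stays near zero.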