Advances in deep reinforcement learning have allowed autonomous agents to
perform well on Atari games, often outperforming humans, using only raw pixels
to make their decisions. However, most of these games take place in 2D
environments that are fully observable to the agent. In this paper, we present
the first architecture to tackle 3D environments in first-person shooter games,
that involve partially observable states. Typically, deep reinforcement
learning methods only utilize visual input for training. We present a method to
augment these models to exploit game feature information such as the presence
of enemies or items, during the training phase. Our model is trained to
simultaneously learn these features along with minimizing a Q-learning
objective, which is shown to dramatically improve the training speed and
performance of our agent. Our architecture is also modularized to allow
different models to be independently trained for different phases of the game.
We show that the proposed architecture substantially outperforms built-in AI
agents of the game as well as humans in deathmatch scenarios.