Abstract
This article investigates the performance of independent reinforcement learners in multi-agent games. Convergence to Nash equilibria and parameter settings for desired learning behavior are discussed for Q-learning, Frequency Maximum Q-value (FMQ) learning, and lenient Q-learning. FMQ and lenient Q-learning are shown to significantly outperform regular Q-learning in coordination games with miscoordination penalties. Furthermore, Q-learning with $\epsilon$-greedy action selection and FMQ learning with Boltzmann action selection are shown to scale well to games with one thousand agents.
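The FMQ heuristic and Boltzmann action selection mentioned above can be sketched for a single-state repeated game. This is a minimal illustrative sketch, assuming the common FMQ formulation in which each action's selection value is its Q-value inflated by the frequency with which that action's maximum observed reward occurred; the class name, helper function, and parameters (`alpha`, `weight`) are illustrative, not taken from the article.

```python
import math
import random

def boltzmann_select(values, temperature):
    """Sample an action index with probability proportional to exp(value / temperature)."""
    prefs = [math.exp(v / temperature) for v in values]
    r = random.random() * sum(prefs)
    acc = 0.0
    for i, p in enumerate(prefs):
        acc += p
        if r <= acc:
            return i
    return len(prefs) - 1  # guard against floating-point round-off

class FMQLearner:
    """Independent FMQ learner for a stateless (repeated) game: action selection
    uses EV(a) = Q(a) + weight * freq(maxR(a)) * maxR(a), where freq(maxR(a)) is
    the fraction of plays of a that yielded a's maximum observed reward."""

    def __init__(self, n_actions, alpha=0.1, weight=10.0):
        self.q = [0.0] * n_actions                 # ordinary Q-value estimates
        self.max_r = [float("-inf")] * n_actions   # best reward seen per action
        self.max_count = [0] * n_actions           # times that best reward was seen
        self.count = [0] * n_actions               # times each action was taken
        self.alpha = alpha                         # learning rate
        self.weight = weight                       # FMQ weight

    def ev(self, a):
        """Selection value of action a; falls back to Q(a) for untried actions."""
        if self.count[a] == 0:
            return self.q[a]
        freq = self.max_count[a] / self.count[a]
        return self.q[a] + self.weight * freq * self.max_r[a]

    def act(self, temperature):
        return boltzmann_select([self.ev(a) for a in range(len(self.q))], temperature)

    def update(self, a, reward):
        """Record the reward statistics for a and take a Q-learning step."""
        self.count[a] += 1
        if reward > self.max_r[a]:
            self.max_r[a], self.max_count[a] = reward, 1
        elif reward == self.max_r[a]:
            self.max_count[a] += 1
        self.q[a] += self.alpha * (reward - self.q[a])
```

In a coordination game with miscoordination penalties, the frequency term keeps the optimistic bonus high only while an action's best payoff remains reproducible, which is what lets FMQ learners escape the penalty trap that defeats plain Q-learning.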