Abstract
The number of proposed reinforcement learning algorithms appears to be ever-growing. This article addresses this diversification by identifying a common principle shared by several independently developed reinforcement learning algorithms that have been applied to multi-agent settings. Although their learning structures may look very different, algorithms such as Gradient Ascent, Cross learning, variations of Q-learning and Regret minimization all follow the same basic pattern: variations of Gradient Ascent can be described by the projection dynamics, while the other algorithms follow the replicator dynamics. Apart from modulations of the learning rate and deviations introduced for the sake of exploration, they are primarily different implementations of learning in the direction of the reinforcement gradient.
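To make the contrast between the two dynamics concrete, here is a minimal sketch, not the article's own formulation. It assumes a single agent with a mixed strategy x over a finite set of actions and takes the reinforcement gradient to be the vector r of per-action expected rewards (for example A·y in a two-player matrix game). The replicator step is a plain Euler discretization of dx_i = x_i (r_i - x·r); the projection step is a discrete stand-in for the projection dynamics that moves along r and then applies the standard sort-based Euclidean projection onto the probability simplex. All function names are hypothetical.

```python
from typing import List


def expected_reward(x: List[float], r: List[float]) -> float:
    """Expected reward x . r of mixed strategy x under per-action rewards r."""
    return sum(xi * ri for xi, ri in zip(x, r))


def replicator_step(x: List[float], r: List[float], eta: float) -> List[float]:
    """One Euler step of the replicator dynamics: dx_i = x_i * (r_i - x . r).

    Each action's probability grows in proportion to its own weight x_i,
    so the gradient r is rescaled by the current strategy.
    """
    avg = expected_reward(x, r)
    return [xi + eta * xi * (ri - avg) for xi, ri in zip(x, r)]


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    css = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        # The condition holds for a prefix of j; keep the threshold of the last one.
        if uj - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def projected_gradient_step(x: List[float], r: List[float], eta: float) -> List[float]:
    """Gradient Ascent step: move along r unscaled, then project back onto the simplex."""
    return project_to_simplex([xi + eta * ri for xi, ri in zip(x, r)])


# Hypothetical example: three actions, uniform start, fixed reward vector.
x = [1 / 3, 1 / 3, 1 / 3]
r = [1.0, 0.0, 0.5]
x_rep = replicator_step(x, r, 0.1)
x_proj = projected_gradient_step(x, r, 0.1)
print("replicator:", [round(p, 4) for p in x_rep], round(expected_reward(x_rep, r), 4))
print("projection:", [round(p, 4) for p in x_proj], round(expected_reward(x_proj, r), 4))
```

Both updates stay on the simplex and raise the expected reward for a small step size, so in this toy case they are two ways of following the same reinforcement gradient; they differ only in whether the gradient is weighted by the current strategy (replicator) or applied directly and then projected (projection).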