Abstract
Following work on designing optimal rewards for single
agents, we define a multiagent optimal rewards problem
(ORP) in cooperative (specifically, common-payoff or
team) settings. Solving this new problem yields
individual-agent reward functions that lead agents to
better overall team performance relative to teams in
which every agent guides its behavior with the same
given team-reward function. We present a multiagent
architecture in which each agent learns good reward
functions from experience using a gradient-based
algorithm in addition to performing the usual task of
planning good policies (except in this case with
respect to the learned rather than the given reward
function). Multiagency introduces the challenge of
nonstationarity: because the agents learn
simultaneously, each agent's reward-learning problem is
nonstationary and interdependent with the other agents'
evolving reward functions. We demonstrate on two simple
domains that the proposed architecture outperforms the
conventional approach in which all the agents use the
same given team-reward function (even when accounting
for the resource overhead of the reward learning); that
the learning algorithm performs stably despite the
nonstationarity; and that learning individual reward
functions can lead to better specialization of roles
than is possible with shared reward, whether learned or
given.
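To make the architecture concrete, below is a minimal sketch of
simultaneous per-agent reward learning on a toy coordination task.
The abstract does not specify the gradient-based algorithm, so the
sketch substitutes a simple two-point finite-difference estimate of
the team return's gradient with respect to each agent's reward
parameters; the toy dynamics, step sizes, and all function names
(team_reward, plan_policy, team_return) are illustrative assumptions,
not the paper's method.

import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, N_STATES, N_ACTIONS = 2, 4, 2  # toy sizes (assumed)

def team_reward(state, actions):
    # Given team-reward function: pays off only when both agents pick
    # the action matching the current state (a toy coordination task).
    return 1.0 if actions[0] == actions[1] == state % N_ACTIONS else 0.0

def plan_policy(theta):
    # Stand-in "planner": act greedily with respect to the agent's
    # *learned* reward parameters theta, shape (N_STATES, N_ACTIONS).
    return theta.argmax(axis=1)

def team_return(thetas, horizon=20):
    # Roll out every agent's greedy policy; score with the *team* reward.
    policies = [plan_policy(t) for t in thetas]
    total, state = 0.0, 0
    for _ in range(horizon):
        actions = [pi[state] for pi in policies]
        total += team_reward(state, actions)
        state = (state + 1) % N_STATES  # trivial deterministic dynamics
    return total

thetas = [rng.normal(size=(N_STATES, N_ACTIONS)) for _ in range(N_AGENTS)]
alpha, sigma = 0.5, 0.1  # step size and perturbation scale (assumed)
for step in range(200):
    grads = []
    for i in range(N_AGENTS):
        eps = rng.normal(size=thetas[i].shape)
        bumped = [t + sigma * eps if j == i else t
                  for j, t in enumerate(thetas)]
        # Two-point estimate of d(team return)/d(theta_i), taken while
        # the other agents' parameters are also changing each iteration.
        grads.append((team_return(bumped) - team_return(thetas)) / sigma * eps)
    for i in range(N_AGENTS):  # all agents update simultaneously
        thetas[i] += alpha * grads[i]

print("final team return:", team_return(thetas))

Note that each agent's gradient estimate is taken while the other
agents' parameters are also moving: this is the nonstationarity the
abstract describes, since the simultaneous updates make each agent's
learning target drift over time.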