R. Sałustowicz, M. Wiering, and J. Schmidhuber. On Learning Soccer Strategies. In Proceedings of the Seventh International Conference on
Artificial Neural Networks (ICANN'97), volume 1327 of Lecture Notes in Computer Science, pages 769--774. Springer-Verlag, 1997.
Abstract
We use simulated soccer to study multiagent learning.
Each team's players (agents) share an action set and a
policy, but may behave differently due to
position-dependent inputs. All agents making up a team
are rewarded or punished collectively whenever a goal
is scored. We conduct simulations with varying team
sizes and compare two learning algorithms: TD-Q
learning with linear neural networks (TD-Q) and
Probabilistic Incremental Program Evolution (PIPE).
TD-Q is based on evaluation functions (EFs) that map
input/action pairs to expected reward, while PIPE
searches policy space directly. PIPE uses an adaptive
probability distribution to synthesize programs that
calculate action probabilities from current inputs.
Our results show that TD-Q has difficulty learning
appropriate shared EFs. PIPE, however, does not depend
on EFs and finds good policies faster and more
reliably.
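The two algorithms named in the abstract can be made concrete with short sketches. The Python below is illustrative only: the feature dimension, action set, step sizes, and instruction set are hypothetical stand-ins, not the paper's actual setup. First, TD-Q with a linear evaluation function: one weight vector per action maps position-dependent input features to expected reward, and each step nudges the shared EF toward a bootstrapped target.

import numpy as np

# Hedged sketch of TD-Q with a linear evaluation function (EF).
# Dimensions and step sizes are illustrative, not the paper's values.
N_FEATURES, N_ACTIONS = 16, 4
alpha, gamma = 0.1, 0.95
W = np.zeros((N_ACTIONS, N_FEATURES))  # one shared EF for a whole team

def q_values(x):
    """EF output: expected reward of each action for input features x."""
    return W @ x

def td_q_update(x, a, r, x_next):
    """One TD-Q step: move the EF toward the bootstrapped target."""
    target = r + gamma * np.max(q_values(x_next))
    W[a] += alpha * (target - q_values(x)[a]) * x

PIPE proper maintains a probabilistic prototype tree over program expressions; the sketch below flattens that to a distribution over a fixed-length token sequence, keeping only the adaptation idea: sample candidate programs, score them, and shift probability mass toward the best one found.

import random

# Hedged sketch of PIPE's adaptation idea; the instruction set and the
# flat fixed-length program shape are hypothetical simplifications.
TOKENS = ["move", "turn", "shoot", "pass"]
PROG_LEN, POP, LR = 8, 20, 0.2
probs = [{t: 1.0 / len(TOKENS) for t in TOKENS} for _ in range(PROG_LEN)]

def sample_program():
    """Draw one program token-by-token from the adaptive distribution."""
    return [random.choices(list(p), weights=list(p.values()))[0] for p in probs]

def adapt(evaluate):
    """One generation: sample POP programs, reinforce the best one."""
    best = max((sample_program() for _ in range(POP)), key=evaluate)
    for pos, tok in enumerate(best):
        for t in probs[pos]:
            probs[pos][t] *= (1.0 - LR)   # decay every token...
        probs[pos][tok] += LR             # ...then boost the chosen one

The contrast mirrors the abstract: TD-Q learns an EF and derives behaviour from it, while PIPE adapts the program distribution directly and never estimates values.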
%0 Conference Paper
%1 Salustowicz:97icann
%A Sałustowicz, R. P.
%A Wiering, M. A.
%A Schmidhuber, J.
%B Proceedings of the Seventh International Conference on
Artificial Neural Networks (ICANN'97)
%D 1997
%E Gerstner, W.
%E Germond, A.
%E Hasler, M.
%E Nicoud, J.-D.
%I Springer-Verlag
%K PIPE
%P 769--774
%T On Learning Soccer Strategies
%U ftp://ftp.idsia.ch/pub/rafal/ICANN_soccer.ps.gz
%V 1327
%X We use simulated soccer to study multiagent learning.
Each team's players (agents) share an action set and a
policy, but may behave differently due to
position-dependent inputs. All agents making up a team
are rewarded or punished collectively whenever a goal
is scored. We conduct simulations with varying team
sizes and compare two learning algorithms: TD-Q
learning with linear neural networks (TD-Q) and
Probabilistic Incremental Program Evolution (PIPE).
TD-Q is based on evaluation functions (EFs) that map
input/action pairs to expected reward, while PIPE
searches policy space directly. PIPE uses an adaptive
probability distribution to synthesize programs that
calculate action probabilities from current inputs.
Our results show that TD-Q has difficulty learning
appropriate shared EFs. PIPE, however, does not depend
on EFs and finds good policies faster and more
reliably.
@inproceedings{Salustowicz:97icann,
abstract = {We use simulated soccer to study multiagent learning.
Each team's players (agents) share an action set and a
policy, but may behave differently due to
position-dependent inputs. All agents making up a team
are rewarded or punished collectively whenever a goal
is scored. We conduct simulations with varying team
sizes and compare two learning algorithms: TD-Q
learning with linear neural networks (TD-Q) and
Probabilistic Incremental Program Evolution (PIPE).
TD-Q is based on evaluation functions (EFs) that map
input/action pairs to expected reward, while PIPE
searches policy space directly. PIPE uses an adaptive
probability distribution to synthesize programs that
calculate action probabilities from current inputs.
Our results show that TD-Q has difficulty learning
appropriate shared EFs. PIPE, however, does not depend
on EFs and finds good policies faster and more
reliably.},
author = {Sa\l{}ustowicz, R. P. and Wiering, M. A. and Schmidhuber, J.},
booktitle = {Proceedings of the Seventh International Conference on
Artificial Neural Networks (ICANN'97)},
editor = {Gerstner, W. and Germond, A. and Hasler, M. and Nicoud, J.-D.},
keywords = {PIPE},
pages = {769--774},
publisher = {Springer-Verlag},
address = {Berlin Heidelberg},
series = {Lecture Notes in Computer Science},
size = {7 pages},
title = {On Learning Soccer Strategies},
url = {ftp://ftp.idsia.ch/pub/rafal/ICANN_soccer.ps.gz},
volume = 1327,
year = 1997
}