Bias-Optimal Incremental Learning of Control Sequences
for Virtual Robots
J. Schmidhuber, V. Zhumatiy, and M. Gagliolo. Proceedings of the Eighth Conference on Intelligent
Autonomous Systems, IAS-8, pages 658--665. Amsterdam, (2004)
Abstract
Learning and planning control is hard. The search
space of traditional planners consists of sequences of
primitive actions. To exploit reusable subsequences and
other algorithmic regularities, however, we should
instead search the general space of programs that
compute action sequences. Such programs may invoke very
fast thinking actions consuming only nanoseconds (such
as conditional jumps to certain code addresses) as well
as very slow control actions consuming seconds in the
real world (such as
stretch-arm-until-obstacle-sensation). What is an
optimal way of allocating time to tests of such
non-homogeneous programs? What is an optimal way of
reusing experience with previous tasks to learn
solutions to new tasks? One answer is given by the
recent Optimal Ordered Problem Solver OOPS, a
near-bias-optimal incremental extension of Levin's
nonincremental universal search, which we apply to
virtual robotics for the first time: our snake robot
uses OOPS to learn to walk and jump in a partially
observable environment (POMDP) with a huge state/action
space.
%0 Conference Paper
%1 schmidhuber:2004:IAS
%A Schmidhuber, Juergen
%A Zhumatiy, Viktor P.
%A Gagliolo, Matteo
%B Proceedings of the Eighth Conference on Intelligent
Autonomous Systems, IAS-8
%C Amsterdam
%D 2004
%K algorithms, genetic programming
%P 658--665
%T Bias-Optimal Incremental Learning of Control Sequences
for Virtual Robots
%U ftp://ftp.idsia.ch/pub/juergen/snakeias.pdf
%X Learning and planning control is hard. The search
space of traditional planners consists of sequences of
primitive actions. To exploit reusable subsequences and
other algorithmic regularities, however, we should
instead search the general space of programs that
compute action sequences. Such programs may invoke very
fast thinking actions consuming only nanoseconds (such
as conditional jumps to certain code addresses) as well
as very slow control actions consuming seconds in the
real world (such as
stretch-arm-until-obstacle-sensation). What is an
optimal way of allocating time to tests of such
non-homogeneous programs? What is an optimal way of
reusing experience with previous tasks to learn
solutions to new tasks? One answer is given by the
recent Optimal Ordered Problem Solver OOPS, a
near-bias-optimal incremental extension of Levin's
nonincremental universal search, which we apply to
virtual robotics for the first time: our snake robot
uses OOPS to learn to walk and jump in a partially
observable environment (POMDP) with a huge state/action
space.
@inproceedings{schmidhuber:2004:IAS,
abstract = {Learning and planning control is hard. The search
space of traditional planners consists of sequences of
primitive actions. To exploit reusable subsequences and
other algorithmic regularities, however, we should
instead search the general space of programs that
compute action sequences. Such programs may invoke very
fast thinking actions consuming only nanoseconds (such
as conditional jumps to certain code addresses) as well
as very slow control actions consuming seconds in the
real world (such as
stretch-arm-until-obstacle-sensation). What is an
optimal way of allocating time to tests of such
non-homogeneous programs? What is an optimal way of
reusing experience with previous tasks to learn
solutions to new tasks? One answer is given by the
recent Optimal Ordered Problem Solver OOPS, a
near-bias-optimal incremental extension of Levin's
nonincremental universal search, which we apply to
virtual robotics for the first time: our snake robot
uses OOPS to learn to walk and jump in a partially
observable environment (POMDP) with a huge state/action
space.},
added-at = {2008-06-19T17:46:40.000+0200},
address = {Amsterdam},
  author = {Schmidhuber, Juergen and Zhumatiy, Viktor P. and Gagliolo, Matteo},
biburl = {https://www.bibsonomy.org/bibtex/26c34b247a057ecda130cabcad44ad30c/brazovayeye},
  booktitle = {Proceedings of the Eighth Conference on Intelligent
Autonomous Systems, IAS-8},
interhash = {44478e4530d185f60a60bf397093f70a},
intrahash = {6c34b247a057ecda130cabcad44ad30c},
keywords = {algorithms, genetic programming},
pages = {658--665},
timestamp = {2008-06-19T17:51:06.000+0200},
title = {Bias-Optimal Incremental Learning of Control Sequences
for Virtual Robots},
url = {ftp://ftp.idsia.ch/pub/juergen/snakeias.pdf},
year = 2004
}