Inproceedings,

Bias-Optimal Incremental Learning of Control Sequences for Virtual Robots

Proceedings of the Eighth Conference on Intelligent Autonomous Systems, IAS-8, pages 658--665. Amsterdam, (2004)

Abstract

Learning and planning control is hard. The search space of traditional planners consists of sequences of primitive actions. To exploit reusable subsequences and other algorithmic regularities, however, we should instead search the general space of programs that compute action sequences. Such programs may invoke very fast thinking actions consuming only nanoseconds (such as conditional jumps to certain code addresses) as well as very slow control actions consuming seconds in the real world (such as stretch-arm-until-obstacle-sensation). What is an optimal way of allocating time to tests of such non-homogeneous programs? What is an optimal way of reusing experience with previous tasks to learn solutions to new tasks? One answer is given by the recent Optimal Ordered Problem Solver OOPS, a near-bias-optimal incremental extension of Levin's nonincremental universal search, which we apply to virtual robotics for the first time: our snake robot uses OOPS to learn to walk and jump in a partially observable environment (POMDP) with a huge state/action space.
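The time-allocation question can be illustrated with a minimal sketch of plain, nonincremental Levin-style universal search. This is not OOPS itself and not the authors' implementation. The one-dimensional "robot", the three actions, their costs (in arbitrary time units), and the prior P(p) = (2k)^-len(p) over programs built from k instructions are all hypothetical choices made for illustration. In each phase the total budget T doubles, and every candidate program p may consume at most T * P(p) time, so short programs get large budgets and a program is cut off as soon as its mix of cheap and expensive actions exceeds its share.

```python
from itertools import product

# Hypothetical toy instruction set: name -> (position change, time cost).
# Costs are in arbitrary units; "noop" stands in for a cheap thinking action,
# "step" and "jump" for slow control actions.
ACTIONS = {
    "noop": (0, 0.01),
    "step": (1, 1.0),
    "jump": (3, 2.0),
}


def prior(program):
    """Assumed prior P(p) = (2k)^-len(p); sums to 1 over all non-empty programs."""
    return (2 * len(ACTIONS)) ** -len(program)


def run(program, target, budget):
    """Execute the program, aborting before any action that would exceed the budget.

    Returns (solved, time_used).
    """
    pos, used = 0, 0.0
    for name in program:
        delta, cost = ACTIONS[name]
        if used + cost > budget:
            return False, used
        pos += delta
        used += cost
    return pos == target, used


def levin_search(target, max_phase=20):
    """Try programs in phases; in phase i each program p gets time 2**i * P(p)."""
    min_cost = min(cost for _, cost in ACTIONS.values())
    total_time = 0.0
    for phase in range(max_phase):
        T = 2.0 ** phase
        length = 1
        # Skip lengths whose budget cannot pay for even the cheapest action.
        while T * (2 * len(ACTIONS)) ** -length >= min_cost:
            for program in product(ACTIONS, repeat=length):
                solved, used = run(program, target, T * prior(program))
                total_time += used
                if solved:
                    return program, total_time
            length += 1
    return None, total_time


if __name__ == "__main__":
    program, total_time = levin_search(target=4)
    print(program, total_time)
```

In this sketch nothing is carried over between tasks: every call to `levin_search` starts from scratch. The incremental reuse of earlier solutions that the abstract attributes to OOPS is deliberately left out.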
