Compressed Conditional Mean Embeddings for Model-Based Reinforcement Learning

G. Lever, J. Shawe-Taylor, R. Stafford, and Cs. Szepesvári.
AAAI-2016, pages 1779--1787. (November 2016)

Abstract

We present a model-based approach to solving Markov decision processes (MDPs) in which the system dynamics are learned using conditional mean embeddings (CMEs). This class of methods comes with strong performance guarantees, and enables planning to be performed in an induced finite (pseudo-)MDP, which approximates the MDP, but can be solved exactly using dynamic programming. Two drawbacks of existing methods exist: firstly, the size of the induced finite (pseudo-)MDP scales quadratically with the amount of data used to learn the model, costing much memory and time when planning with the learned model; secondly, learning the CME itself using powerful kernel least-squares is costly -- a second computational bottleneck. We present an algorithm which maintains a rich kernelized CME model class, but solves both problems: firstly we demonstrate that the loss function for the CME model suggests a principled approach to compressing the induced (pseudo-)MDP, leading to faster planning, while maintaining guarantees; secondly we propose to learn the CME model itself using fast sparse-greedy kernel regression well-suited to the RL context. We demonstrate superior performance to existing methods in this class of model-based approaches on a range of MDPs.

BibTeX key: LeSTSSz16
entry type: inproceedings
booktitle: AAAI-2016
year: 2016
month: November
pages: 1779--1787
pdf: papers/AAAI16_CompCME4RLfinal.pdf
date-modified: 2016-07-29 14:17:06 +0000
date-added: 2015-12-02 00:14:30 +0000

Cite this publication

@inproceedings{LeSTSSz16,
  abstract = {We present a model-based approach to solving Markov decision processes (MDPs) in which the system dynamics are learned using conditional mean embeddings (CMEs). This class of methods comes with strong performance guarantees, and enables planning to be performed in an induced finite (pseudo-)MDP, which approximates the MDP, but can be solved exactly using dynamic programming. Two drawbacks of existing methods exist: firstly, the size of the induced finite (pseudo-)MDP scales quadratically with the amount of data used to learn the model, costing much memory and time when planning with the learned model; secondly, learning the CME itself using powerful kernel least-squares is costly -- a second computational bottleneck. We present an algorithm which maintains a rich kernelized CME model class, but solves both problems: firstly we demonstrate that the loss function for the CME model suggests a principled approach to compressing the induced (pseudo-)MDP, leading to faster planning, while maintaining guarantees; secondly we propose to learn the CME model itself using fast sparse-greedy kernel regression well-suited to the RL context. We demonstrate superior performance to existing methods in this class of model-based approaches on a range of MDPs.},
  added-at = {2020-03-17T03:03:01.000+0100},
  author = {Lever, G. and Shawe-Taylor, J. and Stafford, R. and Szepesv{\'a}ri, {Cs}.},
  biburl = {https://www.bibsonomy.org/bibtex/22e41ac5076daf1a7476a5ddc6b629e91/csaba},
  booktitle = {AAAI-2016},
  date-added = {2015-12-02 00:14:30 +0000},
  date-modified = {2016-07-29 14:17:06 +0000},
  interhash = {51dc688610a971ab9af4a40afd3fd584},
  intrahash = {2e41ac5076daf1a7476a5ddc6b629e91},
  keywords = {Decision Markov Processes,function RL, abstraction, approximation, control control, learning, model-based planning, pseudo-MDPs reinforcement},
  month = {November},
  pages = {1779--1787},
  pdf = {papers/AAAI16_CompCME4RLfinal.pdf},
  timestamp = {2020-03-17T03:03:01.000+0100},
  title = {Compressed Conditional Mean Embeddings for Model-Based Reinforcement Learning},
  year = 2016
}
