Abstract
Model-based reinforcement learning (MBRL) has been applied to meta-learning
settings and has demonstrated high sample efficiency. However, in previous MBRL
methods for meta-learning, policies are optimized via rollouts that rely fully
on a predictive model of the environment, so performance in the real
environment tends to degrade when the predictive model is inaccurate. In this
paper, we prove that this performance degradation can be suppressed by using
branched meta-rollouts. On the basis of this theoretical analysis, we propose
meta-model-based meta-policy optimization (M3PO), in which branched
meta-rollouts are used for policy optimization. We demonstrate that M3PO
outperforms existing meta-reinforcement-learning methods on continuous-control
benchmarks.
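
As a rough illustration of the branched-rollout idea (a sketch, not the authors' implementation), the snippet below follows the MBPO-style scheme that branched meta-rollouts build on: short model rollouts are branched from states sampled out of real-environment experience, so model error compounds over only k steps instead of a full episode. The names `dynamics_model`, `policy`, `real_buffer`, and the task context `context` are hypothetical placeholders for a meta-learned model, a meta-policy, real experience, and a task/latent variable.

```python
import random

def branched_meta_rollouts(real_buffer, dynamics_model, policy,
                           num_branches=400, k=5, context=None):
    """Collect short model-based rollouts branched from real states.

    Instead of rolling the learned model out from the initial state for a
    full episode (where model error compounds), each rollout starts from a
    state sampled from real experience and runs only k steps.

    `context` stands in for the task/latent variable that a meta-learned
    model and policy would be conditioned on (hypothetical interface).
    """
    model_buffer = []
    for _ in range(num_branches):
        # Branch point: a state actually visited in the real environment.
        state = random.choice(real_buffer)
        for _ in range(k):
            action = policy.act(state, context)
            # One-step prediction by the (meta-)learned dynamics model.
            next_state, reward, done = dynamics_model.step(state, action, context)
            model_buffer.append((state, action, reward, next_state, done))
            if done:
                break
            state = next_state
    return model_buffer  # training data for subsequent policy optimization
```

Keeping k small bounds how far model error can accumulate within any single rollout, which is the intuition behind suppressing the performance degradation described above.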