Nov 17, 2021
Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of these methods is yet to be established, and there is currently no theoretical guarantee of their performance in real-world environments. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.
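To give a concrete flavor of what "model-based meta-RL" means here, the sketch below shows a generic loop that collects a little real data from several tasks, fits a shared dynamics model, and improves a shared policy using imagined rollouts inside that model. This is a minimal toy illustration of the general technique, not the authors' M3PO algorithm; every name in it (ToyTask, fit_dynamics_model, improve_policy) is a hypothetical placeholder.

```python
# Generic sketch of a model-based meta-policy optimization loop (NOT the paper's M3PO).
import numpy as np

rng = np.random.default_rng(0)

class ToyTask:
    """A 1-D task whose dynamics differ per task via a drift term (hypothetical stand-in)."""
    def __init__(self, drift):
        self.drift = drift
    def step(self, s, a):
        s_next = s + a + self.drift + 0.01 * rng.normal()
        return s_next, -abs(s_next)          # reward: drive the state toward zero

def fit_dynamics_model(transitions):
    """Crude shared 'meta-model': estimate the average drift b in s' ~ s + a + b."""
    return np.mean([s2 - s - a for s, a, s2 in transitions])

def improve_policy(model_drift, horizon=5):
    """Pick the feedback gain that maximizes return under rollouts in the learned model."""
    best_gain, best_ret = 0.0, -np.inf
    for g in np.linspace(-1.0, 0.0, 21):
        s, ret = 1.0, 0.0
        for _ in range(horizon):
            s = s + g * s + model_drift      # imagined (model) rollout, no real environment
            ret += -abs(s)
        if ret > best_ret:
            best_gain, best_ret = g, ret
    return best_gain

tasks = [ToyTask(d) for d in (-0.2, 0.0, 0.2)]   # a small family of related tasks
gain = -0.1                                      # policy parameter shared across tasks
for _ in range(10):
    transitions = []
    for task in tasks:                           # a few real interactions per task
        s = 1.0
        for _ in range(10):
            a = gain * s
            s2, _ = task.step(s, a)
            transitions.append((s, a, s2))
            s = s2
    gain = improve_policy(fit_dynamics_model(transitions))
print("learned policy gain:", gain)
```

The key design point this toy mirrors is that most policy-improvement steps use cheap model-generated rollouts rather than real interaction, which is where the sample-efficiency gains (and the need for a performance guarantee relating model error to real returns) come from.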
The 13th Asian Conference on Machine Learning (ACML 2021) aims to provide a leading international forum for researchers in machine learning and related fields to share their new ideas, progress, and achievements.