Meta-Model-Based Meta-Policy Optimization

November 17, 2021

Speakers

About the presentation

Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of these methods has yet to be established, and there is currently no theoretical guarantee of their performance in real-world environments. In this paper, we analyze the performance guarantees of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods on continuous-control benchmarks.
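To make the recipe this family of methods builds on concrete, below is a minimal, self-contained sketch of a model-based training loop: collect real data from several tasks, fit a dynamics model, and improve the policy on short model-generated rollouts branched from real states. Short branches keep compounding model error small, which is the kind of quantity that bounds in the style of Janner et al. (2019) control when lower-bounding the true return by the model return minus an error term. All names here (`PooledModel`, `true_step`, the 1-D task family, the random-search policy update) are illustrative placeholders, not the paper's M3PO implementation, which uses a meta-learned model and a learned meta-policy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy multi-task setting: 1-D linear dynamics whose drift
# term differs per task. The agent never observes the drifts directly.
N_TASKS, HORIZON, BRANCH_LEN = 4, 50, 5

def true_step(s, a, drift):
    """Ground-truth dynamics of one task (unknown to the agent)."""
    return s + 0.1 * a + drift + 0.01 * rng.normal()

class PooledModel:
    """Linear dynamics model fit on data pooled across all tasks.

    A stand-in for the learned model; a real meta-model would also
    adapt per task, which is omitted here for brevity.
    """
    def __init__(self):
        self.w = np.zeros(3)  # coefficients for the features [s, a, 1]

    def fit(self, transitions):
        X = np.array([[s, a, 1.0] for s, a, _ in transitions])
        y = np.array([s2 for _, _, s2 in transitions])
        self.w, *_ = np.linalg.lstsq(X, y, rcond=None)

    def step(self, s, a):
        return float(self.w @ np.array([s, a, 1.0]))

def model_return(model, theta, s0):
    """Return of the linear policy a = theta * s over a short model rollout."""
    s, ret = s0, 0.0
    for _ in range(BRANCH_LEN):
        s = model.step(s, theta * s)
        ret -= s ** 2  # hypothetical cost: keep the state near zero
    return ret

theta = 0.0                                   # policy parameter
drifts = rng.normal(scale=0.05, size=N_TASKS) # one drift per task
buffer = []                                   # real transitions, all tasks

for iteration in range(20):
    # 1. Collect a small amount of real data from every task.
    for drift in drifts:
        s = float(rng.normal())
        for _ in range(HORIZON):
            a = theta * s + 0.1 * rng.normal()  # exploration noise
            s2 = true_step(s, a, drift)
            buffer.append((s, a, s2))
            s = s2

    # 2. Fit the dynamics model to all real data collected so far.
    model = PooledModel()
    model.fit(buffer)

    # 3. Improve the policy on short model rollouts branched from real
    #    states; here a simple random search replaces the paper's
    #    policy-optimization step.
    starts = [buffer[i][0] for i in rng.choice(len(buffer), size=32)]
    base = np.mean([model_return(model, theta, s0) for s0 in starts])
    for _ in range(10):
        cand = theta + 0.05 * rng.normal()
        score = np.mean([model_return(model, cand, s0) for s0 in starts])
        if score > base:
            theta, base = cand, score

print(f"learned policy gain: {theta:.3f}")
```

The branched-rollout structure in step 3 is the key design choice: because model rollouts start from real states and run for only a few steps, model error cannot compound over long horizons, which is what makes a performance guarantee of the kind analyzed in the paper possible.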

Organizer

About the organizer (ACML 2021)

The 13th Asian Conference on Machine Learning (ACML 2021) aims to provide a leading international forum for researchers in machine learning and related fields to share their new ideas, progress, and achievements.
