Replay-Guided Adversarial Environment Design

6. Prosinec 2021

Řečníci

O prezentaci

Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training based on an agent's policy, leading to the emergence of diverse training environments. Here, we cast Prioritized Level Replay (PLR), an empirically successful but theoretically unmotivated method which can be used to choose from a set of levels, as a type of UED. We argue that through curating completely random levels, PLR, too, can generate novel and complex levels for effective training. This insight reveals a natural class of UED methods we call Dual Curriculum Design (DCD). Crucially, DCD includes both PLR and a popular UED algorithm, PAIRED, as special cases and inherits similar theoretical guarantees. This connection allows us to develop novel theory for PLR, providing a version with a robustness guarantee. Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria. Indeed, our experiments confirm that our new method, PLR^, obtains better results on a suite of zero-shot out-of-distribution generalization tasks, in addition to demonstrating that PLR improves the performance of PAIRED, from which it inherited its theoretical framework.

Organizátor

O organizátorovi (NeurIPS 2021)

Neural Information Processing Systems (NeurIPS) is a multi-track machine learning and computational neuroscience conference that includes invited talks, demonstrations, symposia and oral and poster presentations of refereed papers. Following the conference, there are workshops which provide a less formal setting.

Uložení prezentace

Měla by být tato prezentace uložena po dobu 1000 let?

Jak ukládáme prezentace

Pro uložení prezentace do věčného trezoru hlasovalo 0 diváků, což je 0.0 %

Sdílení

Doporučená videa

Prezentace na podobné téma, kategorii nebo přednášejícího

Zajímají Vás podobná videa? Sledujte NeurIPS 2021