A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning

Dec 2, 2022

About

As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting. One-step methods regularize by performing just a single step of policy improvement, while critic regularization methods perform many steps of policy improvement with a regularized objective. These methods appear distinct. One-step methods, such as advantage-weighted regression and conditional behavioral cloning, are simple and stable. Critic regularization is more challenging to implement correctly and typically requires more compute, but has appealing lower-bound guarantees. Empirically, prior work alternates between reporting better results with one-step RL and with critic regularization. In this paper, we draw a close connection between these methods: applying a multi-step critic regularization method with a large regularization coefficient yields the same policy as one-step RL. Practical implementations violate our assumptions, and critic regularization is typically applied with small regularization coefficients. Nonetheless, our experiments show that our analysis makes accurate, testable predictions about practical offline RL methods (CQL and one-step RL) with commonly used hyperparameters.
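
To make the "single step of policy improvement" idea concrete, here is a minimal tabular sketch in Python. It is an illustration only, not the authors' implementation: the toy MDP, the function names, and the temperature parameter are all assumptions, and exact dynamics stand in for an offline dataset. It evaluates the behavior policy once, then reweights it by exponentiated advantages, a tabular analogue of advantage-weighted regression. It does not reproduce CQL or the paper's analysis.

```python
import numpy as np

def evaluate_policy(P, R, policy, gamma=0.9, iters=500):
    """Iterative policy evaluation for a tabular MDP.

    P: (S, A, S) transition probabilities, R: (S, A) rewards,
    policy: (S, A) action probabilities. Returns Q of shape (S, A).
    """
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        V = (policy * Q).sum(axis=1)   # V(s) = sum_a pi(a|s) Q(s, a)
        Q = R + gamma * (P @ V)        # Bellman expectation backup
    return Q

def one_step_improvement(Q_beta, beta, temperature=1.0):
    """One improvement step: reweight the behavior policy by exp(advantage / temperature)."""
    V = (beta * Q_beta).sum(axis=1, keepdims=True)
    weights = beta * np.exp((Q_beta - V) / temperature)
    return weights / weights.sum(axis=1, keepdims=True)

# Hypothetical 2-state, 2-action MDP: action 1 pays reward 1 and leads to state 1,
# action 0 pays nothing and leads to state 0.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]]])
R = np.array([[0.0, 1.0],
              [0.0, 1.0]])
beta = np.full((2, 2), 0.5)            # uniform behavior policy (assumed)

Q_beta = evaluate_policy(P, R, beta)   # critic of the behavior policy only
pi = one_step_improvement(Q_beta, beta)
```

The critic here is never re-estimated under the improved policy, which is what distinguishes this one-step scheme from multi-step methods that alternate evaluation and improvement.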

Interested in talks like this? Follow NeurIPS 2022