Carlo Alfano, Rui Yuan, Patrick Rebeschini · A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence · SlidesLive

A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence

Dec 10, 2023

About

Modern policy optimization methods in reinforcement learning, such as Trust Region Policy Optimization and Proximal Policy Optimization, owe their success to the use of parameterized policies. However, while theoretical guarantees have been established for this class of algorithms, especially in the tabular setting, the use of general parameterization schemes remains mostly unjustified. In this work, we introduce a novel framework for policy optimization based on mirror descent that naturally accommodates general parameterizations. The policy class induced by our scheme recovers known classes, e.g., softmax, and generates new ones depending on the choice of mirror map. For our framework, we obtain the first result that guarantees linear convergence for a policy-gradient-based method involving general parameterization. To demonstrate the ability of our framework to accommodate general parameterization schemes, we obtain its sample complexity when using shallow neural networks and show that it improves upon the previous best results.
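
As a rough illustration of how the choice of mirror map can produce a softmax policy, the sketch below runs tabular mirror descent on a single state with the negative-entropy mirror map. It is not the framework presented in the talk: it uses no function approximation, and the names `mirror_descent_step`, `softmax`, `q_values`, and `step_size` are hypothetical, chosen only for this example. It also assumes fixed action values and a strictly positive initial policy.

```python
import math


def mirror_descent_step(policy, q_values, step_size):
    """One mirror descent step on the simplex with the negative-entropy mirror map.

    Assumes every entry of `policy` is strictly positive.
    """
    # With negative entropy, the update is:
    # new_pi(a) proportional to pi(a) * exp(step_size * Q(a)).
    logits = [math.log(p) + step_size * q for p, q in zip(policy, q_values)]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def softmax(scores):
    """Standard softmax, used here only to compare against the update above."""
    peak = max(scores)
    weights = [math.exp(x - peak) for x in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical single-state problem with three actions and fixed action values.
q_values = [1.0, 0.0, 0.5]
step_size = 1.0
policy = [1.0 / 3.0] * 3  # uniform initial policy

for _ in range(50):
    policy = mirror_descent_step(policy, q_values, step_size)
```

Starting from the uniform policy, one step of this update equals the softmax of the scaled action values, and repeated steps concentrate the policy on the highest-valued action. In a multi-state problem the action values would be re-estimated under the current policy at every iteration; they are held fixed here only to keep the sketch short.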
