On the Convergence of Policy Gradient Methods to Nash Equilibria in General Stochastic Games

Nov 28, 2022


About

Multi-agent learning in stochastic N-player games is a notoriously difficult problem because, in addition to the other players' strategic decisions, each player must also contend with the fact that the game itself evolves over time, possibly in a very complicated manner. Because of this, the equilibrium convergence properties of popular learning algorithms (such as policy gradient and its variants) are poorly understood, except in specific classes of games, such as potential or two-player zero-sum games. In view of this, we examine the long-run behavior of policy gradient methods with respect to Nash equilibrium policies that are second-order stationary (SOS), in a sense similar to the KKT sufficiency conditions used in optimization. Our analysis shows that SOS policies are locally attracting with high probability, and that policy gradient trajectories with gradient estimates provided by the REINFORCE algorithm achieve an 𝒪(1/√n) convergence rate to such equilibria if the method's step-size is chosen appropriately. On the other hand, when the equilibrium in question is deterministic, we show that this rate can be improved dramatically: in fact, policy gradient methods converge within a finite number of iterations in that case.
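To make the setting concrete, the sketch below is a minimal toy example, not taken from the talk or paper, of the kind of method the abstract refers to: each player maintains a tabular softmax policy in a small, randomly generated N-player stochastic game, estimates its policy gradient with REINFORCE from a sampled episode, and updates with a step-size proportional to 1/√n (the regime associated with the 𝒪(1/√n) rate above). All dimensions, constants, and names here are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy setup (not from the talk): N players, a small stochastic game
# with S states and A actions per player; transitions and rewards are random but fixed.
rng = np.random.default_rng(0)
N, S, A, T = 2, 3, 2, 20           # players, states, actions, episode length

# Joint transition kernel P[s, a_1, ..., a_N] -> distribution over next states,
# and per-player rewards R[i, s, a_1, ..., a_N].
P = rng.dirichlet(np.ones(S), size=(S,) + (A,) * N)
R = rng.uniform(0.0, 1.0, size=(N, S) + (A,) * N)

theta = np.zeros((N, S, A))        # softmax policy parameters, one table per player

def policy(theta_i, s):
    """Softmax policy of one player at state s."""
    z = np.exp(theta_i[s] - theta_i[s].max())
    return z / z.sum()

def rollout(theta):
    """Sample one episode; return per-player returns and the visited (state, joint action) pairs."""
    s, traj, G = 0, [], np.zeros(N)
    for _ in range(T):
        a = tuple(rng.choice(A, p=policy(theta[i], s)) for i in range(N))
        traj.append((s, a))
        G += R[(slice(None), s) + a]
        s = rng.choice(S, p=P[(s,) + a])
    return G, traj

def reinforce_grad(theta, G, traj):
    """REINFORCE score-function estimate of each player's policy gradient."""
    g = np.zeros_like(theta)
    for s, a in traj:
        for i in range(N):
            pi = policy(theta[i], s)
            score = -pi
            score[a[i]] += 1.0          # d log pi_i(a_i | s) / d theta_i[s, :]
            g[i, s] += G[i] * score
    return g

# Policy gradient with a 1/sqrt(n) step-size; the constant 0.05 is arbitrary.
for n in range(1, 5001):
    G, traj = rollout(theta)
    theta += (0.05 / np.sqrt(n)) * reinforce_grad(theta, G, traj)

print(policy(theta[0], 0), policy(theta[1], 0))
```

Whether the resulting trajectory actually converges to an equilibrium depends on the game and on the local structure of the equilibrium (the SOS condition discussed in the talk); the sketch only illustrates the update rule and step-size schedule, not the guarantee itself.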
