On the Convergence of Policy Gradient Methods to Nash equilibria in General Stochastic Games

Nov 28, 2022



Multi-agent learning in stochastic N-player games is a notoriously difficult problem because, in addition to their changing strategic decisions, the players of the game must also contend with the fact that the game itself evolves over time, possibly in a very complicated manner. Because of this, the equilibrium convergence properties of popular learning algorithms — like policy gradient and its variants — are poorly understood, except in specific classes of games (such as potential or two-player, zero-sum games). In view of all this, we examine the long-run behavior of policy gradient methods with respect to Nash equilibrium policies that are second-order stationary (SOS) in a sense similar to the type of KKT sufficiency conditions used in optimization. Our analysis shows that SOS policies are locally attracting with high probability, and we show that policy gradient trajectories with gradient estimates provided by the REINFORCE algorithm achieve an 𝒪(1/√(n)) convergence rate to such equilibria if the method's step-size is chosen appropriately. On the other hand, when the equilibrium in question is deterministic, we show that this rate can be improved dramatically and, in fact, policy gradient methods converge within a finite number of iterations in that case.


Store presentation

Should this presentation be stored for 1000 years?

How do we store presentations

Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%


Recommended Videos

Presentations on similar topic, category or speaker

Interested in talks like this? Follow NeurIPS 2022