A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

Dec 10, 2023

About

In this work, we study a natural variant of best-response learning dynamics in two-player zero-sum stochastic games. Specifically, we develop doubly smoothed best-response dynamics, which combine a discrete and smoothed variant of the best-response dynamics with temporal-difference (TD) learning and minimax value iteration. The resulting dynamics are payoff-based, convergent, rational, and symmetric between the two players. We provide the first finite-sample analysis of such payoff-based, best-response-type independent learning dynamics for zero-sum stochastic games. Our analysis uses a novel coupled Lyapunov drift approach to capture the evolution of multiple sets of coupled stochastic iterates, which might be of independent interest.
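To make the ingredients named in the abstract concrete, here is a minimal sketch of payoff-based smoothed best-response learning in the simplest special case: a single-state zero-sum matrix game. It is an illustration under stated assumptions, not the authors' algorithm. The function name `smoothed_br_matrix_game`, the step sizes `alpha` and `beta`, the temperature `tau`, and the reading of "doubly smoothed" as a softmax response plus an incremental policy step are all assumptions made here. The minimax value iteration outer loop needed for multi-state stochastic games is omitted.

```python
import numpy as np


def softmax(q, tau):
    """Smoothed best response: a softmax over q-values with temperature tau."""
    z = (q - q.max()) / tau  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def smoothed_br_matrix_game(A, steps=20000, tau=0.1, alpha=0.05, beta=0.005, seed=0):
    """Hypothetical sketch of payoff-based smoothed best-response learning.

    A[i, j] is the payoff to player 1 when player 1 plays i and player 2
    plays j; player 2 receives -A[i, j]. Each player observes only its own
    realized payoff, never the opponent's action or policy.
    """
    rng = np.random.default_rng(seed)
    n = A.shape  # (number of actions for player 1, for player 2)
    pi = [np.full(n[0], 1.0 / n[0]), np.full(n[1], 1.0 / n[1])]  # policies
    q = [np.zeros(n[0]), np.zeros(n[1])]  # per-action payoff estimates

    for _ in range(steps):
        # Both players sample actions independently from their own policies.
        a = [rng.choice(n[0], p=pi[0]), rng.choice(n[1], p=pi[1])]
        r = A[a[0], a[1]]
        rewards = [r, -r]  # zero-sum payoffs

        for i in range(2):
            # Smoothing 1 and 2 (assumed reading): take a small step of size
            # beta toward the softmax response instead of jumping to an argmax.
            pi[i] = pi[i] + beta * (softmax(q[i], tau) - pi[i])
            # TD-style update of the estimate for the action actually played,
            # using only the realized payoff (payoff-based feedback).
            q[i][a[i]] += alpha * (rewards[i] - q[i][a[i]])

    return pi, q


if __name__ == "__main__":
    # Matching pennies, used here only as a toy zero-sum game.
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])
    pi, q = smoothed_br_matrix_game(A)
    print("player 1 policy:", pi[0])
    print("player 2 policy:", pi[1])
```

Both players run the same update on their own data, which mirrors the symmetry property claimed in the abstract. The sketch makes no claim about convergence rates; those are the subject of the finite-sample analysis in the paper itself.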

Interested in talks like this? Follow NeurIPS 2023