Dec 10, 2023
In this work, we study a natural variant of best-response learning dynamics in two-player zero-sum stochastic games. Specifically, we develop a doubly smoothed best-response dynamics, which combines a discrete and smoothed variant of the best-response dynamics with temporal-difference (TD)-learning and minimax value iteration. The resulting dynamics are payoff-based, convergent, rational, and symmetric among players. We provide the first finite-sample analysis of such payoff-based best-response-type independent learning dynamics for zero-sum stochastic games. Our analysis uses a novel coupled Lyapunov drift approach to capture the evolution of multiple sets of coupled and stochastic iterates, which might be of independent interest.
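As a rough illustration of the ingredients named above, the following minimal sketch runs a payoff-based smoothed best-response dynamic in a single-state zero-sum game (a matrix game). It is not the algorithm from the talk: it assumes only one state, so the minimax value iteration loop is omitted entirely, and the names `smoothed_br_dynamics`, `tau`, `alpha`, and `beta` are hypothetical choices made for this example. Each player sees only its own realized payoff, keeps a TD-style running estimate of its action values, and slowly moves its policy toward the softmax (smoothed best response) of those estimates.

```python
import math
import random


def softmax(q, tau):
    """Smoothed best response: softmax of action values at temperature tau."""
    m = max(q)  # subtract the max for numerical stability
    w = [math.exp((v - m) / tau) for v in q]
    s = sum(w)
    return [v / s for v in w]


def sample(p, rng):
    """Draw an index from the discrete distribution p."""
    r = rng.random()
    acc = 0.0
    for i, pi in enumerate(p):
        acc += pi
        if r < acc:
            return i
    return len(p) - 1


def smoothed_br_dynamics(A, steps=20000, tau=0.1, alpha=0.05, beta=0.005, seed=0):
    """Illustrative single-state sketch (assumed setup, not the talk's method).

    A[i][j] is the payoff to the row player; the column player receives -A[i][j].
    alpha is the step size for the value estimates, beta the (smaller) step
    size for the policies.
    """
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    x, y = [1.0 / n] * n, [1.0 / m] * m   # policies, initially uniform
    qx, qy = [0.0] * n, [0.0] * m         # each player's own action-value estimates
    for _ in range(steps):
        i, j = sample(x, rng), sample(y, rng)
        r = A[i][j]
        # Payoff-based update: each player uses only its own realized payoff.
        qx[i] += alpha * (r - qx[i])
        qy[j] += alpha * (-r - qy[j])
        # Policies take a small step toward the smoothed best response.
        bx, by = softmax(qx, tau), softmax(qy, tau)
        x = [(1 - beta) * xi + beta * bi for xi, bi in zip(x, bx)]
        y = [(1 - beta) * yi + beta * bi for yi, bi in zip(y, by)]
    return x, y


if __name__ == "__main__":
    matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
    print(smoothed_br_dynamics(matching_pennies))
```

Both players run the same update rule, which mirrors the symmetry property mentioned in the abstract. Using a smaller step size for the policies than for the value estimates is one common design choice for such coupled iterations, adopted here as an assumption.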