Dec 6, 2021
Speaker · 0 followers
Speaker · 0 followers
Speaker · 0 followers
Cooperative multi-agent reinforcement learning (MARL) has received increasing attention in recent years and has found many scientific and engineering applications. However, a key challenge arising from many cooperative MARL algorithm designs (e.g., the actor-critic framework) is the policy evaluation problem, which can only be conducted in a decentralized fashion.In this paper, we focus on decentralized MARL policy evaluation with nonlinear function approximation, which is often seen in deep MARL. We first show that the empirical decentralized MARL policy evaluation problem can be reformulated as a decentralized nonconvex-strongly-concave minimax saddle point problem. We then develop a decentralized gradient-based descent ascent algorithm called GT-GDA that enjoys a convergence rate of 𝒪(1/T). To further reduce the sample complexity, we propose two decentralized stochastic optimization algorithms called GT-SRVR and GT-SRVRI, which enhance GT-GDA by variance reduction techniques. We show that all algorithms all enjoy an 𝒪(1/T) convergence rate to a stationary point of the reformulated minimax problem. Moreover, the fast convergence rates of GT-SRVR and GT-SRVRI imply 𝒪(ϵ^-2) communication complexity and 𝒪(m√(n)ϵ^-2) sample complexity, where m is the number of agents and n is the length of trajectories. To our knowledge, this paper is the first work that achieves both 𝒪(ϵ^-2) sample complexity and 𝒪(ϵ^-2) communication complexity in decentralized policy evaluation for cooperative MARL. Our extensive experiments also corroborate the theoretical performance of our proposed decentralized policy evaluation algorithms.Cooperative multi-agent reinforcement learning (MARL) has received increasing attention in recent years and has found many scientific and engineering applications. However, a key challenge arising from many cooperative MARL algorithm designs (e.g., the actor-critic framework) is the policy evaluation problem, which can only be conducted in a decentralized fashion.In this paper, we focus on decentralized MARL policy evaluation with nonlinear function approximation, which is often seen in deep MAR…
Account · 1.9k followers
Neural Information Processing Systems (NeurIPS) is a multi-track machine learning and computational neuroscience conference that includes invited talks, demonstrations, symposia and oral and poster presentations of refereed papers. Following the conference, there are workshops which provide a less formal setting.
Professional recording and live streaming, delivered globally.
Presentations on similar topic, category or speaker
Liming Jiang, …
Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%
Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%
Miklos Racz, …
Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%
Luyao Yuan, …
Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%
Sharon Zhou, …
Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%
Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%