Variance Reduction in Off-Policy Deep Reinforcement Learning using Spectral Normalization

Dec 2, 2022

About

Off-policy deep reinforcement learning algorithms like Soft Actor Critic (SAC) have achieved state-of-the-art results on several high-dimensional continuous control tasks. Despite their success, they are prone to instability due to the deadly triad of off-policy training, function approximation, and bootstrapping. Unstable training of off-policy algorithms leads to sample-inefficient learning and sub-optimal asymptotic performance, preventing their real-world deployment. Previously proposed remedies have focused on target networks to alleviate instability and twin critics to address overestimation bias. However, these modifications do not address noisy gradient estimates with excessive variance, which cause instability and slow convergence. Our proposed method, Spectral Normalized Actor Critic (SNAC), regularizes the actor and the critics using spectral normalization to systematically bound the gradient norm. Spectral normalization constrains the magnitude of the gradients, resulting in smoother actor and critic networks with robust, sample-efficient performance, making them suitable for deployment in stability-critical and compute-constrained applications. We present empirical results on several challenging reinforcement learning benchmarks and extensive ablation studies to demonstrate the effectiveness of our proposed method.
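
To make the core idea concrete, the sketch below shows one common way to apply spectral normalization to the hidden layers of an actor or critic network using PyTorch's built-in torch.nn.utils.spectral_norm. This is an illustrative example under assumed design choices (MLP architecture, which layers are normalized, hidden sizes), not the authors' implementation; names such as SpectralMLP and apply_sn are hypothetical.

```python
# Minimal sketch (not the authors' code): spectral normalization on the
# linear layers of an actor/critic MLP, bounding each layer's largest
# singular value and thereby its Lipschitz constant / gradient magnitude.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm


class SpectralMLP(nn.Module):
    """MLP whose linear layers are wrapped with spectral normalization."""

    def __init__(self, in_dim, out_dim, hidden=(256, 256), apply_sn=True):
        super().__init__()
        layers, last = [], in_dim
        for width in hidden:
            lin = nn.Linear(last, width)
            if apply_sn:
                # Rescales the weight by an estimate of its largest singular
                # value, refined by one power-iteration step per forward pass.
                lin = spectral_norm(lin)
            layers += [lin, nn.ReLU()]
            last = width
        # Output layer left unnormalized in this sketch (a design choice).
        layers.append(nn.Linear(last, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


# Usage: a critic Q(s, a) for a hypothetical task with 17-dim states
# and 6-dim actions; the input is the concatenated state-action pair.
critic = SpectralMLP(in_dim=17 + 6, out_dim=1)
q_value = critic(torch.cat([torch.randn(32, 17), torch.randn(32, 6)], dim=-1))
```

The same wrapper could be reused for the actor network; whether to normalize every layer or only the hidden layers is a tunable choice that the paper's ablations are positioned to inform.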
