Jul 12, 2020
We make three contributions toward better understanding policy gradient methods. First, we show that with the true gradient, policy gradient with a softmax parametrization converges at a O(1/t) rate, with constants depending on the problem and initialization. This result significantly improves recent asymptotic convergence results. The analysis relies on two findings: that the softmax policy gradient satisfies a Łojasiewicz inequality, and the minimum probability of an optimal action during optimization can be bounded in terms of its initial value. Second, we analyze entropy regularized policy gradient and show that in the one state (bandit) case it enjoys a linear convergence rate O(e^-t), while for general MDPs we prove that it converges at a O(1/t) rate. This result resolves an open question in the recent literature. A key insight is that the entropy regularized gradient update behaves similarly to the contraction operator in value learning, with contraction factor depending on current policy. Finally, combining the above two results and additional lower bound results, we explain how entropy regularization improves policy optimization, even with the true gradient, from the perspective of convergence rate. These results provide a theoretical understanding of the impact of entropy and corroborate existing empirical studies.
The International Conference on Machine Learning (ICML) is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence known as machine learning. ICML is globally renowned for presenting and publishing cutting-edge research on all aspects of machine learning used in closely related areas like artificial intelligence, statistics and data science, as well as important application areas such as machine vision, computational biology, speech recognition, and robotics. ICML is one of the fastest growing artificial intelligence conferences in the world. Participants at ICML span a wide range of backgrounds, from academic and industrial researchers, to entrepreneurs and engineers, to graduate students and postdocs.
Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%
Presentations on similar topic, category or speaker