Jul 12, 2020
Value function learning remains a critical component of many reinforcement learning systems. Many algorithms are based on temporal difference (TD) updates, which have well-documented divergence issues, even though potentially sound alternatives such as Gradient TD exist. Unsound approaches like Q-learning and TD remain popular because divergence seems rare in practice and these algorithms typically perform well. However, recent work with large neural network learning systems reveals that instability is more common than previously thought. Practitioners face a difficult dilemma: choose an easy-to-use and performant TD method, or a more complex algorithm that is more sound but harder to tune, less sample efficient, and underexplored in control settings. In this paper, we introduce a new method called TD with Regularized Corrections (TDRC) that attempts to balance ease of use, soundness, and performance. It performs as well as TD when TD performs well, but remains sound even in cases where TD diverges. We characterize the expected update for TDRC and show that it inherits soundness guarantees from Gradient TD and converges to the same solution as TD. Empirically, TDRC exhibits good performance and low parameter sensitivity across several problems.
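The abstract does not spell out the update rule, so the following is only a minimal sketch of what a single TDRC-style step with linear function approximation might look like. It assumes the form of a gradient-corrected TD update in which the secondary weight vector receives an extra L2 penalty. The function name `tdrc_update`, the helper `dot`, and the parameter names (`alpha`, `gamma`, `beta`) are illustrative choices and do not come from this page.

```python
def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def tdrc_update(w, h, x, r, x_next, alpha=0.1, gamma=0.9, beta=1.0):
    """One assumed TDRC-style step for linear value estimation.

    w      : primary weights (value estimate is dot(w, x))
    h      : secondary weights used for the gradient correction
    x      : feature vector of the current state
    r      : observed reward
    x_next : feature vector of the next state
    beta   : strength of the L2 penalty on h (assumed regularizer)
    Returns the updated (w, h) as new lists.
    """
    # Ordinary TD error under the current primary weights.
    delta = r + gamma * dot(w, x_next) - dot(w, x)
    # Secondary estimate of the expected TD error for this state.
    hx = dot(h, x)
    # TD step plus a correction term along the next-state features.
    w_new = [wi + alpha * (delta * xi - gamma * hx * xni)
             for wi, xi, xni in zip(w, x, x_next)]
    # Secondary weights track delta, shrunk toward zero by beta.
    h_new = [hi + alpha * ((delta - hx) * xi - beta * hi)
             for hi, xi in zip(h, x)]
    return w_new, h_new


# Example transition with two one-hot features (made-up numbers).
w, h = tdrc_update([0.0, 0.0], [0.0, 0.0], [1.0, 0.0], 1.0, [0.0, 1.0])
```

In this sketch, when `h` is zero the correction term vanishes and the step on `w` coincides with a plain TD step, which is consistent with the abstract's claim that the method behaves like TD when TD performs well.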
The International Conference on Machine Learning (ICML) is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence known as machine learning. ICML is globally renowned for presenting and publishing cutting-edge research on all aspects of machine learning used in closely related areas like artificial intelligence, statistics and data science, as well as important application areas such as machine vision, computational biology, speech recognition, and robotics. ICML is one of the fastest growing artificial intelligence conferences in the world. Participants at ICML span a wide range of backgrounds, from academic and industrial researchers, to entrepreneurs and engineers, to graduate students and postdocs.