Jul 24, 2023
Speaker · 0 followers
Reinforcement learning algorithms require a long time to learn policies on complex tasks because they need large amounts of training data. With recent advances in GPU-based simulation, such as Isaac Gym, data collection has been sped up thousands of times on a commodity GPU. Most prior work has used on-policy methods such as PPO to train policies because of their simplicity and scalability. Off-policy methods are usually more sample-efficient but harder to scale up, resulting in much longer wall-clock training time in practice. In this work, we present a novel Parallel Q-Learning (PQL) scheme that is substantially faster in wall-clock time than PPO while achieving better sample efficiency. The key enabling factor is parallelizing data collection, policy function learning, and value function learning. Unlike prior work on distributed off-policy learning, such as Ape-X, our scheme is designed specifically for massively parallel GPU-based simulation and optimized to run on a single workstation. We demonstrate that Q-learning methods can be scaled up to tens of thousands of parallel environments, and we investigate important factors that affect policy learning speed, including the number of parallel environments, exploration schemes, batch size, and GPU model.
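The abstract describes the ingredients only at a high level: batched simulation across many environments, a shared replay buffer, and off-policy Q-updates. The toy NumPy sketch below illustrates those ingredients under stated assumptions — the `VecEnv` point-mass task, the linear Q-function, and all names are hypothetical stand-ins, not the authors' implementation, and collection and learning run serially here, whereas PQL runs them in parallel on the GPU. The behavior policy is uniformly random to highlight the off-policy nature of Q-learning.

```python
import numpy as np

rng = np.random.default_rng(0)

class VecEnv:
    """Toy batched 1-D point-mass task: a hypothetical stand-in for a
    GPU simulator such as Isaac Gym. One call steps all N envs at once."""
    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.state = rng.standard_normal(num_envs)

    def step(self, actions):
        self.state = self.state + 0.1 * actions          # vectorized dynamics
        reward = -np.abs(self.state)                     # reward: stay near origin
        reset = rng.random(self.num_envs) < 0.05         # random episode resets
        self.state = np.where(reset, rng.standard_normal(self.num_envs), self.state)
        return self.state.copy(), reward

ACTIONS = np.array([-1.0, 1.0])

def features(s, a):
    # Linear Q-function features [s, a, s*a, 1]; the s*a term lets the
    # greedy policy depend on the sign of the state.
    return np.stack([s, a, s * a, np.ones_like(s)], axis=1)

def q_all(s, w):
    # Q(s, a) for both discrete actions, shape (N, 2).
    return np.stack([features(s, np.full_like(s, a)) @ w for a in ACTIONS], axis=1)

num_envs, gamma, lr = 1024, 0.5, 0.1
env = VecEnv(num_envs)
w = np.zeros(4)
S, A, R, S2 = [], [], [], []                             # flat replay buffer

for it in range(300):
    s = env.state.copy()
    a = rng.choice(ACTIONS, size=num_envs)               # random behavior policy (off-policy)
    s2, r = env.step(a)
    for buf, x in zip((S, A, R, S2), (s, a, r, s2)):     # one insert stores N transitions
        buf.extend(x.tolist())
    # Q-learning update on a replayed minibatch. Here it runs after each
    # collection step; PQL instead overlaps collection and learning.
    idx = rng.integers(len(S), size=512)
    bs, ba, br, bs2 = (np.asarray(b)[idx] for b in (S, A, R, S2))
    target = br + gamma * q_all(bs2, w).max(axis=1)      # bootstrapped TD target
    td_err = target - features(bs, ba) @ w
    w += lr * (td_err[:, None] * features(bs, ba)).mean(axis=0)

def greedy_action(s):
    return ACTIONS[q_all(np.array([s]), w).argmax(axis=1)][0]
```

After training, the greedy policy pushes the state toward the origin: for a positive state it picks the negative action and vice versa. The sketch also shows why batched simulation matters for off-policy methods — each environment step contributes `num_envs` transitions to the replay buffer at the cost of one vectorized call.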