Oral: RL-Scope: Cross-stack Profiling for Deep Reinforcement Learning Workloads

Apr 4, 2021

About

In recent years, deep reinforcement learning (RL) has demonstrated groundbreaking results in robotics, datacenter management, and many other applications. Despite its increasing popularity, there has been little work on understanding system-level bottlenecks in RL workloads. Instead, the common implicit assumption is that RL workloads are similar to classic supervised learning (SL) workloads. Our analysis contradicts this assumption and shows that operations considered GPU-heavy in SL spend at most 12.9% of their time GPU-bound in RL workloads; the rest is CPU-bound in different layers of the software stack, running high-level language code and non-compute code such as ML backend and CUDA API calls. To explain where training time is spent in RL workloads, we propose RL-Scope: an accurate cross-stack profiler that supports multiple ML backends and simulators. In contrast to existing profilers, which are limited to a single layer of the software and hardware stack, RL-Scope collects profiling information across the entire stack and scopes it to high-level operations, providing developers and researchers with a complete picture of RL training time. We demonstrate RL-Scope's utility through several in-depth case studies. First, we compare RL frameworks to quantify the effects of fundamental design choices behind ML backends. For example, we use RL-Scope to measure and explain a 2.3× difference in runtime between equivalent PyTorch and TensorFlow algorithm implementations, and to identify a bottleneck rooted in overly abstracted algorithm implementations. Next, we survey how training bottlenecks change across different simulators and RL algorithms, and show that on-policy algorithms are at least 3.5× more simulation-bound than off-policy algorithms. Finally, we profile a scale-up workload and demonstrate that the GPU utilization metrics reported by commonly used tools dramatically inflate GPU usage, whereas RL-Scope reports true GPU-bound time.
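
The central idea in the abstract is scoping cross-stack timing to user-annotated high-level operations. Below is a minimal sketch of that idea, assuming a hypothetical annotation API: the `OperationProfiler` class, its `operation` context manager, and the phase names are illustrative stand-ins, not RL-Scope's actual interface.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class OperationProfiler:
    """Toy sketch of operation scoping: attribute wall-clock time to
    user-annotated high-level operations (hypothetical, not RL-Scope's API)."""

    def __init__(self):
        self.totals = defaultdict(float)  # operation name -> seconds

    @contextmanager
    def operation(self, name):
        # Time everything inside the annotated region, regardless of which
        # stack layer (Python, ML backend, simulator) the time is spent in.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def report(self):
        total = sum(self.totals.values()) or 1.0
        for name, secs in sorted(self.totals.items(), key=lambda kv: -kv[1]):
            print(f"{name:<12} {secs:8.4f}s  ({100 * secs / total:5.1f}%)")

prof = OperationProfiler()

def train_step():
    # Annotate the phases of one RL training step (names are illustrative).
    with prof.operation("inference"):
        time.sleep(0.002)   # stand-in for the policy forward pass
    with prof.operation("simulation"):
        time.sleep(0.005)   # stand-in for a simulator step
    with prof.operation("backprop"):
        time.sleep(0.003)   # stand-in for the gradient update

for _ in range(100):
    train_step()
prof.report()
```

Attributing time to named phases like this is what lets a cross-stack profiler distinguish, for example, simulation-bound on-policy training from genuinely GPU-bound work, as in the case studies described above; the real tool additionally splits each operation's time across stack layers (Python, ML backend, CUDA API, GPU kernels).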

About MLSys 2021

The Conference on Machine Learning and Systems targets research at the intersection of machine learning and systems. The conference aims to elicit new connections amongst these fields, including identifying best practices and design principles for learning systems, as well as developing novel learning methods and theory tailored to practical machine learning workflows.
