Oral: Wavelet: Efficient DNN Training with Tick-Tock Scheduling

Apr 4, 2021

Speakers

About

DNNs have revolutionized across a wide range of applications, such as image classification, speech recognition and robotics control. As DNN models becoming more computationally expensive to train, parallel execution with multiple accelerators (e.g. GPUs) is adopted. However, as computation power increasing, GPUs are under-utilized mainly due to limited local memory size. To address this memory bound, we present Wavelet, an efficient and generic approach that can fully utilize all the available on-device memory among GPUs involved in the distributed training job. Wavelet achieves near optimal on-device memory usage by adopting a simple scheduling scheme called Tick-Tock, which interleaves waves of peak memory usage among the accelerators. Evaluations on a variety of DNN models and tasks show that, Wavelet trains models up to 6.7x faster than commonly used parallelism techniques.

Organizer

Categories

About MLSys 2021

The Conference on Machine Learning and Systems targets research at the intersection of machine learning and systems. The conference aims to elicit new connections amongst these fields, including identifying best practices and design principles for learning systems, as well as developing novel learning methods and theory tailored to practical machine learning workflows.

Like the format? Trust SlidesLive to capture your next event!

Professional recording and live streaming, delivered globally.

Sharing

Recommended Videos

Presentations on similar topic, category or speaker

Interested in talks like this? Follow MLSys 2021