Oral: PipeMare: Asynchronous Pipeline Parallel DNN Training

Apr 4, 2021

About

Pipeline parallelism enables neural network models to be partitioned spatially during training, which can lead to higher overall hardware utilization. Unfortunately, to preserve the statistical efficiency of sequential training, existing pipeline parallel training techniques sacrifice hardware efficiency by decreasing pipeline utilization or incurring extra memory costs. In this paper, we investigate to what extent these sacrifices are necessary on the emerging class of new dataflow hardware accelerators. We devise PipeMare, a simple yet robust training method that tolerates asynchronous updates during pipeline parallel execution without sacrificing utilization or memory, allowing efficient use of fine-grained pipeline parallelism. Concretely, when tested on ResNet and Transformer networks, asynchrony enables PipeMare to use up to 2.7x less memory or achieve 14.3x higher pipeline utilization, with similar model quality, compared to state-of-the-art synchronous pipeline parallel training techniques.
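The abstract does not spell out the mechanics, but the setting it refers to can be illustrated with a toy simulation. The sketch below is a minimal, hypothetical example (the names train_async, grad, and the delay parameter tau are inventions for illustration, not code from the paper): in a fine-grained pipeline that is never drained, the gradient applied at step t was computed with weights from step t - tau, where tau is the stage's pipeline delay, so every update is made with stale weights.

```python
# Toy sketch of the delayed-gradient setting created by asynchronous
# pipeline parallelism (illustration only, not the PipeMare implementation).

import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: minimize ||X @ w - y||^2.
X = rng.normal(size=(256, 8))
w_true = rng.normal(size=8)
y = X @ w_true


def grad(w, idx):
    """Mini-batch gradient of the mean squared error at weights w."""
    xb, yb = X[idx], y[idx]
    return 2.0 * xb.T @ (xb @ w - yb) / len(idx)


def train_async(tau, lr, steps=500, batch=32):
    """SGD where each gradient is computed on weights that are tau updates
    stale, mimicking one stage of a non-draining asynchronous pipeline."""
    w = np.zeros(8)
    stale = [w.copy() for _ in range(tau + 1)]  # buffer of past weights
    for _ in range(steps):
        idx = rng.integers(0, len(X), size=batch)
        g = grad(stale[0], idx)  # gradient from tau-step-old weights...
        w = w - lr * g           # ...applied to the current weights
        stale.pop(0)
        stale.append(w.copy())
    return np.mean((X @ w - y) ** 2)


# In this toy model, larger delays typically need a smaller step size to
# stay stable, motivating optimizer-side compensation for asynchrony.
for tau in (0, 4, 8):
    print(f"tau={tau:2d}  final loss={train_async(tau, lr=0.05):.6f}")
```

In this toy model, convergence degrades as tau grows unless the step size shrinks, which is the intuition behind compensating for delay in the optimizer rather than draining the pipeline or storing extra weight copies.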

About MLSys 2021

The Conference on Machine Learning and Systems targets research at the intersection of machine learning and systems. The conference aims to elicit new connections amongst these fields, including identifying best practices and design principles for learning systems, as well as developing novel learning methods and theory tailored to practical machine learning workflows.
