Ahmed M. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini · Oral: SIDCo: An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems · SlidesLive

Categories

EN

Log in Get an estimate

Oral: SIDCo: An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems

Apr 4, 2021

Speakers

About

The recent many-fold increase in the size of deep neural networks makes efficient distributed training challenging. Many proposals exploit the compressibility of the gradients and propose lossy compression techniques to speed up the communication stage of distributed training. Nevertheless, compression comes at the cost of reduced model quality and extra computation overhead. In this work, we design an efficient compressor with minimal overhead. Noting the sparsity of the gradients, we propose to model the gradients as random variables distributed according to some sparsity-inducing distributions (SIDs). We empirically validate our assumption by studying the statistical characteristics of the evolution of gradient vectors over the training process. We then propose Sparsity-Inducing Distribution-based Compression (SIDCo), a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC) while being faster by imposing lower compression overhead. Our extensive evaluation of popular machine learning benchmarks involving both recurrent neural network (RNN) and convolution neural network (CNN) models shows that SIDCo speeds up training by up to ~41.7X, 7.6X, and 1.9X compared to the no-compression baseline, Topk, and DGC compressors, respectively.

Organizer

Categories

About MLSys 2021

The Conference on Machine Learning and Systems targets research at the intersection of machine learning and systems. The conference aims to elicit new connections amongst these fields, including identifying best practices and design principles for learning systems, as well as developing novel learning methods and theory tailored to practical machine learning workflows.

Store presentation

Should this presentation be stored for 1000 years?

How do we store presentations

Sharing

Recommended Videos

Presentations on similar topic, category or speaker