Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

Yujun Lin

Song Han

Huizi Mao

Yu Wang

William Dally

ICLR (2018)


Abstract

Large-scale distributed training requires significant communication bandwidth to exchange gradients. This intensive gradient communication limits the scalability of multi-machine, multi-GPU training and requires expensive high-bandwidth network switches. In this paper, we find that 99.9% of the gradient exchange is redundant and can be safely removed without impacting convergence accuracy. We propose Deep Gradient Compression, which reduces communication bandwidth by up to 600× (after taking meta-data into account). To fully preserve convergence accuracy, Deep Gradient Compression combines four components: momentum correction, local gradient clipping, momentum factor masking, and warm-up training. We evaluated Deep Gradient Compression extensively on multiple types of machine learning tasks, including image classification, speech recognition, and language modeling, and on multiple datasets: Cifar10, ImageNet, Penn Treebank, and Librispeech Corpus. In all these scenarios, Deep Gradient Compression with only 0.1% gradient exchange achieved the same accuracy and the same learning curves as conventional dense updates. These techniques enable distributed training on cheap commodity 1Gbps Ethernet.
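To make the four components named in the abstract more concrete, here is a minimal single-worker sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation. The names `DGCWorker`, `clip_local`, and `warmup_sparsity` are hypothetical. The default clip threshold of 5.0, the scaling of that threshold by 1/sqrt(N) for N nodes, the exact top-k selection (rather than any sampled threshold estimate), and the warm-up sparsity values are all assumptions and do not come from this page. How the sparse (index, value) pairs are exchanged and summed across workers is left out.

```python
import math
from typing import List, Sequence, Tuple


def clip_local(grad: Sequence[float], max_norm: float, num_nodes: int) -> List[float]:
    """Local gradient clipping (sketch). ASSUMPTION: the global threshold is
    scaled by 1/sqrt(num_nodes) because each node clips its own gradient."""
    threshold = max_norm / math.sqrt(num_nodes)
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        scale = threshold / norm
        return [g * scale for g in grad]
    return list(grad)


def warmup_sparsity(epoch: int,
                    schedule: Sequence[float] = (0.75, 0.9375, 0.984375, 0.996, 0.999)) -> float:
    """Hypothetical warm-up schedule: sparsity increases over the first epochs,
    then stays at the final value (99.9%, i.e. 0.1% of entries exchanged)."""
    return schedule[min(epoch, len(schedule) - 1)]


class DGCWorker:
    """Hypothetical per-worker state for one flattened parameter tensor."""

    def __init__(self, size: int, momentum: float = 0.9):
        self.momentum = momentum
        self.u = [0.0] * size  # local momentum buffer (momentum correction)
        self.v = [0.0] * size  # accumulated velocity that has not been sent yet

    def step(self, grad: Sequence[float], sparsity: float,
             max_norm: float = 5.0, num_nodes: int = 1) -> List[Tuple[int, float]]:
        g = clip_local(grad, max_norm, num_nodes)
        # Momentum correction: accumulate the momentum-updated velocity u,
        # not the raw gradient, into the local accumulator v.
        for i, gi in enumerate(g):
            self.u[i] = self.momentum * self.u[i] + gi
            self.v[i] += self.u[i]
        # Sparsification: keep the (1 - sparsity) fraction of entries with the
        # largest magnitude (at least one entry); everything else stays local.
        k = max(1, int(round(len(self.v) * (1.0 - sparsity))))
        top = sorted(range(len(self.v)), key=lambda i: abs(self.v[i]), reverse=True)[:k]
        sent = [(i, self.v[i]) for i in sorted(top)]
        for i in top:
            self.v[i] = 0.0  # sent values leave the local accumulator
            self.u[i] = 0.0  # momentum factor masking: drop stale momentum
        return sent
```

For example, `DGCWorker(size=4).step([0.1, -2.0, 0.3, 0.05], sparsity=0.75)` sends only the largest entry, `[(1, -2.0)]`, and keeps the other three values in `v` to accumulate over later steps. Zeroing `u` at the sent positions is what keeps momentum from re-applying an update that has already been communicated.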

Research Areas

Machine Intelligence
