Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback

Cited by: 0
Authors
Zheng, Shuai [1 ,2 ]
Huang, Ziyue [1 ]
Kwok, James T. [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Amazon Web Serv, Seattle, WA 98109 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | 2019, Vol. 32
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Communication overhead is a major bottleneck hampering the scalability of distributed machine learning systems. Recently, there has been a surge of interest in using gradient compression to improve the communication efficiency of distributed neural network training. Using 1-bit quantization, signSGD with majority vote achieves a 32x reduction in communication cost. However, its convergence relies on unrealistic assumptions, and it can diverge in practice. In this paper, we propose a general distributed compressed SGD with Nesterov's momentum. We consider two-way compression, which compresses the gradients both to and from workers. Convergence analysis on nonconvex problems for general gradient compressors is provided. By partitioning the gradient into blocks, a blockwise compressor is introduced such that each gradient block is compressed and transmitted in 1-bit format with a scaling factor, leading to a nearly 32x reduction in communication. Experimental results show that the proposed method converges as fast as full-precision distributed momentum SGD and achieves the same test accuracy. In particular, on distributed ResNet training with 7 workers on ImageNet, the proposed algorithm achieves the same test accuracy as momentum SGD using full-precision gradients, but with 46% less wall-clock time.
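The abstract describes two mechanisms that combine to give the near-32x saving: a blockwise 1-bit compressor with one scaling factor per block, and an error-feedback residual kept locally at each worker. The following is a minimal NumPy sketch of those two pieces only; the function names (blockwise_onebit_compress, worker_step), the block size, and the use of the block's mean absolute value as the scaling factor are illustrative assumptions, not the paper's reference implementation, and the server-side half of the two-way compression as well as the Nesterov momentum update are omitted.

```python
import numpy as np

def blockwise_onebit_compress(grad, block_size=4096):
    # Sketch: replace each block of a flat (1-D) gradient by sign(block)
    # times one scaling factor (here, the block's mean absolute value),
    # so only 1 bit per coordinate plus one float per block is transmitted.
    out = np.empty_like(grad)
    for start in range(0, grad.size, block_size):
        block = grad[start:start + block_size]
        scale = np.mean(np.abs(block))
        out[start:start + block_size] = scale * np.sign(block)
    return out

def worker_step(grad, residual, block_size=4096):
    # Error-feedback: fold the previous round's compression error into the
    # current gradient, compress, and keep the new error locally.
    corrected = grad + residual
    sent = blockwise_onebit_compress(corrected, block_size)
    return sent, corrected - sent  # (message to server, residual kept on worker)
```

In the two-way setting described in the abstract, the server would aggregate the workers' compressed messages and apply the same compress-with-residual step before broadcasting, so that traffic is reduced in both directions; the momentum update is then applied to the decompressed result on each node.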
Pages: 11
Related Papers
50 records in total
  • [41] Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
    Ryabinin, Max
    Gorbunov, Eduard
    Plokhotnyuk, Vsevolod
    Pekhimenko, Gennady
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [42] Communication-efficient local SGD with age-based worker selection
    Zhu, Feng
    Zhang, Jingjing
    Wang, Xin
    The Journal of Supercomputing, 2023, 79 : 13794 - 13816
  • [43] Efficient-Adam: Communication-Efficient Distributed Adam
    Chen, Congliang
    Shen, Li
    Liu, Wei
    Luo, Zhi-Quan
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2023, 71 : 3257 - 3266
  • [44] Double Quantization for Communication-Efficient Distributed Optimization
    Yu, Yue
    Wu, Jiaxiang
    Huang, Longbo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [45] Towards Communication-Efficient Distributed Background Subtraction
    Phan, Hung Ngoc
    Ha, Synh Viet-Uyen
    Ha, Phuong Hoai
    RECENT CHALLENGES IN INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2022, 2022, 1716 : 490 - 502
  • [46] Communication-Efficient Distributed Mining of Association Rules
    Schuster, Assaf
    Wolff, Ran
    Data Mining and Knowledge Discovery, 2004, 8 : 171 - 196
  • [47] Communication-Efficient Distributed PCA by Riemannian Optimization
    Huang, Long-Kai
    Pan, Sinno Jialin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [48] Double Quantization for Communication-Efficient Distributed Optimization
    Huang, Longbo
    PROCEEDINGS OF THE 13TH EAI INTERNATIONAL CONFERENCE ON PERFORMANCE EVALUATION METHODOLOGIES AND TOOLS (VALUETOOLS 2020), 2020, : 2 - 2
  • [49] Communication-Efficient Distributed Dual Coordinate Ascent
    Jaggi, Martin
    Smith, Virginia
    Takac, Martin
    Terhorst, Jonathan
    Krishnan, Sanjay
    Hofmann, Thomas
    Jordan, Michael I.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [50] Communication-Efficient Distributed Optimization with Quantized Preconditioners
    Alimisis, Foivos
    Davies, Peter
    Alistarh, Dan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139