Improving the Performance of Distributed MXNet with RDMA

Cited by: 0
Authors
Mingfan Li
Ke Wen
Han Lin
Xu Jin
Zheng Wu
Hong An
Mengxian Chi
Affiliations
[1] University of Science and Technology of China
Keywords
Distributed MXNet; Parameter server; RDMA; InfiniBand; Network optimization
DOI
Not available
Abstract
As one of the most influential deep learning frameworks, MXNet has achieved excellent performance and many breakthroughs in academic and industrial fields across a variety of machine learning workloads. The initial implementation of MXNet uses a proxy-socket interface, which delivers suboptimal performance in distributed environments. In a massively parallel training task, parameters are updated frequently during each training loop, so network performance becomes the dominant factor in overall performance. Over the past decade, high-performance interconnects have employed remote direct memory access (RDMA) technology to provide excellent performance for numerous scientific domains. In this paper, we describe an efficient design that extends the open-source MXNet to make it RDMA-capable via RDMA-based parameter server interfaces. With modest optimizations to memory usage and transmission overhead, RDMA-based MXNet achieves a substantial performance improvement over the original software. Our experiments reveal that, for the communication subsystem of MXNet, the new design achieves a 16x speedup (up to 21x at peak) over 1 Gigabit Ethernet (1GigE). For two training cases on MXNet, the optimized implementation gains 5x and 9x speedups, respectively. Compared to experiments on the IP-over-InfiniBand (IPoIB) protocol, it achieves nearly 30% performance improvement, as well as better scalability and fewer bottlenecks.
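The abstract centers on the parameter-server pattern whose push/pull traffic the paper moves onto RDMA. As a point of orientation only, the sketch below shows the shape of that interface: workers push gradients keyed by parameter name and pull updated weights each loop. This is a minimal in-process illustration, not the paper's RDMA implementation; the class and method names (`ParameterServer`, `push`, `pull`) are illustrative stand-ins, and in distributed MXNet these calls are network operations whose socket transport the paper replaces with RDMA.

```python
# Illustrative sketch of the parameter-server push/pull interface
# (assumed names; NOT the paper's RDMA-based implementation).
class ParameterServer:
    def __init__(self, params, lr=0.1):
        # params: dict mapping key -> list of floats (model weights)
        self.params = {k: list(v) for k, v in params.items()}
        self.lr = lr

    def push(self, key, grad):
        # A worker pushes gradients; apply a plain SGD update.
        w = self.params[key]
        for i, g in enumerate(grad):
            w[i] -= self.lr * g

    def pull(self, key):
        # A worker pulls back the current weights for this key.
        return list(self.params[key])

if __name__ == "__main__":
    ps = ParameterServer({"w": [1.0, 2.0]}, lr=0.5)
    ps.push("w", [0.2, -0.4])  # worker sends gradients
    print(ps.pull("w"))        # worker fetches updated weights
```

In a real deployment each `push`/`pull` crosses the network once per training loop, which is why the paper's replacement of the socket transport with RDMA verbs yields such large end-to-end speedups.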
Pages: 467-480 (13 pages)