Improving the Performance of Distributed MXNet with RDMA

Cited by: 0
Authors
Mingfan Li
Ke Wen
Han Lin
Xu Jin
Zheng Wu
Hong An
Mengxian Chi
Affiliations
[1] University of Science and Technology of China
Keywords
Distributed MXNet; Parameter server; RDMA; InfiniBand; Network optimization
DOI
Not available
Abstract
As one of the most influential deep learning frameworks, MXNet has achieved excellent performance and many breakthroughs in academic and industrial fields across a variety of machine learning scenarios. The initial implementation of MXNet uses a proxy-socket interface, which delivers suboptimal performance in distributed environments. In a massively parallel training task, parameters are updated frequently during each training loop, so network performance becomes the dominant factor in overall performance. Over the past decade, high-performance interconnects have employed remote direct memory access (RDMA) technology to provide excellent performance for numerous scientific domains. In this paper, we describe an efficient design that extends the open-source MXNet to make it RDMA-capable via RDMA-based parameter server interfaces. With modest optimizations of memory usage and transmission overhead, RDMA-based MXNet achieves a substantial performance improvement over the original software. Our experiments reveal that, for the communication subsystem of MXNet, the new design achieves a 16x speedup (up to 21x at peak) over 1 Gigabit Ethernet (1GigE). For two training cases on MXNet, the optimized implementation gains 5x and 9x speedups, respectively. Compared to experiments on the IP-over-InfiniBand (IPoIB) protocol, it achieves nearly 30% performance improvement, as well as better scalability and alleviation of bottlenecks.
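The push/pull cycle of the parameter-server architecture mentioned in the abstract is the communication hot path the paper optimizes: each iteration, workers push gradients to the server and pull fresh weights back. The sketch below models that cycle in-process as a minimal illustration; it is not the paper's RDMA implementation, and all class and parameter names here are hypothetical.

```python
# Minimal in-process sketch of the parameter-server push/pull cycle that
# dominates communication in distributed MXNet training. In the real system
# each push/pull crosses the network (sockets, IPoIB, or RDMA) -- that
# transfer is exactly the path the paper accelerates with RDMA.

class ParameterServer:
    """Holds the shared weights; workers push gradients and pull updates."""

    def __init__(self, weights, lr=0.1):
        self.weights = dict(weights)  # parameter name -> value
        self.lr = lr                  # SGD learning rate

    def push(self, grads):
        # Apply a worker's gradients to the shared weights (plain SGD step).
        for name, g in grads.items():
            self.weights[name] -= self.lr * g

    def pull(self):
        # Workers fetch the freshest weights before the next iteration.
        return dict(self.weights)


# One training iteration with two workers sharing a single scalar weight.
ps = ParameterServer({"w": 1.0}, lr=0.1)
for worker_grad in ({"w": 0.5}, {"w": 0.3}):
    ps.push(worker_grad)          # frequent small transfers per loop
weights = ps.pull()
print(weights["w"])               # 1.0 - 0.1*0.5 - 0.1*0.3 ≈ 0.92
```

Because every iteration repeats this exchange for every parameter block, reducing per-transfer latency (as RDMA does) directly shortens the training loop.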
Pages: 467-480
Page count: 13
Related Papers
50 total
  • [1] Li, Mingfan; Wen, Ke; Lin, Han; Jin, Xu; Wu, Zheng; An, Hong; Chi, Mengxian. Improving the Performance of Distributed MXNet with RDMA. International Journal of Parallel Programming, 2019, 47(3): 467-480.
  • [2] Jia, Chengfan; Liu, Junnan; Jin, Xu; Lin, Han; An, Hong; Han, Wenting; Wu, Zheng; Chi, Mengxian. Improving the Performance of Distributed TensorFlow with RDMA. International Journal of Parallel Programming, 2018, 46(4): 674-685.
  • [3] Lv, Baocai; Liu, Bing; Liu, Fang; Xiao, Nong; Chen, Zhiguang. RM-KVStore: New MXNet KVStore to Accelerate Transfer Performance with RDMA. 2018 IEEE Symposium on Computers and Communications (ISCC), 2018: 241-247.
  • [4] Huang, Chengyuan; Gao, Yixiao; Chen, Wei; Li, Duoxing; Xiao, Yibo; Zhang, Ruyi; Tian, Chen; Wang, Xiaoliang; Dou, Wanchun; Chen, Guihai; Wang, Yi; Xiao, Fu. MC-RDMA: Improving Replication Performance of RDMA-based Distributed Systems with Reliable Multicast Support. 2023 IEEE 31st International Conference on Network Protocols (ICNP), 2023.
  • [5] Guney, Isa Ahmet; Ovant, Burak Sezin; Baydere, Sebnem. Impact of RDMA Communication on the Performance of Distributed BFS Algorithm. 2016 International Conference on High Performance Computing & Simulation (HPCS 2016), 2016: 350-356.
  • [6] Ding, Baorong; Han, Mingcong; Chen, Rong. DArray: A High Performance RDMA-Based Distributed Array. Proceedings of the 52nd International Conference on Parallel Processing (ICPP 2023), 2023: 715-724.
  • [7] Wang, Ziqi; Liu, Yaping; Zhang, Shuo; Hu, Jinrui; Liu, Xinyi. A Survey of RDMA Distributed Storage. 2024 5th International Conference on Computing, Networks and Internet of Things (CNIOT 2024), 2024: 534-539.
  • [8] Gu, Zheng; Small, Matthew; Yuan, Xin; Marathe, Aniruddha; Lowenthal, David K. Protocol Customization for Improving MPI Performance on RDMA-Enabled Clusters. International Journal of Parallel Programming, 2013, 41(5): 682-703.