Improving the Performance of Distributed MXNet with RDMA

Cited by: 0
Authors:
Mingfan Li
Ke Wen
Han Lin
Xu Jin
Zheng Wu
Hong An
Mengxian Chi
Institution:
[1] University of Science and Technology of China
Keywords:
Distributed MXNet; Parameter server; RDMA; InfiniBand; Network optimization
DOI: Not available
Abstract
As one of the most influential deep learning frameworks, MXNet has delivered excellent performance and enabled many breakthroughs across academic and industrial machine learning workloads. The original implementation of MXNet communicates through a proxy-socket interface, which delivers suboptimal performance in distributed environments. In a massively parallel training task, parameters are updated frequently in every training iteration, so network performance becomes the dominant factor in overall performance. Over the past decade, high-performance interconnects have employed remote direct memory access (RDMA) technology to deliver excellent performance in numerous scientific domains. In this paper, we describe an efficient design that extends open-source MXNet with RDMA-based parameter-server interfaces. With modest optimizations of memory usage and transmission overhead, RDMA-based MXNet achieves a substantial performance improvement over the original software. Our experiments show that, for the communication subsystem of MXNet, the new design achieves a 16x speedup (up to 21x at peak) over 1 Gigabit Ethernet (1GigE). For the two training cases on MXNet, the optimized implementation gains 5x and 9x speedups, respectively. Compared with the IP-over-InfiniBand (IPoIB) protocol, it achieves nearly 30% performance improvement, together with better scalability and fewer communication bottlenecks.
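To make the approach concrete: a design of this kind typically pins the parameter and gradient buffers with the NIC and then moves them with one-sided RDMA operations instead of socket sends, so the data path bypasses the kernel and avoids extra copies. The sketch below shows that primitive with the standard libibverbs API; it is a minimal illustration under assumptions of our own, not the paper's implementation or MXNet's actual interfaces. The helper names (register_param_buffer, push_gradient, wait_completion) are hypothetical, and the code assumes a queue pair that is already connected, with the remote buffer address and rkey exchanged out of band during start-up.

```cpp
// Minimal sketch of the RDMA data path for a parameter-server "push"
// (illustrative only; helper names are hypothetical, not MXNet's API).
// Build: g++ -c rdma_push_sketch.cc   (link with -libverbs for a full program)
#include <infiniband/verbs.h>
#include <cstddef>
#include <cstdint>

// Register (pin) a parameter/gradient buffer once so the NIC can DMA it
// directly, avoiding the per-message copies of a socket-based transport.
ibv_mr* register_param_buffer(ibv_pd* pd, void* buf, size_t bytes) {
  return ibv_reg_mr(pd, buf, bytes,
                    IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
}

// Push a locally computed gradient into a remote (server-side) buffer with a
// one-sided RDMA WRITE. Assumes `qp` is already connected and that
// `remote_addr`/`rkey` were exchanged out of band.
int push_gradient(ibv_qp* qp, ibv_mr* mr, void* buf, uint32_t bytes,
                  uint64_t remote_addr, uint32_t rkey) {
  ibv_sge sge{};
  sge.addr   = reinterpret_cast<uintptr_t>(buf);
  sge.length = bytes;
  sge.lkey   = mr->lkey;

  ibv_send_wr wr{};
  wr.wr_id               = 1;
  wr.opcode              = IBV_WR_RDMA_WRITE;
  wr.sg_list             = &sge;
  wr.num_sge             = 1;
  wr.send_flags          = IBV_SEND_SIGNALED;   // request a completion entry
  wr.wr.rdma.remote_addr = remote_addr;
  wr.wr.rdma.rkey        = rkey;

  ibv_send_wr* bad = nullptr;
  return ibv_post_send(qp, &wr, &bad);          // 0 on success
}

// Block until the WRITE completes, i.e. the local buffer may be reused.
int wait_completion(ibv_cq* cq) {
  ibv_wc wc{};
  int n = 0;
  while ((n = ibv_poll_cq(cq, 1, &wc)) == 0) { /* busy-poll */ }
  return (n == 1 && wc.status == IBV_WC_SUCCESS) ? 0 : -1;
}
```

In a complete parameter server, pushes and pulls would be issued asynchronously and the receiver would still need to learn that new data has arrived (for example via a send with immediate data or a small control message); the sketch shows only the data movement itself.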
Pages: 467-480 (13 pages)