Improving the Performance of Distributed MXNet with RDMA

Cited by: 0
Authors
Mingfan Li
Ke Wen
Han Lin
Xu Jin
Zheng Wu
Hong An
Mengxian Chi
Affiliations
[1] University of Science and Technology of China,
Keywords
Distributed MXNet; Parameter server; RDMA; InfiniBand; Network optimization;
DOI
Not available
Abstract
As one of the most influential deep learning frameworks, MXNet has achieved excellent performance and enabled many breakthroughs in academia and industry across a variety of machine learning tasks. The initial implementation of MXNet uses a proxy-socket interface, which delivers suboptimal performance in a distributed environment. In a massively parallel training task, parameters are updated frequently during each training loop, so network performance becomes the main factor in overall performance. Over the past decade, high-performance interconnects have employed remote direct memory access (RDMA) technology to provide excellent performance for numerous scientific domains. In this paper, we describe an efficient design that extends the open-source MXNet to make it RDMA-capable via RDMA-based parameter server interfaces. With modest optimizations to memory usage and transmission overhead, RDMA-based MXNet achieves a substantial performance improvement over the original software. Our experiments reveal that, for the communication subsystem of MXNet, the new design achieves a 16x speedup (up to 21x at peak) over 1 Gigabit Ethernet (1GigE). For the two training cases on MXNet, the optimized implementation gains 5x and 9x speedups, respectively. Compared to experiments on the IP-over-InfiniBand (IPoIB) protocol, it achieves nearly 30% performance improvement, as well as better scalability and alleviation of bottlenecks.
Pages: 467 - 480
Page count: 13