Highly efficient training method for MiniGo on a large-scale heterogeneous computing platform

Cited: 0
Authors
Li, Rongchun [1]
He, Zhouyu [1]
Qiao, Peng [1]
Jiang, Jingfei [1]
Dou, Yong [1]
Li, Dongsheng [1]
Affiliation
[1] National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China
Abstract
An efficient multi-level parallel training method for MiniGo agents on large-scale heterogeneous computing platforms was proposed, comprising task-level parallelism between nodes, CPU-DSP (central processing unit - digital signal processor) heterogeneous parallelism, and DSP core-level parallelism. Efficient input/output deployment was realized and the network communication bottleneck was eliminated. A heterogeneous memory management scheme oriented to the CPU-DSP shared memory structure was proposed to reduce data transfers between heterogeneous devices. Shared-memory programming was optimized, and the dense convolution operator was accelerated on the DSP. Results show that, compared with 16-core CPU computation, the maximum speedup of the single-core DSP operator acceleration is 16.44. With this method, the number of computing nodes is scaled from 1067 to 4139, the time required to reach the given termination condition is reduced from 43.02 h to 16.05 h, and the scaling efficiency is 69.1%. The evaluation shows that the method enables efficient parallel training of MiniGo on large-scale heterogeneous computing platforms. © 2024 National University of Defense Technology. All rights reserved.
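The abstract gives no implementation details, so the following is only a minimal, hedged sketch of the shared-memory idea it describes: a host-side producer writes a batch of self-play features into a region that a device-side consumer (standing in for the paper's DSP convolution operator) reads in place, avoiding an explicit copy between devices. It uses generic Python multiprocessing shared memory rather than the paper's actual CPU-DSP runtime; names such as FEATURE_SHAPE, producer, and consumer are illustrative assumptions, not the authors' code.

# Minimal sketch (not the paper's implementation): a batch of training tensors is
# placed in a region shared between a host-side producer and a device-side
# consumer, so no explicit copy is needed between them.
import numpy as np
from multiprocessing import Process, shared_memory

FEATURE_SHAPE = (32, 19, 19, 17)   # assumed layout: batch of Go board feature planes

def producer(shm_name: str) -> None:
    """Host side: write a batch of self-play features directly into shared memory."""
    shm = shared_memory.SharedMemory(name=shm_name)
    batch = np.ndarray(FEATURE_SHAPE, dtype=np.float32, buffer=shm.buf)
    batch[:] = np.random.rand(*FEATURE_SHAPE).astype(np.float32)  # stand-in for self-play data
    shm.close()

def consumer(shm_name: str) -> None:
    """Accelerator side (stand-in for the DSP operator): read the batch in place."""
    shm = shared_memory.SharedMemory(name=shm_name)
    batch = np.ndarray(FEATURE_SHAPE, dtype=np.float32, buffer=shm.buf)
    # A real implementation would launch the dense convolution operator here;
    # a checksum only demonstrates that the data was consumed without copying.
    print("checksum:", float(batch.sum()))
    shm.close()

if __name__ == "__main__":
    nbytes = int(np.prod(FEATURE_SHAPE)) * 4  # float32
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    try:
        p = Process(target=producer, args=(shm.name,))
        p.start()
        p.join()
        c = Process(target=consumer, args=(shm.name,))
        c.start()
        c.join()
    finally:
        shm.close()
        shm.unlink()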
DOI
10.11887/j.cn.202405022
Pages: 209-218
Related papers
50 items in total
  • [21] Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems
    Xiao, Shucai
    Feng, Wu-chun
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012, : 2554 - 2557
  • [22] eSplash: Efficient Speculation in Large Scale Heterogeneous Computing Systems
    Wang, Jiayin
    Wang, Teng
    Yang, Zhengyu
    Mi, Ningfang
    Sheng, Bo
    2016 IEEE 35TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2016,
  • [23] A large scale distributed platform for high performance computing
    Abdennadher, N
    Boesch, R
    GRID AND COOPERATIVE COMPUTING - GCC 2005, PROCEEDINGS, 2005, 3795 : 848 - 859
  • [24] A large scale distributed platform for high performance computing
    Abdennadher, N
    Boesch, R
    8TH INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS AND NETWORKS, PROCEEDINGS, 2005, : 414 - 419
  • [25] An Efficient Method for Computing Exact Delay-Margins of Large-Scale Power Systems
    Li, Chongtao
    Duan, Chao
    Cao, Yulei
    IEEE TRANSACTIONS ON POWER SYSTEMS, 2020, 35 (06) : 4924 - 4927
  • [26] Very Large-Scale and Node-Heavy Graph Analytics with Heterogeneous FPGA+CPU Computing Platform
    Zou, Yu
    Lin, Mingjie
    2018 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI (ISVLSI), 2018, : 638 - 643
  • [27] On Efficient Training of Large-Scale Deep Learning Models
    Shen, Li
    Sun, Yan
    Yu, Zhiyuan
    Ding, Liang
    Tian, Xinmei
    Tao, Dacheng
    ACM COMPUTING SURVEYS, 2025, 57 (03)
  • [28] High-Speed Circulation Flow Platform Facilitating Practical Large-Scale Heterogeneous Photocatalysis
    Liu, Chenguang
    Song, Lei
    Liu, Qiong
    Chen, Weihao
    Xu, Jinhui
    Wang, Mu
    Zhang, Yanbin
    Tan, Ting Wei
    Lei, Zhexuan
    Cheng, Lei
    Khan, Saif A.
    Wu, Jie
    ORGANIC PROCESS RESEARCH & DEVELOPMENT, 2024, 28 (05) : 1964 - 1970
  • [29] Large-scale neural network method for brain computing
    Miyakawa, N
    Ichikawa, M
    Matsumoto, G
    APPLIED MATHEMATICS AND COMPUTATION, 2000, 111 (2-3) : 203 - 208
  • [30] An efficient method for large-scale slack allocation
    Joshi, Siddharth
    Boyd, Stephen
    ENGINEERING OPTIMIZATION, 2009, 41 (12) : 1163 - 1176