Highly efficient training method of MiniGo on a large-scale heterogeneous computing platform

Cited: 0
Authors
Li, Rongchun [1 ]
He, Zhouyu [1 ]
Qiao, Peng [1 ]
Jiang, Jingfei [1 ]
Dou, Yong [1 ]
Li, Dongsheng [1 ]
Affiliation
[1] National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China
Abstract
An efficient multi-level parallel training method for MiniGo agents on large-scale heterogeneous computing platforms was proposed, comprising task-level parallelism between nodes, CPU-DSP (central processing unit / digital signal processor) heterogeneous parallelism, and DSP core-level parallelism. Efficient input/output deployment was realized and the network communication bottleneck was eliminated. A heterogeneous memory management scheme oriented to the CPU-DSP shared memory structure was proposed to reduce data transfers between heterogeneous devices. Shared-memory programming optimization was realized, and the dense convolution operator was accelerated on the DSP. Results show that, compared with 16-core CPU computation, the maximum speedup of the single-core DSP operator acceleration is 16.44. With this method, the number of computing nodes was scaled from 1067 to 4139, the time required to reach the given termination condition was reduced from 43.02 h to 16.05 h, and the scaling efficiency is 69.1%. Evaluation shows that this method achieves efficient parallel training of MiniGo on large-scale heterogeneous computing platforms. © 2024 National University of Defense Technology. All rights reserved.
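The reported 69.1% scaling efficiency is consistent with the usual definition of scaling efficiency as achieved speedup divided by the increase in node count (an assumption; the paper's exact definition is not quoted in this record). A worked check from the figures above:

\[
E \;=\; \frac{T_{1067}/T_{4139}}{4139/1067}
  \;=\; \frac{43.02/16.05}{4139/1067}
  \;\approx\; \frac{2.680}{3.879}
  \;\approx\; 0.691 \;=\; 69.1\%.
\]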
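To make the shared-memory idea concrete, the following is a minimal C sketch of the zero-copy pattern the abstract describes: CPU and accelerator operate on one mapped region instead of copying tensors between device memories. The platform's actual CPU-DSP runtime is not shown; POSIX shared memory and the buffer name "/minigo_shared_buf" are stand-ins (assumptions), not the authors' API.

/* Sketch: one shared buffer visible to both sides, so no
 * host<->device memcpy is needed (POSIX shm as a stand-in
 * for the platform's CPU-DSP shared-memory allocator). */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define BUF_BYTES (1 << 20)  /* 1 MiB tensor buffer */

int main(void) {
    /* Create a named shared region; on the real platform this would
     * be memory visible to both the CPU cores and the DSP clusters. */
    int fd = shm_open("/minigo_shared_buf", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, BUF_BYTES) != 0) { perror("ftruncate"); return 1; }

    float *buf = mmap(NULL, BUF_BYTES, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    /* CPU side writes the input feature map in place ... */
    for (size_t i = 0; i < BUF_BYTES / sizeof(float); ++i)
        buf[i] = 0.0f;

    /* ... and a DSP convolution kernel launched on the same physical
     * pages would read and write it directly, with no staging copy. */

    munmap(buf, BUF_BYTES);
    shm_unlink("/minigo_shared_buf");
    close(fd);
    return 0;
}

(Compile with -lrt on older glibc systems.)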
DOI
10.11887/j.cn.202405022
Pages: 209-218