Highly efficient training method for MiniGo on a large-scale heterogeneous computing platform

Cited: 0
Authors
Li, Rongchun [1]
He, Zhouyu [1]
Qiao, Peng [1]
Jiang, Jingfei [1]
Dou, Yong [1]
Li, Dongsheng [1]
Affiliation
[1] National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China
Abstract
An efficient multi-level parallel training method for MiniGo agents on large-scale heterogeneous computing platforms was proposed, comprising task-level parallelism between nodes, CPU-DSP (central processing unit - digital signal processor) heterogeneous parallelism, and DSP core-level parallelism. Efficient input/output deployment was realized and the network communication bottleneck was eliminated. A heterogeneous memory management scheme oriented to the CPU-DSP shared memory structure was proposed to reduce data transfers between heterogeneous devices. Shared-memory programming was optimized, and the dense convolution operator was accelerated on the DSP. Results show that, compared with 16-core CPU computation, the maximum speedup of the single-core DSP operator acceleration is 16.44. With this method, the number of computing nodes is scaled from 1067 to 4139, the time required to reach the given termination condition is reduced from 43.02 h to 16.05 h, and the scaling efficiency is 69.1%. The evaluation shows that the method enables efficient parallel training of MiniGo on large-scale heterogeneous computing platforms. © 2024 National University of Defense Technology. All rights reserved.
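The abstract gives no implementation details, so the following is only a minimal, hedged sketch of the shared-memory idea it describes: a host-side producer writes a batch of self-play features into a region that a device-side consumer (standing in for the paper's DSP convolution operator) reads in place, avoiding an explicit copy between devices. It uses generic Python multiprocessing shared memory rather than the paper's actual CPU-DSP runtime; names such as FEATURE_SHAPE, producer, and consumer are illustrative assumptions, not the authors' code.

# Minimal sketch (not the paper's implementation): a batch of training tensors is
# placed in a region shared between a host-side producer and a device-side
# consumer, so no explicit copy is needed between them.
import numpy as np
from multiprocessing import Process, shared_memory

FEATURE_SHAPE = (32, 19, 19, 17)   # assumed layout: batch of Go board feature planes

def producer(shm_name: str) -> None:
    """Host side: write a batch of self-play features directly into shared memory."""
    shm = shared_memory.SharedMemory(name=shm_name)
    batch = np.ndarray(FEATURE_SHAPE, dtype=np.float32, buffer=shm.buf)
    batch[:] = np.random.rand(*FEATURE_SHAPE).astype(np.float32)  # stand-in for self-play data
    shm.close()

def consumer(shm_name: str) -> None:
    """Accelerator side (stand-in for the DSP operator): read the batch in place."""
    shm = shared_memory.SharedMemory(name=shm_name)
    batch = np.ndarray(FEATURE_SHAPE, dtype=np.float32, buffer=shm.buf)
    # A real implementation would launch the dense convolution operator here;
    # a checksum only demonstrates that the data was consumed without copying.
    print("checksum:", float(batch.sum()))
    shm.close()

if __name__ == "__main__":
    nbytes = int(np.prod(FEATURE_SHAPE)) * 4  # float32
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    try:
        p = Process(target=producer, args=(shm.name,))
        p.start()
        p.join()
        c = Process(target=consumer, args=(shm.name,))
        c.start()
        c.join()
    finally:
        shm.close()
        shm.unlink()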
DOI
10.11887/j.cn.202405022
Pages: 209-218
Related papers
50 items in total
  • [21] Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems
    Xiao, Shucai
    Feng, Wu-chun
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012, : 2554 - 2557
  • [22] eSplash: Efficient Speculation in Large Scale Heterogeneous Computing Systems
    Wang, Jiayin
    Wang, Teng
    Yang, Zhengyu
    Mi, Ningfang
    Sheng, Bo
    2016 IEEE 35TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2016,
  • [23] A large scale distributed platform for high performance computing
    Abdennadher, N
    Boesch, R
    GRID AND COOPERATIVE COMPUTING - GCC 2005, PROCEEDINGS, 2005, 3795 : 848 - 859
  • [24] A large scale distributed platform for high performance computing
    Abdennadher, N
    Boesch, R
    8TH INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS AND NETWORKS, PROCEEDINGS, 2005, : 414 - 419
  • [25] An Efficient Method for Computing Exact Delay-Margins of Large-Scale Power Systems
    Li, Chongtao
    Duan, Chao
    Cao, Yulei
    IEEE TRANSACTIONS ON POWER SYSTEMS, 2020, 35 (06) : 4924 - 4927
  • [26] Very Large-Scale and Node-Heavy Graph Analytics with Heterogeneous FPGA+CPU Computing Platform
    Zou, Yu
    Lin, Mingjie
    2018 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI (ISVLSI), 2018, : 638 - 643
  • [27] On Efficient Training of Large-Scale Deep Learning Models
    Shen, Li
    Sun, Yan
    Yu, Zhiyuan
    Ding, Liang
    Tian, Xinmei
    Tao, Dacheng
    ACM COMPUTING SURVEYS, 2025, 57 (03)
  • [28] High-Speed Circulation Flow Platform Facilitating Practical Large-Scale Heterogeneous Photocatalysis
    Liu, Chenguang
    Song, Lei
    Liu, Qiong
    Chen, Weihao
    Xu, Jinhui
    Wang, Mu
    Zhang, Yanbin
    Tan, Ting Wei
    Lei, Zhexuan
    Cheng, Lei
    Khan, Saif A.
    Wu, Jie
    ORGANIC PROCESS RESEARCH & DEVELOPMENT, 2024, 28 (05) : 1964 - 1970
  • [29] Large-scale neural network method for brain computing
    Miyakawa, N
    Ichikawa, M
    Matsumoto, G
    APPLIED MATHEMATICS AND COMPUTATION, 2000, 111 (2-3) : 203 - 208
  • [30] An efficient method for large-scale slack allocation
    Joshi, Siddharth
    Boyd, Stephen
    ENGINEERING OPTIMIZATION, 2009, 41 (12) : 1163 - 1176