Performance Optimization Techniques of Irregular-Shaped Matrix Multiplication on SW26010P

Cited by: 0
Authors
Hu, Yi [1, 2, 3]
Chen, Daokun [1, 2, 3]
Yang, Chao [1, 2]
Affiliations
[1] School of Mathematical Sciences, Peking University, Beijing 100871, China
[2] Research Center of Advanced Computing, Changsha Institute for Computing and Digital Economy, Peking University, Changsha 410205, China
[3] Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Keywords
Computational fluid dynamics - Digital storage - Hydraulics - Matrix algebra
DOI
10.3778/j.issn.1002-8331.2405-0142
Abstract
Matrix multiplication is widely used in scientific and engineering computing and is the most important optimization target in BLAS. With the development of artificial neural networks, computational fluid dynamics, and other fields, irregular-shaped matrix multiplication is rapidly gaining attention. This paper proposes parallelization techniques for irregular-shaped matrix multiplication on SW26010P, a domestic many-core processor deployed in the new-generation Sunway supercomputer. Specifically, a parallel algorithm with diversified task partitioning and mapping is designed, based on the hardware characteristics and the data layout of the matrix elements, to improve memory bandwidth utilization. In addition, based on the hardware pipelines and the vectorized computation and data access instructions, the key computations are abstracted into kernels and optimized at the low level to improve computational efficiency. A data-sharing strategy built on the RMA point-to-point communication mechanism is adopted to reduce the overhead of data access and transmission, and nested double buffering is used to further improve performance. A series of experiments on SW26010P is also conducted to determine the optimal number of blocks for the parallel computation of different kinds of functions, so as to make full use of the hardware platform. The experimental results show that the irregular-shaped matrix multiplication optimized in this paper reaches up to 93% of the theoretical performance upper bound. Compared with the massive GEMM algorithm implementation, it achieves an average speedup of 5.43 times and a maximum speedup of 51.5 times. © 2025 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
Pages: 150 - 163
Related papers
39 items in total
  • [1] Detailed Analysis and Optimization of Irregular-Shaped Matrix Multiplication on Multi-Core DSPs
    Mo, Haotian
    Wang, Qinglin
    Liao, Linyu
    Li, Biao
    Chi, Lihua
    Liu, Jie
    53RD INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2024, 2024, : 1176 - 1186
  • [2] Optimizing Irregular-Shaped Matrix-Matrix Multiplication on Multi-Core DSPs
    Yin, Shangfei
    Wang, Qinglin
    Hao, Ruochen
    Zhou, Tianyang
    Mei, Songzhu
    Liu, Jie
    2022 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2022), 2022, : 451 - 461
  • [3] Implementation and optimization of SpMV algorithm based on SW26010P many-core processor and stored in BCSR format
    Ma, Mengfei
    Huang, Xianqing
    Xu, Jiali
    Jia, Dongning
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [4] Runtime Adaptive Matrix Multiplication for the SW26010 Many-Core Processor
    Wu, Zheng
    Li, Mingfan
    Chi, Mengxian
    Xu, Le
    An, Hong
    IEEE ACCESS, 2020, 8 : 156915 - 156928
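  • [5] Research on Many-Core Parallel Optimization Techniques for Irregular-Shaped Matrix Multiplication on SW26010P
    Hu, Yi
    Chen, Daokun
    Yang, Chao
    Computer Engineering and Applications (计算机工程与应用), 2025, 61 (06) : 150 - 163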
  • [6] ROBUST OPTIMIZATION OF DESCENT TRAJECTORIES ON IRREGULAR-SHAPED BODIES IN THE PRESENCE OF UNCERTAINTY
    Machuca, Pablo
    Gonzalez-Arribas, Daniel
    Morante-Gonzalez, David
    Sanjurjo-Rivo, Manuel
    Soler, Manuel
    ASTRODYNAMICS 2017, PTS I-IV, 2018, 162 : 1463 - 1475
  • [7] Optimization techniques for small matrix multiplication
    Drevet, Charles-Eric
    Islam, Md Nazrul
    Schost, Eric
    THEORETICAL COMPUTER SCIENCE, 2011, 412 (22) : 2219 - 2236
  • [8] Particle Swarm Optimization of Irregular-shaped Hexagon Patch Antenna for 2.4 GHz WLAN Applications
    Weng, Wei-Chung
    Chang, Min-Chi
    APPLIED COMPUTATIONAL ELECTROMAGNETICS SOCIETY JOURNAL, 2021, 36 (12) : 1535 - 1540
  • [9] Cache performance optimization of irregular sparse matrix multiplication on modern multi-core CPU and GPU
    Liu, Li
    Yang, Guangwen
    High Technology Letters, 2013, 19 (04) : 339 - 345
  • [10] Irregular-Shaped Torsion Spring Design for Gravity Compensation Mechanism Using Chain Algorithm and DIRECT Optimization
    Shan, Zexin
    Endo, Mitsuru
    Nakamura, Hiroshi
    Tsutsui, Yukio
    ADVANCES IN ITALIAN MECHANISM SCIENCE, VOL 1, IFIT 2024, 2024, 163 : 152 - 161