Rgs-SpMM: Accelerate Sparse Matrix-Matrix Multiplication by Row Group Splitting Strategy on the GPU

Cited by: 0
Authors
Guo, Mingfeng [1]
Wang, Yaobin [1]
Huang, Jun [1]
Wang, Qingfeng [1]
Zhang, Yaqing [1]
Xu, Mu [2]
Lu, Fang [2]
Affiliations
[1] Southwest Univ Sci & Technol, Sch Comp Sci & Technol, Minist Educ, Key Lab Testing Technol Mfg Proc, Mianyang 621010, Sichuan, Peoples R China
[2] Alibaba Grp, Hangzhou, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
Sparse Matrix-Matrix Multiplication; GPU; Row group splitting
DOI
10.1007/978-3-031-21395-3_6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The Sparse Matrix-Matrix Multiplication (SpMM) operation is widely used in different fields, especially in the recently popular GNN frameworks. Researchers have designed many GPU kernels to accelerate SpMM. Existing methods mostly adopt a row splitting strategy to obtain better parallelism and memory access efficiency. However, due to irregularities of sparse matrices, such as short rows with few non-zero elements, current methods under-utilize GPU thread resources. In this paper, we rearrange the distribution of non-zero elements in the sparse matrix and design an SpMM kernel based on a row group splitting strategy. In contrast to previous methods, which assign a single "row" task unit to a warp for processing, we combine short rows of a sparse matrix into "row groups" that serve as the task unit, which allocates a more appropriate number of non-zero elements to the GPU resources. This method reduces thread divergence within a warp and improves load balancing among warps. Our experimental data comes from the SNAP Matrix Collection. The results show that our kernel is faster than cuSPARSE and GE-SpMM, with average speedups of 1.61 and 1.42, respectively.
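The abstract does not specify how rows are combined into groups, so the following is only a minimal CPU-side sketch of the general idea, not the authors' kernel. It assumes a CSR input, a per-group budget of 32 non-zeros (one per warp lane), and a greedy packing of consecutive rows; the names `build_row_groups`, `spmm_by_groups`, and `WARP_SIZE` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

WARP_SIZE = 32  # assumed non-zero budget per group (one element per warp lane)


def build_row_groups(row_ptr: List[int], capacity: int = WARP_SIZE) -> List[Tuple[int, int]]:
    """Greedily pack consecutive CSR rows into groups of at most `capacity` non-zeros.

    Returns half-open row ranges [start, end). A row with more than
    `capacity` non-zeros ends up in a group of its own.
    """
    groups: List[Tuple[int, int]] = []
    n_rows = len(row_ptr) - 1
    start, used = 0, 0
    for r in range(n_rows):
        nnz = row_ptr[r + 1] - row_ptr[r]
        # Close the current group if adding this row would exceed the budget.
        if r > start and used + nnz > capacity:
            groups.append((start, r))
            start, used = r, 0
        used += nnz
    if n_rows > 0:
        groups.append((start, n_rows))
    return groups


def spmm_by_groups(row_ptr, col_idx, vals, B, groups):
    """Compute C = A @ B for CSR matrix A and dense B, one row group at a time.

    On a GPU each group would be one warp's task unit; here the outer loop
    simply runs the groups sequentially.
    """
    n_rows = len(row_ptr) - 1
    n_cols = len(B[0]) if B else 0
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for start, end in groups:
        for r in range(start, end):
            for k in range(row_ptr[r], row_ptr[r + 1]):
                a, b_row = vals[k], B[col_idx[k]]
                for j in range(n_cols):
                    C[r][j] += a * b_row[j]
    return C


# Small example: a 4x3 sparse matrix with short rows, multiplied by a 3x2 dense matrix.
row_ptr = [0, 1, 3, 3, 4]
col_idx = [0, 1, 2, 0]
vals = [1.0, 2.0, 3.0, 4.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
groups = build_row_groups(row_ptr, capacity=2)
C = spmm_by_groups(row_ptr, col_idx, vals, B, groups)
```

With plain row splitting, each of the four short rows above would occupy a warp on its own; packing them into groups keeps the amount of work per task unit closer to the budget, which is the load-balancing effect the abstract describes.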
Pages: 61-66
Page count: 6
Related papers
50 in total
  • [1] HC-SpMM: Accelerating Sparse Matrix-Matrix Multiplication for Graphs with Hybrid GPU Cores
    Li, Zhonggen
    Ke, Xiangyu
    Zhu, Yifan
    Gao, Yunjun
    Tu, Yaofeng
    arXiv,
  • [2] Optimizing Sparse Matrix-Matrix Multiplication for the GPU
    Dalton, Steven
    Olson, Luke
    Bell, Nathan
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2015, 41 (04):
  • [3] Adaptive Sparse Matrix-Matrix Multiplication on the GPU
    Winter, Martin
    Mlakar, Daniel
    Zayer, Rhaleb
    Seidel, Hans-Peter
    Steinberger, Markus
    PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19), 2019: 68 - 81
  • [4] GPU-ACCELERATED SPARSE MATRIX-MATRIX MULTIPLICATION BY ITERATIVE ROW MERGING
    Gremse, Felix
    Hoefter, Andreas
    Schwen, Lars Ole
    Kiessling, Fabian
    Naumann, Uwe
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2015, 37 (01): C54 - C71
  • [5] Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores
    Zhao, Haisha
    Li, San
    Wang, Jiaheng
    Zhou, Chunbao
    Wang, Jue
    Xin, Zhikuang
    Li, Shunde
    Liang, Zhiqiang
    Pan, Zhijie
    Liu, Fang
    Zeng, Yan
    Wang, Yangang
    Chi, Xuebin
    PROCEEDINGS OF THE 2025 THE 30TH ACM SIGPLAN ANNUAL SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, PPOPP 2025, 2025: 326 - 338
  • [6] Accelerating sparse matrix-matrix multiplication with GPU Tensor Cores
    Zachariadis, Orestis
    Satpute, Nitin
    Gomez-Luna, Juan
    Olivares, Joaquin
    COMPUTERS & ELECTRICAL ENGINEERING, 2020, 88 (88)
  • [7] SPMSD: An Partitioning-Strategy for Parallel General Sparse Matrix-Matrix Multiplication on GPU
    Cui, Huanyu
    Wang, Nianbin
    Han, Qilong
    Wang, Ye
    PARALLEL PROCESSING LETTERS, 2024, 34 (02)
  • [8] An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data
    Liu, Weifeng
    Vinter, Brian
    2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014,
  • [9] Learning Sparse Matrix Row Permutations for Efficient SpMM on GPU Architectures
    Mehrabi, Atefeh
    Lee, Donghyuk
    Chatterjee, Niladrish
    Sorin, Daniel J.
    Lee, Benjamin C.
    O'Connor, Mike
    2021 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS 2021), 2021: 48 - 58
  • [10] Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures
    Deveci, Mehmet
    Trott, Christian
    Rajamanickam, Sivasankaran
    PARALLEL COMPUTING, 2018, 78 : 33 - 46