Optimization of GPU-based Sparse Matrix Multiplication for Large Sparse Networks

Cited by: 11
Authors
Lee, Jeongmyung [1]
Kang, Seokwon [1]
Yu, Yongseung [1]
Jo, Yong-Yeon [1]
Kim, Sang-Wook [1]
Park, Yongjun [1]
Affiliation
[1] Hanyang Univ, Dept Comp Sci, Seoul, South Korea
Keywords
Sparse matrix multiplication; sparse network; GPU; linear algebra; EFFICIENT
DOI
10.1109/ICDE48307.2020.00085
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Sparse matrix multiplication (spGEMM) is widely used to analyze sparse network data and to extract important information from its matrix representation. Because spGEMM exhibits a high degree of data parallelism, many efficient implementations have been introduced for graphics processing units (GPUs) using data-parallel programming platforms such as CUDA and OpenCL. However, several well-known spGEMM implementations, such as cuSPARSE and CUSP, often fail to utilize GPU resources fully, owing to load imbalance between threads in the expansion process and high memory contention in the merge process. Furthermore, although several outer-product-based spGEMM techniques have been proposed to solve the load-balancing problem in expansion, they still underutilize the GPU because computation load varies severely among thread blocks. To address these challenges, this paper proposes a new optimization pass called Block Reorganizer, which balances the total computation of each computing unit on the target GPU, based on the outer-product-based expansion process, and reduces memory pressure during the merge process. For expansion, it first identifies the actual computation amount of each block and then applies two thread-block transformations according to block characteristics: 1) B-Splitting, which transforms heavy-computation blocks into multiple small blocks, and 2) B-Gathering, which aggregates multiple small-computation blocks into a larger block. For merging, it improves overall performance through B-Limiting, which limits the number of blocks on each computing unit. Experimental results on real-world datasets show that it improves total kernel execution performance by 1.43x on average compared with row-product-based spGEMM on NVIDIA Titan Xp GPUs.
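The abstract's expansion-side idea (estimate each outer product's work, then split heavy units and gather light ones) can be illustrated with a small CPU sketch in Python. It is not the authors' CUDA implementation. It assumes one "block" per outer product k, estimates work as nnz(A[:, k]) × nnz(B[k, :]), and uses hypothetical names (`reorganize`, `split_limit`, `gather_limit`, `task_work`) and a simple threshold heuristic of my own. B-Limiting, which caps resident blocks per computing unit during the merge, is a GPU scheduling concern and is not modeled.

```python
from collections import defaultdict

# Sparse storage assumed for this sketch (not the paper's format):
#   A_cols[k] = [(i, A[i][k]), ...]  nonzeros of column k of A
#   B_rows[k] = [(j, B[k][j]), ...]  nonzeros of row k of B
# Outer-product spGEMM: C = sum over k of A[:, k] (outer product) B[k, :]


def task_work(a_slice, b_row):
    """Multiplications a task performs: nnz of its A slice times nnz of B[k, :]."""
    return len(a_slice) * len(b_row)


def block_work(block, B_rows):
    """Total multiplications of all tasks packed into one block."""
    return sum(task_work(a_slice, B_rows[k]) for k, a_slice in block)


def reorganize(A_cols, B_rows, split_limit, gather_limit):
    """Hypothetical split/gather heuristic in the spirit of B-Splitting and B-Gathering.

    Returns a list of blocks; each block is a list of (k, a_slice) tasks.
    """
    blocks, pending, pending_work = [], [], 0
    for k, a_col in A_cols.items():
        b_row = B_rows.get(k, [])
        w = task_work(a_col, b_row)
        if w == 0:
            continue  # empty outer product, no block needed
        if w > split_limit:
            # Splitting: cut A[:, k] into row chunks whose work is at most split_limit
            # (a chunk is a single row if one row's work already exceeds the limit).
            rows = max(1, split_limit // len(b_row))
            for s in range(0, len(a_col), rows):
                blocks.append([(k, a_col[s:s + rows])])
        elif w < gather_limit:
            # Gathering: pack light tasks until their total work reaches gather_limit.
            pending.append((k, a_col))
            pending_work += w
            if pending_work >= gather_limit:
                blocks.append(pending)
                pending, pending_work = [], 0
        else:
            blocks.append([(k, a_col)])
    if pending:
        blocks.append(pending)  # leftover light tasks form one final block
    return blocks


def spgemm_outer(blocks, B_rows):
    """Expansion (each task emits partial products), then merge (sum per (i, j))."""
    C = defaultdict(float)
    for block in blocks:
        for k, a_slice in block:
            for i, a_ik in a_slice:
                for j, b_kj in B_rows[k]:
                    C[(i, j)] += a_ik * b_kj
    return dict(C)


# Example: A = [[1,0,2],[0,3,0],[4,0,5]], B = [[0,1,0],[2,0,0],[0,0,3]]
A_cols = {0: [(0, 1.0), (2, 4.0)], 1: [(1, 3.0)], 2: [(0, 2.0), (2, 5.0)]}
B_rows = {0: [(1, 1.0)], 1: [(0, 2.0)], 2: [(2, 3.0)]}

blocks = reorganize(A_cols, B_rows, split_limit=1, gather_limit=1)
print([block_work(b, B_rows) for b in blocks])  # -> [1, 1, 1, 1, 1]
print(spgemm_outer(blocks, B_rows))
```

Whatever the thresholds, the reorganization changes only how work is grouped into blocks, not the result: every (k, row) partial product is still emitted exactly once and summed in the merge step.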
Pages: 925-936
Page count: 12