Optimization of GPU-based Sparse Matrix Multiplication for Large Sparse Networks

Cited by: 11
Authors
Lee, Jeongmyung [1 ]
Kang, Seokwon [1 ]
Yu, Yongseung [1 ]
Jo, Yong-Yeon [1 ]
Kim, Sang-Wook [1 ]
Park, Yongjun [1 ]
Affiliations
[1] Hanyang Univ, Dept Comp Sci, Seoul, South Korea
Keywords
Sparse matrix multiplication; sparse network; GPU; linear algebra; EFFICIENT;
DOI
10.1109/ICDE48307.2020.00085
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Number
0812;
Abstract
Sparse matrix multiplication (spGEMM) is widely used to analyze sparse network data and extract important information based on matrix representation. Because it contains a high degree of data parallelism, many efficient implementations using data-parallel programming platforms such as CUDA and OpenCL have been introduced for graphics processing units (GPUs). Several well-known spGEMM techniques, such as cuSPARSE and CUSP, often do not fully utilize GPU resources, owing to load imbalance between threads in the expansion process and high memory contention in the merge process. Furthermore, even though several outer-product-based spGEMM techniques have been proposed to solve the load-balancing problem in expansion, they still do not fully utilize GPU resources, because severe computation-load variations exist among the thread blocks. To address these challenges, this paper proposes a new optimization pass called Block Reorganizer, which balances the total computation of each computing unit on target GPUs, based on the outer-product-based expansion process, and reduces memory pressure during the merge process. For expansion, it first identifies the actual computation amount of each block and then performs two thread-block transformation processes based on the blocks' characteristics: 1) B-Splitting, to transform a heavy-computation block into multiple small blocks, and 2) B-Gathering, to aggregate multiple small-computation blocks into a larger block. While merging, it improves overall performance by performing B-Limiting to limit the number of blocks on each computing unit. Experimental results show that it improves total kernel-execution performance by 1.43x on average, compared to row-product-based spGEMM, on NVIDIA Titan Xp GPUs with real-world datasets.
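To make the expansion/merge structure of the abstract concrete, here is a minimal sequential sketch (not the paper's GPU implementation) of outer-product spGEMM: C = A * B is formed as the sum of outer products of A's columns with B's rows. Matrices are represented as {(row, col): value} dictionaries; all names are illustrative.

```python
from collections import defaultdict

def outer_product_spgemm(a, b):
    """Compute C = A * B via outer products of A's columns and B's rows.

    `a` and `b` map (row, col) index pairs to nonzero values.
    """
    # Expansion: group A's nonzeros by column and B's nonzeros by row,
    # so each shared index k yields one outer product a[:, k] * b[k, :].
    a_cols = defaultdict(list)   # k -> [(i, A[i, k]), ...]
    b_rows = defaultdict(list)   # k -> [(j, B[k, j]), ...]
    for (i, k), v in a.items():
        a_cols[k].append((i, v))
    for (k, j), v in b.items():
        b_rows[k].append((j, v))

    # Merge: accumulate partial products that target the same C entry.
    # The per-k work is nnz(a[:, k]) * nnz(b[k, :]), which varies widely
    # across k -- the load-imbalance problem the paper's Block Reorganizer
    # addresses by splitting heavy blocks and gathering light ones.
    c = defaultdict(float)
    for k in a_cols.keys() & b_rows.keys():
        for i, av in a_cols[k]:
            for j, bv in b_rows[k]:
                c[(i, j)] += av * bv
    return dict(c)
```

For example, with a = {(0, 0): 1.0, (1, 0): 2.0} and b = {(0, 1): 3.0}, the single outer product over k = 0 yields {(0, 1): 3.0, (1, 1): 6.0}.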
Pages: 925-936
Page count: 12