Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations

Authors
Haonan Ji
Shibo Lu
Kaixi Hou
Hao Wang
Zhou Jin
Weifeng Liu
Brian Vinter
Affiliations
[1] China University of Petroleum-Beijing, Super Scientific Software Laboratory, Department of Computer Science and Technology
[2] Virginia Tech, Department of Computer Science
[3] The Ohio State University, Department of Computer Science and Engineering
[4] Aarhus University, Faculty of Technical Sciences
Keywords
Parallel computing; Segmented merge; Sparse matrix; GPU
DOI
Not available
Abstract
Segmented operations, such as segmented sum, segmented scan and segmented sort, are important building blocks for parallel irregular algorithms. In this work, we propose a new parallel primitive called segmented merge. It merges q sub-segments into p segments in parallel; both the segments and the sub-segments may have nonuniform lengths, which easily causes load-balancing and vectorization problems on massively parallel processors such as GPUs. Our algorithm resolves these problems by first recording the boundaries of the segments and sub-segments, then assigning roughly the same number of elements to each GPU thread, and finally iteratively merging the sub-segments within each segment in a binary-tree fashion until only one sub-segment remains in each segment. We implement the segmented merge primitive on GPUs and demonstrate its efficiency on parallel sparse matrix transposition (SpTRANS) and sparse matrix–matrix multiplication (SpGEMM). We compare against the NVIDIA vendor library on two GPUs. The experimental results show that our algorithm achieves on average 3.94× (up to 13.09×) and 2.89× (up to 109.15×) speedups on SpTRANS and SpGEMM, respectively.
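The semantics described in the abstract can be illustrated with a small sequential reference sketch. It is not the authors' GPU implementation: the function name `segmented_merge`, the two-level offset arrays `subseg_ptr` and `seg_ptr`, and the assumption that each sub-segment is already sorted are all hypothetical choices made for illustration. The per-thread load-balancing step is not modeled here; only the boundary recording and the binary-tree merge rounds are shown.

```python
from heapq import merge as merge_two  # stdlib merge of already-sorted iterables


def segmented_merge(values, subseg_ptr, seg_ptr):
    """Sequential reference sketch (hypothetical interface).

    values     : flat list; every sub-segment is assumed to be sorted
    subseg_ptr : q+1 offsets into `values` delimiting the q sub-segments
    seg_ptr    : p+1 offsets into `subseg_ptr` delimiting the p segments
    Returns (merged_values, seg_offsets) with one sorted run per segment.
    """
    # Record the boundaries: split each segment into its list of sorted runs.
    segments = []
    for s in range(len(seg_ptr) - 1):
        runs = [values[subseg_ptr[k]:subseg_ptr[k + 1]]
                for k in range(seg_ptr[s], seg_ptr[s + 1])]
        segments.append(runs)

    # Binary-tree rounds: in each round, merge neighbouring pairs of runs
    # inside every segment until each segment holds at most one run.
    while any(len(runs) > 1 for runs in segments):
        for s, runs in enumerate(segments):
            merged = [list(merge_two(runs[i], runs[i + 1]))
                      for i in range(0, len(runs) - 1, 2)]
            if len(runs) % 2 == 1:
                merged.append(runs[-1])  # odd run out waits for the next round
            segments[s] = merged

    # Flatten back into one array plus the new segment offsets.
    out, offsets = [], [0]
    for runs in segments:
        if runs:
            out.extend(runs[0])
        offsets.append(len(out))
    return out, offsets


# Two segments: the first has three sub-segments, the second has two.
values = [1, 4, 7, 2, 5, 3, 6, 0, 9, 8]
subseg_ptr = [0, 3, 5, 7, 9, 10]
seg_ptr = [0, 3, 5]
result, offsets = segmented_merge(values, subseg_ptr, seg_ptr)
```

In this sketch every round halves the number of runs per segment, so a segment with k sub-segments needs about log2(k) rounds. A GPU version would additionally have to split the work of each round evenly across threads, which is the load-balancing problem the abstract refers to.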
Pages: 732-744
Number of pages: 12