Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations

Authors
Haonan Ji
Shibo Lu
Kaixi Hou
Hao Wang
Zhou Jin
Weifeng Liu
Brian Vinter
Affiliations
[1] China University of Petroleum-Beijing, Super Scientific Software Laboratory, Department of Computer Science and Technology
[2] Virginia Tech, Department of Computer Science
[3] The Ohio State University, Department of Computer Science and Engineering
[4] Aarhus University, Faculty of Technical Sciences
Keywords
Parallel computing; Segmented merge; Sparse matrix; GPU
DOI
Not available
Abstract
Segmented operations, such as segmented sum, segmented scan, and segmented sort, are important building blocks for parallel irregular algorithms. In this work, we propose a new parallel primitive called segmented merge. It merges, in parallel, q sub-segments into p segments, where both the segments and the sub-segments may have nonuniform lengths; this irregularity easily causes load-balancing and vectorization problems on massively parallel processors such as GPUs. Our algorithm resolves these problems by first recording the boundaries of the segments and sub-segments, then assigning roughly the same number of elements to each GPU thread, and finally iteratively merging the sub-segments within each segment in a binary-tree fashion until only one sub-segment remains in each segment. We implement the segmented merge primitive on GPUs and demonstrate its efficiency on parallel sparse matrix transposition (SpTRANS) and sparse matrix–matrix multiplication (SpGEMM) operations. We conduct a comparative experiment against the NVIDIA vendor library on two GPUs. The experimental results show that our algorithm achieves on average 3.94× (up to 13.09×) and 2.89× (up to 109.15×) speedup on SpTRANS and SpGEMM, respectively.
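The semantics of the primitive described in the abstract can be illustrated with a small sequential reference sketch. This is not the authors' GPU implementation: the function names (`merge_two`, `segmented_merge`) and the offset-array layout (`subseg_ptr`, `seg_ptr`) are assumptions chosen for illustration, and the load-balancing step (assigning roughly equal numbers of elements to threads) is not modeled because the sketch runs on a single thread. It only shows the boundary bookkeeping and the binary-tree merging rounds.

```python
def merge_two(a, b):
    """Standard two-way merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def segmented_merge(values, subseg_ptr, seg_ptr):
    """Sequential reference for segmented merge (hypothetical layout).

    values     : flat list; each sub-segment is already sorted
    subseg_ptr : q+1 offsets into `values`, one range per sub-segment
    seg_ptr    : p+1 offsets into the sub-segment list, one range per segment
    Returns (merged_values, seg_offsets), where seg_offsets are p+1
    offsets into merged_values.
    """
    # Step 1: record the boundaries of segments and sub-segments.
    segments = []
    for s in range(len(seg_ptr) - 1):
        subs = [values[subseg_ptr[k]:subseg_ptr[k + 1]]
                for k in range(seg_ptr[s], seg_ptr[s + 1])]
        segments.append(subs)

    # Step 2 (load balancing across GPU threads) is omitted here.

    # Step 3: merge adjacent sub-segments pairwise, round by round,
    # until each segment holds at most one sub-segment.
    while any(len(subs) > 1 for subs in segments):
        for s, subs in enumerate(segments):
            nxt = [merge_two(subs[k], subs[k + 1])
                   for k in range(0, len(subs) - 1, 2)]
            if len(subs) % 2 == 1:
                nxt.append(subs[-1])  # odd one out is carried to the next round
            segments[s] = nxt

    # Flatten the result and rebuild the segment offsets.
    merged, offsets = [], [0]
    for subs in segments:
        if subs:
            merged.extend(subs[0])
        offsets.append(len(merged))
    return merged, offsets


if __name__ == "__main__":
    # q = 5 sub-segments grouped into p = 2 segments.
    vals = [1, 4, 7, 2, 5, 3, 6, 0, 9, 8]
    merged, offsets = segmented_merge(vals, [0, 3, 5, 7, 9, 10], [0, 3, 5])
    print(merged)   # [1, 2, 3, 4, 5, 6, 7, 0, 8, 9]
    print(offsets)  # [0, 7, 10]
```

In the example, the first segment needs two merging rounds (three sub-segments) while the second needs one, which is the kind of nonuniformity the abstract identifies as a source of load imbalance on GPUs.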
Pages: 732-744
Number of pages: 12