Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations

Cited by: 0
Authors
Haonan Ji
Shibo Lu
Kaixi Hou
Hao Wang
Zhou Jin
Weifeng Liu
Brian Vinter
Affiliations
[1] China University of Petroleum-Beijing, Super Scientific Software Laboratory, Department of Computer Science and Technology
[2] Virginia Tech, Department of Computer Science
[3] The Ohio State University, Department of Computer Science and Engineering
[4] Aarhus University, Faculty of Technical Sciences
Keywords
Parallel computing; Segmented merge; Sparse matrix; GPU
DOI: Not available
Abstract
Segmented operations, such as segmented sum, segmented scan, and segmented sort, are important building blocks for parallel irregular algorithms. In this work we propose a new parallel primitive called segmented merge. Its function is to merge, in parallel, q sub-segments into p segments, both of possibly nonuniform lengths, which easily causes load balancing and vectorization problems on massively parallel processors such as GPUs. Our algorithm resolves these problems by first recording the boundaries of the segments and sub-segments, then assigning roughly the same number of elements to each GPU thread, and finally iteratively merging the sub-segments within each segment in a binary-tree fashion until only one sub-segment remains per segment. We implement the segmented merge primitive on GPUs and demonstrate its efficiency on parallel sparse matrix transposition (SpTRANS) and sparse matrix–matrix multiplication (SpGEMM) operations. We conduct a comparative experiment against the NVIDIA vendor library on two GPUs. The experimental results show that our algorithm achieves on average 3.94× (up to 13.09×) and 2.89× (up to 109.15×) speedups on SpTRANS and SpGEMM, respectively.
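The merging scheme sketched in the abstract can be illustrated in miniature. The following is a minimal, serial C++ sketch of the primitive's semantics only, not the authors' GPU implementation: it assumes a CSR-style layout in which `seg_ptr` marks where each segment's sub-segments begin and `sub_ptr` marks sub-segment boundaries in a flat key array (both array names are illustrative), and it merges neighbouring sub-segments level by level, binary-tree style, until one sorted run per segment remains. The load-balancing step of the paper, assigning roughly equal numbers of elements to GPU threads, is not modelled here.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Merge the sorted sub-segments of every segment until each segment is one
// sorted run. seg_ptr has p+1 entries indexing into sub_ptr; sub_ptr has q+1
// entries indexing into keys. (Layout and names are illustrative assumptions.)
void segmented_merge(std::vector<int>& keys,
                     const std::vector<int>& seg_ptr,
                     const std::vector<int>& sub_ptr)
{
    for (size_t s = 0; s + 1 < seg_ptr.size(); ++s) {
        // Boundaries of the sub-segments that belong to segment s.
        std::vector<int> bounds(sub_ptr.begin() + seg_ptr[s],
                                sub_ptr.begin() + seg_ptr[s + 1] + 1);
        // One binary-tree level per pass: merge neighbouring sub-segment pairs.
        while (bounds.size() > 2) {
            std::vector<int> next;
            for (size_t i = 0; i + 2 < bounds.size(); i += 2) {
                std::inplace_merge(keys.begin() + bounds[i],
                                   keys.begin() + bounds[i + 1],
                                   keys.begin() + bounds[i + 2]);
                next.push_back(bounds[i]);
            }
            if (bounds.size() % 2 == 0)                    // unpaired last sub-segment:
                next.push_back(bounds[bounds.size() - 2]); // carry it to the next level
            next.push_back(bounds.back());
            bounds.swap(next);
        }
    }
}

int main() {
    // Two segments: the first holds three sorted sub-segments, the second two.
    std::vector<int> keys    = {4, 9,  1, 7,  5,   2, 8,  3, 6};
    std::vector<int> sub_ptr = {0, 2, 4, 5, 7, 9};   // q = 5 sub-segments
    std::vector<int> seg_ptr = {0, 3, 5};            // p = 2 segments
    segmented_merge(keys, seg_ptr, sub_ptr);
    for (int k : keys) std::printf("%d ", k);        // prints: 1 4 5 7 9 2 3 6 8
    std::printf("\n");
    return 0;
}
```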
Pages: 732 - 744
Number of pages: 12