Morph Algorithms on GPUs

Cited by: 54
Authors
Nasre, Rupesh [1]
Burtscher, Martin [2]
Pingali, Keshav [1, 3]
Affiliations
[1] Univ Texas Austin, Inst Computat Engn & Sci, Austin, TX 78712 USA
[2] SW Texas State Univ, Dept Comp Sci, San Marcos, TX 78666 USA
[3] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
Funding
National Science Foundation (USA);
Keywords
Algorithms; Languages; Performance; Morph Algorithms; Graph Algorithms; Irregular Programs; GPU; CUDA; Delaunay Mesh Refinement; Survey Propagation; Minimum Spanning Tree; Boruvka; Points-to Analysis; Parallelism
DOI
10.1145/2517327.2442531
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
There is growing interest in using GPUs to accelerate graph algorithms such as breadth-first search, computing page-ranks, and finding shortest paths. However, these algorithms do not modify the graph structure, so their implementation is relatively easy compared to general graph algorithms like mesh generation and refinement, which morph the underlying graph in non-trivial ways by adding and removing nodes and edges. We know relatively little about how to implement morph algorithms efficiently on GPUs. In this paper, we present and study four morph algorithms: (i) a computational geometry algorithm called Delaunay Mesh Refinement (DMR), (ii) an approximate SAT solver called Survey Propagation (SP), (iii) a compiler analysis called Points-to Analysis (PTA), and (iv) Boruvka's Minimum Spanning Tree algorithm (MST). Each of these algorithms modifies the graph data structure in different ways and thus poses interesting challenges. We overcome these challenges using algorithmic and GPU-specific optimizations. We propose efficient techniques to perform concurrent subgraph addition, subgraph deletion, conflict detection and several optimizations to improve the scalability of morph algorithms. For an input mesh with 10 million triangles, our DMR code achieves an 80x speedup over the highly optimized serial Triangle program and a 2.3x speedup over a multicore implementation running with 48 threads. Our SP code is 3x faster than a multicore implementation with 48 threads on an input with 1 million literals. The PTA implementation is able to analyze six SPEC 2000 benchmark programs in just 74 milliseconds, achieving a geometric mean speedup of 9.3x over a 48-thread multicore version. Our MST code is slower than a multicore version with 48 threads for sparse graphs but significantly faster for denser graphs. This work provides several insights into how other morph algorithms can be efficiently implemented on GPUs.
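To illustrate what "morphing" means for one of the four algorithms named in the abstract, here is a minimal serial sketch of Boruvka's minimum spanning tree algorithm in Python. It is not the authors' GPU implementation: the function name `boruvka_mst`, the edge-list input format, and the union-find representation of merged components are all assumptions made for this illustration. Each round selects the cheapest outgoing edge per component and then merges components, which is the step that changes the graph structure and would require conflict handling when run concurrently.

```python
def boruvka_mst(num_nodes, edges):
    """Serial Boruvka sketch (hypothetical helper, not the paper's code).

    edges: list of (u, v, weight) tuples over nodes 0..num_nodes-1.
    Returns (total_weight, chosen_edges); for a disconnected graph the
    result is a minimum spanning forest.
    """
    parent = list(range(num_nodes))  # union-find: component representative

    def find(x):
        # Follow parent links to the representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    components = num_nodes
    while components > 1:
        # Phase 1: each component picks its cheapest outgoing edge.
        # Ties are broken by edge index so the choice is a total order.
        cheapest = {}
        for idx, (u, v, w) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # edge is internal to a component
            for r in (ru, rv):
                best = cheapest.get(r)
                if best is None or (w, idx) < (edges[best][2], best):
                    cheapest[r] = idx
        if not cheapest:
            break  # no outgoing edges left: graph is disconnected

        # Phase 2: merge components along the selected edges.
        # This is the "morph" step: components collapse into one.
        for idx in set(cheapest.values()):
            u, v, w = edges[idx]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                chosen.append((u, v, w))
                components -= 1

    return sum(w for _, _, w in chosen), chosen


if __name__ == "__main__":
    sample = [(0, 1, 1), (1, 2, 2), (2, 3, 3), (0, 3, 4), (0, 2, 5)]
    print(boruvka_mst(4, sample))
```

In a parallel setting, Phase 1 maps naturally to one thread per edge or per node, while Phase 2 is where concurrent merges can conflict; the serial loop above sidesteps that by re-checking `find` before every merge.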
Pages: 147-156
Page count: 10