Morph Algorithms on GPUs

Cited by: 54

Authors:

Nasre, Rupesh [1]

Burtscher, Martin [2]

Pingali, Keshav [1,3]

Affiliations:

[1] Univ Texas Austin, Inst Computat Engn & Sci, Austin, TX 78712 USA

[2] SW Texas State Univ, Dept Comp Sci, San Marcos, TX 78666 USA

[3] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA

Source:

ACM SIGPLAN Notices | 2013, Vol. 48, No. 8

Funding:

National Science Foundation (USA)

Keywords:

Algorithms; Languages; Performance; Morph Algorithms; Graph Algorithms; Irregular Programs; GPU; CUDA; Delaunay Mesh Refinement; Survey Propagation; Minimum Spanning Tree; Boruvka; Points-to Analysis; Parallelism

DOI:

10.1145/2517327.2442531

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Discipline classification codes:

081202; 0835

Abstract:

There is growing interest in using GPUs to accelerate graph algorithms such as breadth-first search, computing page-ranks, and finding shortest paths. However, these algorithms do not modify the graph structure, so their implementation is relatively easy compared to general graph algorithms like mesh generation and refinement, which morph the underlying graph in non-trivial ways by adding and removing nodes and edges. We know relatively little about how to implement morph algorithms efficiently on GPUs. In this paper, we present and study four morph algorithms: (i) a computational geometry algorithm called Delaunay Mesh Refinement (DMR), (ii) an approximate SAT solver called Survey Propagation (SP), (iii) a compiler analysis called Points-to Analysis (PTA), and (iv) Boruvka's Minimum Spanning Tree algorithm (MST). Each of these algorithms modifies the graph data structure in different ways and thus poses interesting challenges. We overcome these challenges using algorithmic and GPU-specific optimizations. We propose efficient techniques to perform concurrent subgraph addition, subgraph deletion, conflict detection and several optimizations to improve the scalability of morph algorithms. For an input mesh with 10 million triangles, our DMR code achieves an 80x speedup over the highly optimized serial Triangle program and a 2.3x speedup over a multicore implementation running with 48 threads. Our SP code is 3x faster than a multicore implementation with 48 threads on an input with 1 million literals. The PTA implementation is able to analyze six SPEC 2000 benchmark programs in just 74 milliseconds, achieving a geometric mean speedup of 9.3x over a 48-thread multicore version. Our MST code is slower than a multicore version with 48 threads for sparse graphs but significantly faster for denser graphs. This work provides several insights into how other morph algorithms can be efficiently implemented on GPUs.
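To make the idea of a graph that changes as the algorithm runs more concrete, the following is a minimal sequential sketch of Boruvka's Minimum Spanning Tree algorithm, one of the four algorithms named in the abstract. It is not the authors' GPU implementation: the function name `boruvka_mst`, the edge-list input format, and the use of a union-find array to stand in for merging components are all assumptions made for illustration. Each round, every component picks its lightest outgoing edge, and the components joined by those edges are merged, so the effective graph shrinks from round to round.

```python
def boruvka_mst(num_nodes, edges):
    """Sequential Boruvka sketch (illustrative; not the paper's GPU code).

    edges is a list of (u, v, weight) tuples for an undirected graph.
    Returns (total_weight, chosen_edges).
    """
    parent = list(range(num_nodes))  # union-find forest over components

    def find(x):
        # Locate the component representative, halving paths as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    total = 0
    components = num_nodes
    while components > 1:
        # cheapest[c] holds the index of the lightest edge leaving component c.
        cheapest = {}
        for i, (u, v, w) in enumerate(edges):
            cu, cv = find(u), find(v)
            if cu == cv:
                continue  # edge is internal to one component
            for c in (cu, cv):
                best = cheapest.get(c)
                # Break weight ties by edge index so the choice is consistent.
                if best is None or (w, i) < (edges[best][2], best):
                    cheapest[c] = i
        if not cheapest:
            break  # remaining components are disconnected from each other
        # Merge the components joined by each selected edge.
        for i in set(cheapest.values()):
            u, v, w = edges[i]
            cu, cv = find(u), find(v)
            if cu != cv:
                parent[cu] = cv
                chosen.append((u, v, w))
                total += w
                components -= 1
    return total, chosen
```

For example, calling `boruvka_mst(4, [(0, 1, 1), (1, 2, 2), (2, 3, 3), (0, 3, 4), (0, 2, 5)])` selects the three lightest edges in a single round. A parallel version would have to perform the per-component minimum search and the merges concurrently, which is where the conflict detection mentioned in the abstract becomes relevant.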

Pages: 147-156

Page count: 10
