Adaptation of algorithms for efficient execution on GPUs

Cited by: 0

Authors:

Bulavintsev, Vadim G. [1]

Zhdanov, Dmitry D. [2]

Affiliations:

[1] Delft Univ Technol, Delft, Netherlands

[2] ITMO Univ, St Petersburg, Russia

Source:

OPTICAL DESIGN AND TESTING XI | 2021 / Vol. 11895

Keywords:

GPU; SIMD; control flow graph; loop optimization; DPLL; resnet

DOI:

10.1117/12.2601619

Chinese Library Classification (CLC) number:

O43 [Optics]

Subject classification codes:

070207; 0803

Abstract:

We propose a generalized method for adapting and optimizing algorithms for efficient execution on modern graphics processing units (GPUs). The method consists of three steps. First, build a control flow graph (CFG) of the algorithm. Next, transform the CFG into a tree of loops and merge non-parallelizable loops into parallelizable ones. Finally, map the resulting loop tree onto the tree of GPU computational units, unrolling the algorithm's loops as needed to make the two trees match. The method provides a convenient and robust mental framework and strategy for GPU code optimization. We demonstrate the method by adapting a backtracking search algorithm to the GPU platform and by building an optimized implementation of the ResNeXt-50 neural network.
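
The last two steps of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Loop` class, the `merge_serial` and `map_to_gpu` helpers, the three-level `GPU_LEVELS` hierarchy, and the example loop nest are all hypothetical, and the sketch assumes the outermost loop is parallelizable. It skips CFG construction and loop unrolling entirely and only shows one possible reading of "merge non-parallelizable loops into parallelizable ones" followed by a depth-based mapping onto GPU units.

```python
from dataclasses import dataclass, field

# Hypothetical GPU hierarchy, outermost to innermost (illustrative names only).
GPU_LEVELS = ["grid", "block", "thread"]


@dataclass
class Loop:
    name: str
    parallel: bool                                # True if iterations are independent
    children: list = field(default_factory=list)  # nested loops
    body: list = field(default_factory=list)      # serial loops folded into this one


def flatten(loop):
    """Return the names of a loop and everything nested or folded inside it."""
    names = [loop.name] + list(loop.body)
    for child in loop.children:
        names.extend(flatten(child))
    return names


def merge_serial(loop):
    """Fold every non-parallelizable child subtree into its parent's body."""
    kept = []
    for child in loop.children:
        merge_serial(child)
        if child.parallel:
            kept.append(child)
        else:
            # The serial loop becomes per-iteration work of its parent.
            loop.body.extend(flatten(child))
    loop.children = kept
    return loop


def map_to_gpu(loop, depth=0, out=None):
    """Assign each remaining loop to the GPU level matching its nesting depth."""
    out = {} if out is None else out
    out[loop.name] = GPU_LEVELS[depth] if depth < len(GPU_LEVELS) else "serial"
    for child in loop.children:
        map_to_gpu(child, depth + 1, out)
    return out


# Assumed example: a batched matrix product whose innermost accumulation
# over k is treated as a serial (non-parallelizable) loop.
k = Loop("k", parallel=False)
col = Loop("col", parallel=True, children=[k])
row = Loop("row", parallel=True, children=[col])
batch = Loop("batch", parallel=True, children=[row])

mapping = map_to_gpu(merge_serial(batch))
print(mapping)   # loop name -> GPU level
print(col.body)  # serial work executed inside each "col" iteration
```

In this toy nest the serial `k` loop ends up inside the body of `col`, so each GPU thread would run the accumulation sequentially while the three parallel loops are spread over the grid, block, and thread levels. When the loop tree has more or fewer levels than the hardware hierarchy, the abstract indicates that loops are unrolled to make the trees match; that step is not modeled here.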

Pages: 8
