Adaptation of Algorithms for efficient execution on GPUs

Citations: 0
Authors
Bulavintsev, Vadim G. [1]
Zhdanov, Dmitry D. [2]
Affiliations
[1] Delft Univ Technol, Delft, Netherlands
[2] ITMO Univ, St Petersburg, Russia
Source
OPTICAL DESIGN AND TESTING XI | 2021 / Vol. 11895
Keywords
GPU; SIMD; control flow graph; loop optimization; DPLL; ResNet
DOI
10.1117/12.2601619
Chinese Library Classification
O43 [Optics]
Discipline classification code
070207; 0803
Abstract
We propose a generalized method for adapting and optimizing algorithms for efficient execution on modern graphics processing units (GPUs). The method consists of three steps. First, build a control flow graph (CFG) of the algorithm. Next, transform the CFG into a tree of loops and merge non-parallelizable loops into parallelizable ones. Finally, map the resulting loop tree onto the tree of GPU computational units, unrolling the algorithm's loops as necessary to make the two trees match. The method provides a convenient and robust mental framework and strategy for GPU code optimization. We demonstrate the method by adapting a backtracking search algorithm to the GPU platform and by building an optimized implementation of the ResNeXt-50 neural network.
Pages: 8
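The last step of the abstract, mapping a loop tree onto the hierarchy of GPU computational units, can be illustrated with a toy sketch. This is not the authors' implementation: the `Loop` class, the `map_loops` function, the three-level `GPU_LEVELS` hierarchy, and the example loop names are all assumptions made for illustration. The greedy outermost-first assignment is a simplification, and the sketch does not attempt CFG construction, loop merging, or unrolling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed GPU hierarchy, outermost first (illustrative names only).
GPU_LEVELS = ["grid", "block", "thread"]


@dataclass
class Loop:
    """One node of a hypothetical loop tree."""
    name: str
    parallel: bool  # True if iterations are independent of each other
    children: List["Loop"] = field(default_factory=list)


def map_loops(loop: Loop, level: int = 0,
              mapping: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Greedily assign each parallelizable loop to the next free GPU level.

    Sequential loops, and parallel loops left over once all GPU levels
    are used, are marked "serial" (executed inside a single thread).
    """
    if mapping is None:
        mapping = {}
    if loop.parallel and level < len(GPU_LEVELS):
        mapping[loop.name] = GPU_LEVELS[level]
        level += 1  # nested loops must use a deeper GPU level
    else:
        mapping[loop.name] = "serial"
    for child in loop.children:
        map_loops(child, level, mapping)
    return mapping


# Hypothetical loop nest: a sequential reduction sits between parallel loops.
tree = Loop("batch", True, [
    Loop("channel", True, [
        Loop("reduce", False, [
            Loop("pixel", True),
        ]),
    ]),
])

for name, unit in map_loops(tree).items():
    print(f"{name} -> {unit}")
```

In this sketch the sequential `reduce` loop does not consume a GPU level, so the parallel `pixel` loop nested inside it can still be assigned to threads.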