Efficient GPU Implementation of Affine Index Permutations on Arrays

Cited by: 0
Authors
Bouverot-Dupuis, Mathis [1]
Sheeran, Mary [2]
Affiliations
[1] ENS Paris, Paris, France
[2] Chalmers Univ, Gothenburg, Sweden
Funding
Swedish Research Council
Keywords
GPU; data-parallelism; functional languages; ALGORITHMS;
DOI
10.1145/3609024.3609411
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately, many common algorithms fail in this regard despite exhibiting great regularity in their memory access patterns. In this paper we propose efficient kernels to permute the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels of speed comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
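As an illustration of the permutation class named in the abstract: a BMMC permutation on an array of length 2^n is commonly defined as sending the element at index x to index y = A·x ⊕ c, where A is an invertible n×n bit matrix, c is an n-bit vector, and the arithmetic is over GF(2). The following is a minimal sequential reference sketch under that definition, not the GPU kernels proposed in the paper; the function names and the row-packed matrix encoding are assumptions made for this example.

```python
def parity(v: int) -> int:
    """Return the XOR of all bits of v (0 or 1)."""
    return bin(v).count("1") & 1


def bmmc_index(rows, c, x):
    """Compute y = A*x XOR c over GF(2).

    rows[i] is row i of A packed into an int: bit j of rows[i] is A[i][j].
    (Hypothetical encoding chosen for this sketch.)
    """
    y = 0
    for i, row in enumerate(rows):
        # Bit i of y is the GF(2) dot product of row i with the bits of x.
        y |= parity(row & x) << i
    return y ^ c


def bmmc_permute(rows, c, src):
    """Move src[x] to position bmmc_index(rows, c, x).

    Assumes A is invertible, so every destination is written exactly once.
    """
    n = len(rows)
    assert len(src) == 1 << n, "array length must be 2**n"
    dst = [None] * len(src)
    for x, value in enumerate(src):
        dst[bmmc_index(rows, c, x)] = value
    return dst


# Bit reversal on 3-bit indices is a BMMC permutation:
# A is the anti-diagonal matrix and c = 0.
n = 3
bit_reverse_rows = [1 << (n - 1 - i) for i in range(n)]
permuted = bmmc_permute(bit_reverse_rows, 0, list(range(8)))

# Identity matrix with c = 0b111 complements every index bit,
# which reverses the array.
identity_rows = [1 << i for i in range(n)]
reversed_array = bmmc_permute(identity_rows, 0b111, list(range(8)))
```

Familiar permutations such as bit reversal, matrix transposition of power-of-two shapes, and array reversal all fit this form, which is why a single efficient kernel for the class is attractive.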
Pages: 15-28
Number of pages: 14