Efficient GPU Implementation of Affine Index Permutations on Arrays

Cited by: 0
Authors
Bouverot-Dupuis, Mathis [1]
Sheeran, Mary [2]
Affiliations
[1] ENS Paris, Paris, France
[2] Chalmers Univ, Gothenburg, Sweden
Funding
Swedish Research Council
Keywords
GPU; data-parallelism; functional languages; algorithms
DOI
10.1145/3609024.3609411
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately, many common algorithms fail in this regard despite exhibiting great regularity in their memory access patterns. In this paper we propose efficient kernels to permute the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels whose speed is comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
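For readers unfamiliar with the term, a BMMC permutation on an array of length 2^n is usually defined by sending the element at index x to index y = A·x ⊕ c, where x is read as an n-bit vector, A is an n×n bit matrix invertible over GF(2), and c is an n-bit complement vector. The following is a minimal sequential sketch of that definition in Python, intended only to illustrate the index mapping; it is not the paper's GPU kernels, and the names `parity`, `bmmc_index`, and `bmmc_permute` are hypothetical.

```python
def parity(x: int) -> int:
    """Return the XOR of all bits of x (0 or 1)."""
    return bin(x).count("1") & 1


def bmmc_index(rows, c, x):
    """Map source index x to destination index y = A*x XOR c over GF(2).

    rows[i] is row i of the bit matrix A packed into an int (bit j = A[i][j]).
    Bit i of A*x is the parity of (rows[i] AND x).
    """
    y = 0
    for i, row in enumerate(rows):
        y |= parity(row & x) << i
    return y ^ c


def bmmc_permute(rows, c, src):
    """Return a new list out with out[bmmc_index(rows, c, x)] = src[x]."""
    n = len(rows)
    if len(src) != 1 << n:
        raise ValueError("array length must be 2**n for an n x n matrix")
    dests = [bmmc_index(rows, c, x) for x in range(len(src))]
    # If A is not invertible over GF(2), two sources share a destination.
    if len(set(dests)) != len(src):
        raise ValueError("bit matrix is not invertible over GF(2)")
    out = [None] * len(src)
    for x, y in enumerate(dests):
        out[y] = src[x]
    return out


# Bit reversal on 3-bit indices: A is the anti-diagonal matrix, c = 0.
bit_reverse_rows = [0b100, 0b010, 0b001]
reversed_order = bmmc_permute(bit_reverse_rows, 0, list(range(8)))
```

Bit reversal, as used in FFT-style algorithms, is one familiar member of this class; taking A as the identity with a nonzero c instead flips the selected index bits.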
Pages: 15-28
Page count: 14