Efficient GPU Implementation of Affine Index Permutations on Arrays

Cited by: 0
Authors
Bouverot-Dupuis, Mathis [1]
Sheeran, Mary [2]
Affiliations
[1] ENS Paris, Paris, France
[2] Chalmers Univ, Gothenburg, Sweden
Funding
Swedish Research Council;
Keywords
GPU; data-parallelism; functional languages; ALGORITHMS;
DOI
10.1145/3609024.3609411
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately, many common algorithms fail in this regard despite exhibiting great regularity in memory access patterns. In this paper we propose efficient kernels to permute the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels of speed comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
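As an illustration of the permutation class named in the abstract, the sketch below applies a BMMC permutation on the CPU. It assumes the usual definition: an element at index x (read as an n-bit vector) moves to index A·x XOR c, where A is a nonsingular n×n bit matrix and c is an n-bit complement vector, with arithmetic over GF(2). The function names `bmmc_index` and `bmmc_permute` and the row-bitmask encoding of A are assumptions made for this example. This is a sequential reference, not the GPU kernels proposed in the paper.

```python
def bmmc_index(rows, c, x):
    """Return A*x XOR c over GF(2).

    rows[i] is row i of the bit matrix A packed as an integer bitmask
    (hypothetical encoding chosen for this sketch). Bit i of the product
    is the parity of the bitwise AND of rows[i] and x.
    """
    y = 0
    for i, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1
        y |= parity << i
    return y ^ c


def bmmc_permute(rows, c, src):
    """Move src[x] to position A*x XOR c for every index x.

    The caller is assumed to pass a nonsingular A; otherwise the index
    map is not a permutation and some slots stay None.
    """
    n = len(rows)
    if len(src) != 1 << n:
        raise ValueError("array length must be 2**n for an n x n matrix")
    dst = [None] * len(src)
    for x, value in enumerate(src):
        dst[bmmc_index(rows, c, x)] = value
    return dst


# Bit reversal on 3-bit indices: A is the anti-diagonal matrix, c = 0.
bit_reversal = bmmc_permute([0b100, 0b010, 0b001], 0b000, list(range(8)))

# Identity matrix with c = 0b111 complements every index bit,
# which reverses the array.
reversal = bmmc_permute([0b001, 0b010, 0b100], 0b111, list(range(8)))
```

Packing each matrix row into an integer keeps the index computation to one AND and one parity per output bit, which is why this encoding was chosen for the sketch.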
Pages: 15-28
Page count: 14
Related papers
50 in total
  • [41] An Efficient Implementation of Fuzzy Edge Detection using GPU in MATLAB
    Hoseini, Farnaz
    Shahbahrami, Asadollah
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS 2015), 2015, : 605 - 610
  • [42] MULTIDIMENSIONAL DATAFLOW GRAPH MODELING AND MAPPING FOR EFFICIENT GPU IMPLEMENTATION
    Wang, Lai-Huei
    Shen, Chung-Ching
    Seetharaman, Gunasekaran
    Palaniappan, Kannappan
    Bhattacharyya, Shuvra S.
    2012 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), 2012, : 300 - 305
  • [43] An efficient approach for generating pencil filter and its implementation on GPU
    Me, Dang-en
    Zhao, Yang
    Xu, Dan
    PROCEEDINGS OF 2007 10TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN AND COMPUTER GRAPHICS, 2007, : 185 - +
  • [44] A Efficient Parallel Deblocking Filter Based on GPU: Implementation and Optimization
    Su, Huayou
    Zhang, Chunyuan
    Chai, Jun
    Yang, Qianming
    2011 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING (PACRIM), 2011, : 280 - 285
  • [45] An efficient implementation of the Memory Improved Proportionate Affine Projection Algorithm
    Glentis, George-Othon
    SIGNAL PROCESSING, 2016, 118 : 25 - 35
  • [46] A HARDWARE-EFFICIENT IMPLEMENTATION OF THE FAST AFFINE PROJECTION ALGORITHM
    Lo, Haw-Jing
    Anderson, David V.
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 581 - 584
  • [47] An efficient Gray code algorithm for generating all permutations with a given major index
    Vajnovszki, Vincent
    JOURNAL OF DISCRETE ALGORITHMS, 2014, 26 : 77 - 88
  • [48] AN EFFICIENT REPRESENTATION FOR PERMUTATIONS
    Mihnea, Amy
    ELECTRONIC PROCEEDINGS OF THE 2013 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), 2013,
  • [49] APN permutations on Zn and Costas arrays
    Drakakis, Konstantinos
    Gow, Rod
    McGuire, Gary
    DISCRETE APPLIED MATHEMATICS, 2009, 157 (15) : 3320 - 3326
  • [50] PARALLEL GENERATION OF PERMUTATIONS ON SYSTOLIC ARRAYS
    LIN, CJ
    PARALLEL COMPUTING, 1990, 15 (1-3) : 267 - 276