Efficient GPU Implementation of Affine Index Permutations on Arrays

Cited by: 0

Authors:

Bouverot-Dupuis, Mathis [1]

Sheeran, Mary [2]

Affiliations:

[1] ENS Paris, Paris, France

[2] Chalmers Univ, Gothenburg, Sweden

Source:

PROCEEDINGS OF THE 11TH ACM SIGPLAN INTERNATIONAL WORKSHOP ON FUNCTIONAL HIGH-PERFORMANCE AND NUMERICAL COMPUTING, FHPNC 2023 | 2023

Funding:

Swedish Research Council;

Keywords:

GPU; data-parallelism; functional languages; ALGORITHMS;

DOI:

10.1145/3609024.3609411

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately, many common algorithms fail in this regard despite exhibiting great regularity in their memory access patterns. In this paper, we propose efficient kernels to permute the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels whose speed is comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
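To make the term concrete: a BMMC permutation treats each n-bit array index x as a bit vector and sends it to y = A·x ⊕ c, where A is an invertible n×n bit matrix and c is an n-bit complement vector, with all arithmetic over GF(2). The following is a minimal sequential sketch of that definition in Python, not the GPU kernels the paper proposes. The function names, the row-bitmask encoding of A, and the convention that the element at index x moves to index y are assumptions made for illustration.

```python
def bmmc_index(rows, c, x):
    """Return A*x XOR c over GF(2).

    rows[i] is an integer bitmask for row i of A (bit j set means A[i][j] = 1).
    This encoding is an illustrative assumption, not taken from the paper.
    """
    y = 0
    for i, row in enumerate(rows):
        # Dot product over GF(2): parity of the bits shared by row i and x.
        parity = bin(row & x).count("1") & 1
        y |= parity << i
    return y ^ c


def bmmc_permute(rows, c, src):
    """Move src[x] to position A*x XOR c (A must be invertible over GF(2))."""
    n = len(rows)
    if len(src) != 1 << n:
        raise ValueError("array length must be 2**n for an n x n bit matrix")
    dst = [None] * len(src)
    for x, value in enumerate(src):
        dst[bmmc_index(rows, c, x)] = value
    return dst


def bit_reversal_rows(n):
    """Rows of the anti-diagonal matrix: output bit i is input bit n-1-i."""
    return [1 << (n - 1 - i) for i in range(n)]


def identity_rows(n):
    """Rows of the identity matrix."""
    return [1 << i for i in range(n)]


if __name__ == "__main__":
    data = list(range(8))
    print(bmmc_permute(bit_reversal_rows(3), 0, data))   # bit reversal
    print(bmmc_permute(identity_rows(3), 0b111, data))   # array reversal
```

With the anti-diagonal matrix and c = 0 this yields the bit-reversal permutation, and with the identity matrix and an all-ones c it reverses the array, two familiar members of the class.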


Pages: 15-28

Page count: 14
