Data Transformations Enabling Loop Vectorization on Multithreaded Data Parallel Architectures

Cited by: 3
Authors
Jang, Byunghyun [1]
Mistry, Perhaad [1]
Schaa, Dana [1]
Dominguez, Rodrigo [1]
Kaeli, David [1]
Affiliations
[1] Northeastern Univ, Dept ECE, Boston, MA 02115 USA
Keywords
Algorithms; Performance; Experimentation; Loop Vectorization; Data Transformation; GPGPU
DOI
10.1145/1837853.1693510
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memory access patterns in the data stream. This paper describes data transformations that allow us to vectorize loops targeting massively multithreaded data parallel architectures. We present a mathematical model that captures loop-based memory access patterns and computes the most appropriate data transformations in order to enable vectorization. Our experimental results show that the proposed data transformations can significantly increase the number of loops that can be vectorized and enhance the data-level parallelism of applications. Our results also show that the overhead associated with our data transformations can be easily amortized as the size of the input data set increases. For the set of high performance benchmark kernels studied, we achieve consistent and significant performance improvements (up to 11.4X) by applying vectorization using our data transformation approach.
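As a rough illustration of the kind of data transformation the abstract refers to, the sketch below de-interleaves strided data so that a loop reads adjacent elements and can be processed in fixed-width groups. It is not the authors' model or implementation: the function names, the stride-2 layout, and the assumed vector width of 4 are all hypothetical, chosen only to show why a contiguous layout makes a loop easier to vectorize.

```python
# Hypothetical illustration only: none of these names or parameters come from the paper.

VECTOR_WIDTH = 4  # assumed SIMD width, in elements


def deinterleave(data, stride):
    """Data transformation: split interleaved data into `stride` contiguous streams.

    Element i of stream k is data[i * stride + k].
    """
    if len(data) % stride != 0:
        raise ValueError("data length must be a multiple of stride")
    return [data[k::stride] for k in range(stride)]


def scale_strided(data, stride, factor):
    """Original loop: touches every `stride`-th element (non-contiguous access)."""
    return [data[i] * factor for i in range(0, len(data), stride)]


def scale_vectorized(stream, factor):
    """Transformed loop: each step handles up to VECTOR_WIDTH adjacent elements."""
    out = []
    for base in range(0, len(stream), VECTOR_WIDTH):
        chunk = stream[base:base + VECTOR_WIDTH]  # stands in for one contiguous vector load
        out.extend(x * factor for x in chunk)     # stands in for one SIMD multiply
    return out


interleaved = [1.0, 10.0, 2.0, 20.0, 3.0, 30.0, 4.0, 40.0, 5.0, 50.0]
xs, ys = deinterleave(interleaved, stride=2)

# Both loops compute the same result; only the memory layout differs.
assert scale_vectorized(xs, 2.0) == scale_strided(interleaved, 2, 2.0)
```

The de-interleaving step is a one-time cost over the input, which is consistent with the abstract's remark that transformation overhead is amortized as the input data set grows.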
Pages: 353-354
Page count: 2
Related papers
50 in total
  • [1] Data Transformations Enabling Loop Vectorization on Multithreaded Data Parallel Architectures
    Jang, Byunghyun
    Mistry, Perhaad
    Schaa, Dana
    Dominguez, Rodrigo
    Kaeli, David
    PPOPP 2010: PROCEEDINGS OF THE 2010 ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, 2010, : 353 - 354
  • [2] Using Data Dependence Analysis and Loop Transformations to Teach Vectorization
    Watkinson, Neftali
    Shivam, Aniket
    Chen, Zhi
    Veidenbaum, Alexander
    Nicolau, Alexandru
    PROCEEDINGS 2017 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI), 2017, : 1143 - 1148
  • [3] Validation of Loop Parallelization and Loop Vectorization Transformations
    Dutta, Sudakshina
    Sarkar, Dipankar
    Rawat, Arvind
    Singh, Kulwant
    ENASE: PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON EVALUATION OF NOVEL SOFTWARE APPROACHES TO SOFTWARE ENGINEERING, 2016, : 195 - 202
  • [4] Data Layout Transformation for Structure Vectorization on SIMD Architectures
    Li, Peng-yuan
    Zhang, Qing-hua
    Zhao, Rong-cai
    Yu, Hai-ning
    2015 16TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2015, : 403 - 409
  • [5] Multithreaded data transfer over parallel links
    Sang, J
    INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS I-V, PROCEEDINGS, 1999, : 2403 - 2409
  • [6] Parallel Inverse Kinematics for Multithreaded Architectures
    Harish, Pawan
    Mahmudi, Mentar
    Le Callennec, Benoit
    Boulic, Ronan
    ACM TRANSACTIONS ON GRAPHICS, 2016, 35 (02):
  • [7] Power exploration of parallel embedded architectures implementing data-reuse transformations
    Kavvadias, N
    Zanikopoulos, A
    Voliotidis, C
    Kougia, S
    Chatzigeorgiou, A
    Zervas, N
    Nikolaidis, S
    ICECS 2001: 8TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS, VOLS I-III, CONFERENCE PROCEEDINGS, 2001, : 781 - 784
  • [8] DATA COMMUNICATION IN PARALLEL ARCHITECTURES
    SAAD, Y
    SCHULTZ, MH
    PARALLEL COMPUTING, 1989, 11 (02) : 131 - 150
  • [9] Impact of data distribution on performance of irregular reductions on multithreaded architectures
    Zoppetti, G
    Agrawal, G
    Kumar, R
    HIGH-PERFORMANCE COMPUTING AND NETWORKING, 2001, 2110 : 483 - 492
  • [10] SPECIAL ISSUE ON DATA-FLOW AND MULTITHREADED ARCHITECTURES - INTRODUCTION
    GAO, G
    GAUDIOT, JL
    BIC, L
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1993, 18 (03) : 271 - 272