Data Transformations Enabling Loop Vectorization on Multithreaded Data Parallel Architectures

Cited by: 3
Authors
Jang, Byunghyun [1]
Mistry, Perhaad [1]
Schaa, Dana [1]
Dominguez, Rodrigo [1]
Kaeli, David [1]
Affiliations
[1] Northeastern Univ, Dept ECE, Boston, MA 02115 USA
Keywords
Algorithms; Performance; Experimentation; Loop Vectorization; Data Transformation; GPGPU
DOI
10.1145/1837853.1693510
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memory access patterns in the data stream. This paper describes data transformations that allow us to vectorize loops targeting massively multithreaded data parallel architectures. We present a mathematical model that captures loop-based memory access patterns and computes the most appropriate data transformations in order to enable vectorization. Our experimental results show that the proposed data transformations can significantly increase the number of loops that can be vectorized and enhance the data-level parallelism of applications. Our results also show that the overhead associated with our data transformations can be easily amortized as the size of the input data set increases. For the set of high performance benchmark kernels studied, we achieve consistent and significant performance improvements (up to 11.4X) by applying vectorization using our data transformation approach.
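As a rough illustration of the kind of layout change the abstract refers to, the sketch below converts an interleaved (array-of-structures) buffer into one contiguous list per field, so that a loop over a single field reads unit-stride data instead of strided data. This is a generic, well-known transformation shown under assumptions: it is not the authors' mathematical model, which this record does not reproduce, and the names `aos_to_soa`, `scale_field_strided`, and `scale_field_contiguous` are hypothetical.

```python
def aos_to_soa(interleaved, num_fields):
    """Deinterleave [x0, y0, z0, x1, y1, z1, ...] into one list per field."""
    if len(interleaved) % num_fields != 0:
        raise ValueError("length must be a multiple of num_fields")
    # Slice f::num_fields gathers every num_fields-th element starting at f.
    return [interleaved[f::num_fields] for f in range(num_fields)]


def scale_field_strided(interleaved, num_fields, field, factor):
    """Original loop shape: each iteration jumps num_fields elements ahead."""
    return [interleaved[i] * factor
            for i in range(field, len(interleaved), num_fields)]


def scale_field_contiguous(soa, field, factor):
    """Transformed loop shape: consecutive iterations touch adjacent elements."""
    return [value * factor for value in soa[field]]


# Hypothetical input: three points stored as interleaved (x, y, z) triples.
points = [1.0, 10.0, 100.0, 2.0, 20.0, 200.0, 3.0, 30.0, 300.0]
soa = aos_to_soa(points, 3)
```

Both loop shapes compute the same values; only the access pattern differs. The one-time cost of `aos_to_soa` corresponds to the transformation overhead that the abstract says is amortized as the input grows.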
Pages: 353 - 354
Page count: 2
Related papers
50 in total
  • [41] Data-Parallel Hashing Techniques for GPU Architectures
    Lessley, Brenton
    Childs, Hank
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (01) : 237 - 250
  • [42] A multithreaded runtime environment with thread migration for a HPF data-parallel compiler
    Bouge, L
    Hatcher, P
    Namyst, R
    Perez, C
    1998 INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PROCEEDINGS, 1998, : 418 - 425
  • [43] PARALLEL PERFORMANCE AND ENERGY EFFICIENCY OF MODERN VIDEO ENCODERS ON MULTITHREADED ARCHITECTURES
    Rodriguez-Sanchez, R.
    Igual, F. D.
    Martinez, J. L.
    Mayo, R.
    Quintana-Orti, E. S.
    2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 191 - 195
  • [44] Improving cache locality by a combination of loop and data transformations
    Kandemir, M
    Ramanujam, J
    Choudhary, A
    IEEE TRANSACTIONS ON COMPUTERS, 1999, 48 (02) : 159 - 167
  • [45] Data speculative multithreaded architecture
    Marcuello, P
    Gonzalez, A
    24TH EUROMICRO CONFERENCE - PROCEEDING, VOLS 1 AND 2, 1998, : 321 - 324
  • [46] Optically-Interconnected Data Center Architectures, Systems, and Enabling Technologies
    Ben Yoo, S.J.
    OECC/PSC 2019 - 24th OptoElectronics and Communications Conference/International Conference Photonics in Switching and Computing 2019, 2019,
  • [47] Optically-Interconnected Data Center Architectures, Systems, and Enabling Technologies
    Ben Yoo, S. J.
    2019 24TH OPTOELECTRONICS AND COMMUNICATIONS CONFERENCE (OECC) AND 2019 INTERNATIONAL CONFERENCE ON PHOTONICS IN SWITCHING AND COMPUTING (PSC), 2019,
  • [48] Data Recomputation for Multithreaded Applications
    Akbulut, Gulsum Gudukbay
    Kandemir, Mahmut T.
    Karakoy, Mustafa
    Choi, Wonil
    2023 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN, ICCAD, 2023,
  • [49] Loop transformations for architectures with partitioned register banks
    Huang, XL
    Carr, S
    Sweany, P
    ACM SIGPLAN NOTICES, 2001, 36 (08) : 48 - 55
  • [50] DATA-STRUCTURES FOR NETWORK ALGORITHMS ON MASSIVELY PARALLEL ARCHITECTURES
    NIELSEN, SS
    ZENIOS, SA
    PARALLEL COMPUTING, 1992, 18 (09) : 1033 - 1052