Data Transformations Enabling Loop Vectorization on Multithreaded Data Parallel Architectures

Cited by: 3

Authors:

Jang, Byunghyun [1]

Mistry, Perhaad [1]

Schaa, Dana [1]

Dominguez, Rodrigo [1]

Kaeli, David [1]

Affiliation:

[1] Northeastern Univ, Dept ECE, Boston, MA 02115 USA

Source:

ACM SIGPLAN NOTICES | 2010 / Vol. 45 / Issue 05

Keywords:

Algorithms; Performance; Experimentation; Loop Vectorization; Data Transformation; GPGPU

DOI:

10.1145/1837853.1693510

Chinese Library Classification number:

TP31 [Computer Software]

Discipline classification code:

081202; 0835

Abstract:

Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memory access patterns in the data stream. This paper describes data transformations that allow us to vectorize loops targeting massively multithreaded data parallel architectures. We present a mathematical model that captures loop-based memory access patterns and computes the most appropriate data transformations in order to enable vectorization. Our experimental results show that the proposed data transformations can significantly increase the number of loops that can be vectorized and enhance the data-level parallelism of applications. Our results also show that the overhead associated with our data transformations can be easily amortized as the size of the input data set increases. For the set of high performance benchmark kernels studied, we achieve consistent and significant performance improvements (up to 11.4X) by applying vectorization using our data transformation approach.
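To make the idea of a layout-changing data transformation concrete, here is a minimal sketch in Python. It is not the mathematical model or the transformations from the paper, which this record does not spell out. It only illustrates one generic case, assuming a loop that reads every `stride`-th element of an interleaved array; the function names (`deinterleave`, `strided_sum`, `contiguous_sum`) and the sample data are hypothetical.

```python
# Hypothetical illustration only: a stride-S access pattern is remapped so
# that the elements a single loop touches become contiguous (unit stride),
# which is the kind of access pattern vector hardware handles well.

def deinterleave(data, stride):
    """Reorder `data` so elements k, k+stride, k+2*stride, ... are adjacent.

    The element at old index k moves to new index
    (k % stride) * (len(data) // stride) + (k // stride).
    Assumes len(data) is a multiple of `stride`.
    """
    n = len(data)
    if n % stride != 0:
        raise ValueError("length must be a multiple of stride")
    rows = n // stride
    out = [None] * n
    for k, value in enumerate(data):
        out[(k % stride) * rows + (k // stride)] = value
    return out


def strided_sum(data, stride, offset):
    """Loop before the transformation: visits every `stride`-th element."""
    return sum(data[i] for i in range(offset, len(data), stride))


def contiguous_sum(transformed, stride, offset):
    """Same reduction after the transformation: one contiguous slice."""
    rows = len(transformed) // stride
    start = offset * rows
    return sum(transformed[start:start + rows])


# Interleaved x, y, z values of four points (stride 3), made up for the demo.
points = [1, 10, 100, 2, 20, 200, 3, 30, 300, 4, 40, 400]
soa = deinterleave(points, 3)
```

Both `strided_sum(points, 3, 1)` and `contiguous_sum(soa, 3, 1)` add up the same four values, but the second reads them from adjacent positions. The one-time cost of `deinterleave` corresponds to the transformation overhead that the abstract says is amortized as the input grows.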

Pages: 353-354

Number of pages: 2

