Data Transformations Enabling Loop Vectorization on Multithreaded Data Parallel Architectures

Cited by: 3

Authors:

Jang, Byunghyun [1]

Mistry, Perhaad [1]

Schaa, Dana [1]

Dominguez, Rodrigo [1]

Kaeli, David [1]

Affiliation:

[1] Northeastern Univ, Dept ECE, Boston, MA 02115 USA

Source:

ACM SIGPLAN NOTICES | 2010 / Vol. 45 / No. 05

Keywords:

Algorithms; Performance; Experimentation; Loop Vectorization; Data Transformation; GPGPU

DOI:

10.1145/1837853.1693510

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Subject classification codes:

081202; 0835

Abstract:

Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memory access patterns in the data stream. This paper describes data transformations that allow us to vectorize loops targeting massively multithreaded data parallel architectures. We present a mathematical model that captures loop-based memory access patterns and computes the most appropriate data transformations in order to enable vectorization. Our experimental results show that the proposed data transformations can significantly increase the number of loops that can be vectorized and enhance the data-level parallelism of applications. Our results also show that the overhead associated with our data transformations can be easily amortized as the size of the input data set increases. For the set of high performance benchmark kernels studied, we achieve consistent and significant performance improvements (up to 11.4X) by applying vectorization using our data transformation approach.
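As a rough illustration of the kind of data transformation the abstract refers to, the sketch below reorders an interleaved (array-of-structures) buffer so that each field becomes one contiguous run, turning a stride-3 access into a unit-stride access. This is not the paper's mathematical model or its implementation; the function names `aos_to_soa` and `access_stride`, the three-field layout, and the sample values are all assumptions made for this example.

```python
def aos_to_soa(data, num_fields):
    """Split an interleaved buffer into one contiguous list per field."""
    if len(data) % num_fields != 0:
        raise ValueError("buffer length must be a multiple of num_fields")
    # Field f occupies positions f, f + num_fields, f + 2 * num_fields, ...
    return [data[f::num_fields] for f in range(num_fields)]


def access_stride(indices):
    """Return the constant stride of an index sequence, or None if irregular."""
    steps = {b - a for a, b in zip(indices, indices[1:])}
    return steps.pop() if len(steps) == 1 else None


# Hypothetical interleaved buffer: x0, y0, z0, x1, y1, z1, ...
n = 4
aos = [10 * i + f for i in range(n) for f in range(3)]

# Before: a loop over the x field touches every third element.
stride_before = access_stride([3 * i for i in range(n)])

# After: the x field is contiguous, so the same loop walks indices 0..n-1.
soa = aos_to_soa(aos, 3)
stride_after = access_stride(list(range(n)))

print(stride_before, stride_after)
print(soa[0])
```

The one-time reordering cost in `aos_to_soa` plays the role of the transformation overhead that the abstract says is amortized as the input grows, although the actual cost model is described only in the paper itself.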


Pages: 353-354

Page count: 2
