Data Transformations Enabling Loop Vectorization on Multithreaded Data Parallel Architectures

Times cited: 3

Authors:

Jang, Byunghyun [1]

Mistry, Perhaad [1]

Schaa, Dana [1]

Dominguez, Rodrigo [1]

Kaeli, David [1]

Affiliation:

[1] Northeastern Univ, Dept ECE, Boston, MA 02115 USA

Source:

ACM SIGPLAN NOTICES | 2010 / Vol. 45 / No. 05

Keywords:

Algorithms; Performance; Experimentation; Loop Vectorization; Data Transformation; GPGPU;

DOI:

10.1145/1837853.1693510

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memory access patterns in the data stream. This paper describes data transformations that allow us to vectorize loops targeting massively multithreaded data parallel architectures. We present a mathematical model that captures loop-based memory access patterns and computes the most appropriate data transformations in order to enable vectorization. Our experimental results show that the proposed data transformations can significantly increase the number of loops that can be vectorized and enhance the data-level parallelism of applications. Our results also show that the overhead associated with our data transformations can be easily amortized as the size of the input data set increases. For the set of high performance benchmark kernels studied, we achieve consistent and significant performance improvements (up to 11.4X) by applying vectorization using our data transformation approach.
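The abstract does not spell out the individual transformations or the mathematical model. As a rough, hypothetical illustration only, the sketch below shows one generic data layout transformation of the kind described: converting an interleaved array of structures (AoS) into a structure of arrays (SoA), so that a loop reading a single field moves from stride-4 accesses to unit-stride accesses that can be packed into vector loads. The four-field record, the vector width of 4, and the helper names `aos_to_soa`, `access_stride`, and `vector_loads` are assumptions for illustration, not the authors' model or code.

```python
# Hypothetical illustration: an AoS -> SoA layout transformation that turns
# strided loop accesses into unit-stride accesses suitable for vector loads.
# Not the method from the paper; field count and vector width are assumptions.
from typing import Callable, List, Set

FIELDS = 4        # assumed record layout: (x, y, z, w) floats, interleaved
VECTOR_WIDTH = 4  # assumed SIMD width, e.g. float4-style loads


def aos_to_soa(aos: List[float], fields: int = FIELDS) -> List[List[float]]:
    """Split an interleaved array of structures into one contiguous list per field."""
    if len(aos) % fields != 0:
        raise ValueError("AoS length must be a multiple of the field count")
    # aos[f::fields] gathers field f from every record, in record order
    return [aos[f::fields] for f in range(fields)]


def access_stride(index_fn: Callable[[int], int], n: int) -> Set[int]:
    """Distances between addresses touched by consecutive iterations 0..n-1.

    A result of {1} means unit-stride (contiguous) access."""
    addrs = [index_fn(i) for i in range(n)]
    return {b - a for a, b in zip(addrs, addrs[1:])}


def vector_loads(arr: List[float], width: int = VECTOR_WIDTH) -> List[List[float]]:
    """Group a contiguous list into width-sized chunks, one chunk per vector load."""
    if len(arr) % width != 0:
        raise ValueError("length must be a multiple of the vector width")
    return [arr[i:i + width] for i in range(0, len(arr), width)]


if __name__ == "__main__":
    n = 8
    aos = [float(v) for v in range(n * FIELDS)]  # x0, y0, z0, w0, x1, ...

    # Reading only x from the AoS layout: x_i sits at index i * FIELDS (stride 4)
    print("AoS stride for x:", access_stride(lambda i: i * FIELDS, n))

    # After the transformation, x_i sits at index i of its own list (stride 1)
    soa = aos_to_soa(aos)
    print("SoA stride for x:", access_stride(lambda i: i, n))
    print("x packed into vector loads:", vector_loads(soa[0]))
```

Running the file prints a stride of 4 for the field in the original layout and 1 after the transformation. The transformation itself is an extra pass over the data; in this sketch, that one-time copy is the kind of overhead that only pays off when the transformed data is processed enough.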


Pages: 353-354

Page count: 2
