Data Transformations Enabling Loop Vectorization on Multithreaded Data Parallel Architectures

Cited by: 3
Authors
Jang, Byunghyun [1]
Mistry, Perhaad [1]
Schaa, Dana [1]
Dominguez, Rodrigo [1]
Kaeli, David [1]
Affiliations
[1] Northeastern Univ, Dept ECE, Boston, MA 02115 USA
Keywords
Algorithms; Performance; Experimentation; Loop Vectorization; Data Transformation; GPGPU;
DOI
10.1145/1837853.1693510
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memory access patterns in the data stream. This paper describes data transformations that allow us to vectorize loops targeting massively multithreaded data parallel architectures. We present a mathematical model that captures loop-based memory access patterns and computes the most appropriate data transformations in order to enable vectorization. Our experimental results show that the proposed data transformations can significantly increase the number of loops that can be vectorized and enhance the data-level parallelism of applications. Our results also show that the overhead associated with our data transformations can be easily amortized as the size of the input data set increases. For the set of high performance benchmark kernels studied, we achieve consistent and significant performance improvements (up to 11.4X) by applying vectorization using our data transformation approach.
Pages: 353-354
Number of pages: 2