Efficient Exploitation of Hyper Loop Parallelism in Vectorization

Cited by: 0
Authors:
Xu, Shixiong [2 ]
Gregg, David [1 ,2 ]
Affiliations:
[1] Univ Dublin, Trinity Coll, Lero, Dublin, Ireland
[2] Univ Dublin Trinity Coll, Dept Comp Sci, Software Tools Grp, Dublin, Ireland
Keywords:
Hyper loop parallelism; Automatic vectorization; Global SIMD lane-wise optimization; SIMD
DOI:
10.1007/978-3-319-17473-0_25
Chinese Library Classification:
TP31 [Computer Software]
Discipline Codes:
081202; 0835
Abstract:
Modern processors can deliver large amounts of processing power through vector SIMD units, provided the compiler or programmer can vectorize the code. As SIMD support in commodity processors has advanced, increasingly sophisticated features have been introduced, such as flexible SIMD lane-wise operations (e.g. blend instructions). However, existing vectorization techniques fail to apply global SIMD lane-wise optimization because they are unaware of the computation structure of the vectorizable loop. In this paper, we put forward an approach to automatic vectorization based on hyper loop parallelism, which is exposed by hyper loops. Hyper loops recover the loop structure of the vectorizable loop and enable vectorization to apply global SIMD lane-wise optimization. We implemented our vectorization technique in the Cetus source-to-source compiler to generate C code with SIMD intrinsics. Preliminary experimental results show that our technique achieves significant speedups over non-vectorized code in our test cases.
Pages: 382-396 (15 pages)
Related Papers (50 in total; first 10 listed):
  • [1] Exploiting Hyper-Loop Parallelism in Vectorization to Improve Memory Performance on CUDA GPGPU
    Xu, Shixiong
    Gregg, David
    2015 IEEE TRUSTCOM/BIGDATASE/ISPA, VOL 3, 2015: 53-60
  • [2] Strategies for the efficient exploitation of loop-level parallelism in Java
    Oliver, J
    Guitart, J
    Ayguadé, E
    Navarro, N
    Torres, J
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2001, 13 (8-9): 663-680
  • [3] On the Exploitation of Loop-level Parallelism in Embedded Applications
    Kejariwal, Arun
    Veidenbaum, Alexander V.
    Nicolau, Alexandru
    Girkar, Milind
    Tian, Xinmin
    Saito, Hideki
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2009, 8 (02)
  • [4] Exploitation of instruction-level parallelism for optimal loop scheduling
    Müller, J
    Fimmel, D
    Merker, R
    EIGHTH WORKSHOP ON INTERACTION BETWEEN COMPILERS AND COMPUTER ARCHITECTURES, PROCEEDINGS, 2004: 13-21
  • [5] An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs
    Xu, Shixiong
    Gregg, David
    2015 INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURE AND COMPILATION (PACT), 2015: 488-489
  • [6] Predicting the best mapping for efficient exploitation of task and data parallelism
    Guirado, F
    Ripoll, A
    Roig, C
    Yuan, X
    Luque, E
    EURO-PAR 2003 PARALLEL PROCESSING, PROCEEDINGS, 2003, 2790: 218-223
  • [7] Validation of Loop Parallelization and Loop Vectorization Transformations
    Dutta, Sudakshina
    Sarkar, Dipankar
    Rawat, Arvind
    Singh, Kulwant
    ENASE: PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON EVALUATION OF NOVEL SOFTWARE APPROACHES TO SOFTWARE ENGINEERING, 2016: 195-202
  • [8] Parallelism exploitation in superscalar multiprocessing
    Lu, NP
    Chung, CP
    IEE PROCEEDINGS-COMPUTERS AND DIGITAL TECHNIQUES, 1998, 145 (04): 255-264
  • [9] Stack splitting: A technique for efficient exploitation of search parallelism on share-nothing platforms
    Pontelli, Enrico
    Villaverde, Karen
    Guo, Hai-Feng
    Gupta, Gopal
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2006, 66 (10): 1267-1293
  • [10] Insufficient Vectorization: A New Method to Exploit Superword Level Parallelism
    Gao, Wei
    Han, Lin
    Zhao, Rongcai
    Li, Yingying
    Liu, Jian
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2017, E100D (01): 91-106