Efficient Exploitation of Hyper Loop Parallelism in Vectorization

Authors
Xu, Shixiong [2]
Gregg, David [1, 2]
Affiliations
[1] Univ Dublin, Trinity Coll, Lero, Dublin, Ireland
[2] Univ Dublin, Trinity Coll, Dept Comp Sci, Software Tools Grp, Dublin, Ireland
Keywords
Hyper loop parallelism; Automatic vectorization; Global SIMD lane-wise optimization; SIMD
DOI
10.1007/978-3-319-17473-0_25
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Modern processors can deliver large amounts of processing power through vector SIMD units, provided the compiler or programmer can vectorize the code. As SIMD support in commodity processors advances, increasingly sophisticated features are being introduced, such as flexible SIMD lane-wise operations (e.g. blend instructions). However, existing vectorizing techniques fail to apply global SIMD lane-wise optimization because they are unaware of the computation structure of the vectorizable loop. In this paper, we put forward an approach to automatic vectorization based on hyper loop parallelism, which is exposed by hyper loops. Hyper loops recover the loop structures of the vectorizable loop and help vectorization apply global SIMD lane-wise optimization. We implemented our vectorizing technique in the Cetus source-to-source compiler to generate C code with SIMD intrinsics. Preliminary experimental results show that our vectorizing technique can achieve significant speedups over the non-vectorized code in our test cases.
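To make the phrase "SIMD lane-wise operations (e.g. blend instructions)" concrete, here is a minimal sketch of the kind of C-with-intrinsics code a vectorizer can emit for a loop containing a branch. It is a generic if-conversion example, not the hyper-loop technique described in the paper: both arms of the branch are computed in every lane, and a comparison mask selects the result lane by lane. The function names `select_scalar` and `select_simd` are hypothetical, and the code assumes x86 SSE intrinsics (`<xmmintrin.h>`), falling back to plain scalar code on other targets.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h> /* SSE: __m128 and the _mm_*_ps intrinsics used below */
#endif

/* Scalar reference loop with a data-dependent branch (hypothetical example). */
void select_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (a[i] > b[i]) ? a[i] * 2.0f : b[i] + 1.0f;
}

/* If-converted version: process 4 floats per iteration and blend per lane. */
void select_simd(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    const __m128 two = _mm_set1_ps(2.0f);
    const __m128 one = _mm_set1_ps(1.0f);
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        /* Evaluate both branch arms in every lane. */
        __m128 t = _mm_mul_ps(va, two);
        __m128 f = _mm_add_ps(vb, one);
        /* Mask: all-ones in lanes where a > b, all-zeros elsewhere. */
        __m128 m = _mm_cmpgt_ps(va, vb);
        /* Lane-wise blend (m & t) | (~m & f); with SSE4.1 this is a single
           blend instruction, _mm_blendv_ps(f, t, m). */
        __m128 r = _mm_or_ps(_mm_and_ps(m, t), _mm_andnot_ps(m, f));
        _mm_storeu_ps(out + i, r);
    }
#endif
    /* Scalar epilogue: the last n % 4 elements, or all of them without SSE. */
    for (; i < n; i++)
        out[i] = (a[i] > b[i]) ? a[i] * 2.0f : b[i] + 1.0f;
}
```

Calling `select_simd(a, b, out, 7)` should fill `out` with the same values as `select_scalar`: the first four elements go through the vector path and the remaining three through the scalar epilogue. The blend here is local to a single statement; the abstract's point is that a vectorizer which also understands the surrounding loop structure can apply such lane-wise selection more globally.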
Pages: 382-396
Number of pages: 15