Efficient Exploitation of Hyper Loop Parallelism in Vectorization

Cited by: 0
Authors
Xu, Shixiong [2]
Gregg, David [1, 2]
Affiliations
[1] Univ Dublin, Trinity Coll, Lero, Dublin, Ireland
[2] Univ Dublin Trinity Coll, Dept Comp Sci, Software Tools Grp, Dublin, Ireland
Keywords
Hyper loop parallelism; Automatic vectorization; Global SIMD lane-wise optimization; SIMD
DOI
10.1007/978-3-319-17473-0_25
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Modern processors can provide large amounts of processing power with vector SIMD units if the compiler or programmer can vectorize their code. With the advance of SIMD support in commodity processors, more and more advanced features are introduced, such as flexible SIMD lane-wise operations (e.g. blend instructions). However, existing vectorizing techniques fail to apply global SIMD lane-wise optimization because they are unaware of the computation structure of the vectorizable loop. In this paper, we put forward an approach to automatic vectorization based on hyper loop parallelism, which is exposed by hyper loops. Hyper loops recover the loop structures of the vectorizable loop and help vectorization to apply global SIMD lane-wise optimization. We implemented our vectorizing technique in the Cetus source-to-source compiler to generate C code with SIMD intrinsics. The preliminary experimental results show that our vectorizing technique can achieve significant speedups over the non-vectorized code in our test cases.
Pages: 382-396
Number of pages: 15