Efficient Exploitation of Hyper Loop Parallelism in Vectorization

Cited by: 0
Authors
Xu, Shixiong [2]
Gregg, David [1, 2]
Affiliations
[1] Univ Dublin, Trinity Coll, Lero, Dublin, Ireland
[2] Univ Dublin Trinity Coll, Dept Comp Sci, Software Tools Grp, Dublin, Ireland
Keywords
Hyper loop parallelism; Automatic vectorization; Global SIMD lane-wise optimization; SIMD
DOI
10.1007/978-3-319-17473-0_25
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Modern processors can provide large amounts of processing power with vector SIMD units if the compiler or programmer can vectorize their code. As SIMD support in commodity processors advances, more advanced features are being introduced, such as flexible SIMD lane-wise operations (e.g., blend instructions). However, existing vectorizing techniques fail to apply global SIMD lane-wise optimization because they are unaware of the computation structure of the vectorizable loop. In this paper, we put forward an approach to automatic vectorization based on hyper loop parallelism, which is exposed by hyper loops. Hyper loops recover the loop structures of the vectorizable loop and help vectorization to apply global SIMD lane-wise optimization. We implemented our vectorizing technique in the Cetus source-to-source compiler to generate C code with SIMD intrinsics. The preliminary experimental results show that our vectorizing technique can achieve significant speedups over the non-vectorized code in our test cases.
Pages: 382-396
Number of pages: 15