Efficient Exploitation of Hyper Loop Parallelism in Vectorization

Cited by: 0

Authors:

Xu, Shixiong [2]

Gregg, David [1,2]

Affiliations:

[1] Univ Dublin, Trinity Coll, Lero, Dublin, Ireland

[2] Univ Dublin Trinity Coll, Dept Comp Sci, Software Tools Grp, Dublin, Ireland

Source:

LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING (LCPC 2014) | 2015 / Vol. 8967

Keywords:

Hyper loop parallelism; Automatic vectorization; Global SIMD lane-wise optimization; SIMD;

DOI:

10.1007/978-3-319-17473-0_25

Chinese Library Classification:

TP31 [Computer Software];

Discipline classification codes:

081202; 0835;

Abstract:

Modern processors can deliver substantial processing power through vector SIMD units, provided the compiler or programmer can vectorize the code. As SIMD support in commodity processors advances, more sophisticated features are being introduced, such as flexible SIMD lane-wise operations (e.g., blend instructions). However, existing vectorization techniques fail to apply global SIMD lane-wise optimization because they are unaware of the computation structure of the vectorizable loop. In this paper, we put forward an approach to automatic vectorization based on hyper loop parallelism, which is exposed by hyper loops. Hyper loops recover the loop structure of the vectorizable loop and help the vectorizer apply global SIMD lane-wise optimization. We implemented our vectorization technique in the Cetus source-to-source compiler to generate C code with SIMD intrinsics. Preliminary experimental results show that our technique achieves significant speedups over the non-vectorized code in our test cases.

Pages: 382-396

Page count: 15
