Improving performance of SYCL applications on CPU architectures using LLVM-directed compilation flow

Authors
Ghiglio, Pietro [1]
Dolinsky, Uwe [1]
Goli, Mehdi [1]
Narasimhan, Kumudha [1]
Affiliations
[1] Codeplay Software Ltd, Edinburgh, Scotland
Source
Funding
Innovate UK project;
Keywords
compiler optimizations; multi-cores; parallel programming; portability; software acceleration; standards; SYCL;
DOI
10.1002/cpe.7810
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
The wide adoption of SYCL as an open-standard API for accelerating C++ software in domains such as HPC, automotive, artificial intelligence, and machine learning necessitates efficient compiler and runtime support for a growing number of platforms. Existing SYCL implementations support various devices, such as CPUs, GPUs, DSPs, and FPGAs, typically via OpenCL or CUDA backends. While accelerators have significantly increased the performance of user applications, employing CPU devices for further performance improvement is beneficial because of the significant presence of CPUs in existing data centers. SYCL applications on CPUs currently go through an OpenCL backend. Though an OpenCL backend is valuable in supporting accelerators, it may introduce additional overhead on CPUs, since the host and device are the same. Overheads such as run-time compilation of the kernel, transfer of input/output memory to and from the OpenCL device, and invocation of the OpenCL kernel may not be necessary when running on the CPU. While some of these overheads (such as data transfer) can be avoided by modifying the application, doing so can introduce disparity in the SYCL application's ability to achieve performance portability on other devices. In this article, we propose an alternative approach to running SYCL applications on CPUs. We bypass OpenCL and use a CPU-directed compilation flow, together with whole-function vectorization, to generate optimized host and device code in the same translation unit. We compare the performance of our approach, the CPU-directed compilation flow, with that of an OpenCL backend on existing SYCL-based applications, with no code modification: the BabelStream benchmark, Matmul from the ComputeCpp SDK, N-body simulation benchmarks, and SYCL-BLAS (Aliaga et al. Proceedings of the 5th International Workshop on OpenCL; 2017), on CPUs from different vendors and architectures. We report a performance improvement of up to 72% on BabelStream benchmarks, up to 63% on Matmul, up to 21% on the N-body simulation benchmark, and up to 16% on SYCL-BLAS.
Pages: 19