Compiler-Assisted Compaction/Restoration of SIMD Instructions

Cited by: 1
Authors
Cebrian, Juan M. [1]
Balem, Thibaud [2]
Barredo, Adrian [3]
Casas, Marc [3]
Moreto, Miquel [3]
Ros, Alberto [1]
Jimborean, Alexandra [1]
Affiliations
[1] University of Murcia, Computer Engineering Department, E-30100 Murcia, Spain
[2] ENS Rennes, F-35170 Rennes, France
[3] Barcelona Supercomputing Center, Barcelona 08034, Spain
Funding
European Research Council; European Union Seventh Framework Programme (FP7)
Keywords
Registers; Parallel processing; Hardware; Computer architecture; Out of order; Delays; Energy consumption; SIMD; predication; LLVM; density-time performance;
DOI
10.1109/TPDS.2021.3091015
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Vector processors (e.g., SIMD units or GPUs) are ubiquitous in high-performance systems. All the supercomputers in the world exploit data-level parallelism (DLP), for example by using single instructions to operate over several data elements. Improving vector processing is therefore key for exascale computing. However, despite its potential, vector code generation and execution face significant challenges, and control-flow divergence is one of the main performance-limiting factors among them. Most modern vector instruction sets, including SIMD, rely on predication to support divergent control flow. Nevertheless, the performance and energy consumption of predicated code are usually insensitive to the number of active elements in a predicate mask: a predicated instruction costs roughly the same whether one lane or all lanes are active. Since vector registers keep getting wider, the energy efficiency of exascale computing systems will become sub-optimal. This article proposes a novel approach to improve execution efficiency in predicated vector code: the Compiler-Assisted Compaction/Restoration (CACR) technique. Baseline Compaction/Restoration (CR) delays predicated SIMD instructions that have inactive elements and compacts the active elements of instances of the same instruction from consecutive loop iterations. The compacted elements form an equivalent dense vector instruction. After the dense instructions execute, their results are restored to the original instructions. However, CR incurs a significant performance and energy penalty when it fails to find active elements, either because resources run out when unrolling or because of loop-carried dependencies. In CACR, the compiler analyzes the code to extract the key information required to configure CR and passes this information to the processor through new instructions inserted in the code. This prevents CR from waiting for active elements in scenarios where it would fail to form dense instructions.
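The abstract describes CR only at a high level, so the following Python sketch is a simplified functional model of the idea, not the authors' microarchitecture. It assumes a vector length of 4 lanes, an element-wise binary operation, and that inactive lanes simply produce no result (`None`). It ignores timing, register renaming, any lane-placement restrictions the hardware may impose (here any active element may move to any lane of the dense vector), and how CR decides when to stop waiting (here leftover elements are issued as one partial vector at the end). The `compactable` flag in `cacr_execute` is a hypothetical stand-in for the information CACR's new instructions convey; the actual encoding is not described in this abstract.

```python
from dataclasses import dataclass, field
from operator import add
from typing import Callable, List, Optional

VL = 4  # assumed vector length (lanes per SIMD register)

Op = Callable[[int, int], int]


@dataclass
class Instance:
    """One dynamic instance (one loop iteration) of a predicated SIMD instruction."""
    a: List[int]
    b: List[int]
    mask: List[bool]
    result: List[Optional[int]] = field(default_factory=list)


def predicated_execute(inst: Instance, op: Op) -> int:
    """Baseline predication: one full-width vector op per instance,
    however many lanes are active. Inactive lanes produce None here."""
    inst.result = [op(x, y) if m else None
                   for x, y, m in zip(inst.a, inst.b, inst.mask)]
    return 1  # vector operations issued


def compact_restore(instances: List[Instance], op: Op, vl: int = VL) -> int:
    """Simplified CR model: pack the active elements of consecutive instances
    into dense vectors, execute them, then scatter the results back."""
    for inst in instances:
        inst.result = [None] * len(inst.mask)
    # (instance index, lane) of every active element, in iteration order.
    active = [(i, lane) for i, inst in enumerate(instances)
              for lane, m in enumerate(inst.mask) if m]
    issued = 0
    for start in range(0, len(active), vl):
        chunk = active[start:start + vl]  # the last chunk may be partial
        # Compaction: gather operands into one dense vector.
        dense_a = [instances[i].a[lane] for i, lane in chunk]
        dense_b = [instances[i].b[lane] for i, lane in chunk]
        dense_r = [op(x, y) for x, y in zip(dense_a, dense_b)]
        issued += 1
        # Restoration: return each result to its original instance and lane.
        for (i, lane), r in zip(chunk, dense_r):
            instances[i].result[lane] = r
    return issued


def cacr_execute(instances: List[Instance], op: Op, compactable: bool,
                 vl: int = VL) -> int:
    """CACR-style dispatch. `compactable` is a hypothetical stand-in for the
    compiler-provided hint: when the compiler has found that compaction would
    fail (e.g. a loop-carried dependency), skip waiting and execute every
    instance as ordinary predicated code."""
    if not compactable:
        return sum(predicated_execute(inst, op) for inst in instances)
    return compact_restore(instances, op, vl)


def make_example() -> List[Instance]:
    # Three iterations with 5 active lanes in total out of 12.
    return [
        Instance([1, 2, 3, 4], [10, 20, 30, 40], [True, False, True, False]),
        Instance([5, 6, 7, 8], [50, 60, 70, 80], [False, True, False, False]),
        Instance([9, 10, 11, 12], [90, 100, 110, 120], [True, False, False, True]),
    ]


if __name__ == "__main__":
    base = make_example()
    cr = make_example()
    print(cacr_execute(base, add, compactable=False))  # 3
    print(cacr_execute(cr, add, compactable=True))     # 2
    print([inst.result for inst in cr])
```

In this toy example, plain predication issues one vector operation per iteration (3 in total) even though only 5 of 12 lanes do useful work, whereas the compacted path issues 2 and produces identical per-lane results. The model only counts issued operations; it does not capture the stall that real CR suffers when it waits for active elements that never arrive, which is the case the compiler hint is meant to avoid.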
Simulation results (gem5) show that, on average over a set of applications with predicated execution, CACR improves performance by up to 29 percent and reduces dynamic energy by up to 24.2 percent. For the same configuration and applications, baseline CR achieves only 18.6 percent performance and 14 percent energy improvements.
Pages: 779-791
Number of pages: 13