The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture

Authors
Lakshminarasimhan, Kartik [1]
Naithani, Ajeya [1]
Feliu, Josue [2]
Eeckhout, Lieven [1]
Affiliations
[1] Univ Ghent, Technol Pk 126, B-9052 Ghent, Belgium
[2] Univ Murcia, C Campus Univ, Edificio 32, Murcia 30100, Spain
Funding
European Research Council
Keywords
Superscalar microarchitecture; slice-out-of-order; dynamic instruction scheduling
DOI
10.1145/3499424
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner. Recently proposed slice-out-of-order (sOoO) cores identify backward slices of memory operations which they execute out-of-order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table. In this article, we propose Forward Slice Core (FSC), a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. Considering 3-wide pipeline configurations, we find that FSC improves performance by 27.1%, 21.1%, and 14.6% on average compared to Freeway, the state-of-the-art sOoO core, across SPEC CPU2017, GAP, and DaCapo, respectively, while at the same time incurring reduced hardware complexity. Compared to an OoO core, FSC reduces power consumption by 61.3% and chip area by 47%, providing a microarchitecture with high performance at low complexity.
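To make the idea of a forward slice concrete, the following is a minimal sketch, not the mechanism described in the article. It assumes a forward slice can be approximated by marking every register written by a load, or by an instruction that reads a marked register, and then steering instructions that read a marked register to a separate FIFO. The names `Instr`, `steer`, `main_q`, and `slice_q` are hypothetical, and the sketch ignores the L1 D-cache miss handling and store-address replication that the abstract mentions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: opcode, destination, sources."""
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...] = ()


def steer(program):
    """Split a program-order instruction list into two in-order FIFOs.

    Assumption: an instruction belongs to a forward slice if it reads a
    register produced (directly or transitively) by a load.
    """
    tainted = set()          # registers that currently hold load-derived values
    main_q, slice_q = deque(), deque()
    for ins in program:
        depends_on_load = any(reg in tainted for reg in ins.srcs)
        (slice_q if depends_on_load else main_q).append(ins)
        if ins.dst is not None:
            if ins.op == "load" or depends_on_load:
                tainted.add(ins.dst)      # value is (or derives from) loaded data
            else:
                tainted.discard(ins.dst)  # overwritten with load-independent data
    return main_q, slice_q


PROGRAM = [
    Instr("load", "r1", ("r0",)),      # independent load
    Instr("add", "r2", ("r1", "r3")),  # consumes the load result
    Instr("add", "r4", ("r5", "r6")),  # independent arithmetic
    Instr("load", "r7", ("r4",)),      # younger independent load
    Instr("mul", "r8", ("r7", "r2")),  # consumes two load-derived values
    Instr("mov", "r1", ("r5",)),       # overwrites r1 with independent data
    Instr("add", "r9", ("r1", "r5")),  # no longer load-dependent
]

if __name__ == "__main__":
    main_q, slice_q = steer(PROGRAM)
    print("main :", [i.dst for i in main_q])
    print("slice:", [i.dst for i in slice_q])
```

In this toy program the younger load into `r7` lands in the main queue, so it is not held up behind the `add` that waits on the first load. This is the kind of memory-hierarchy parallelism the abstract attributes to steering forward slices away from the main instruction stream.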
Pages: 25