The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture

Authors
Lakshminarasimhan, Kartik [1]
Naithani, Ajeya [1]
Feliu, Josue [2]
Eeckhout, Lieven [1]
Affiliations
[1] Univ Ghent, Technol Pk 126, B-9052 Ghent, Belgium
[2] Univ Murcia, C Campus Univ,Edificio 32, Murcia 30100, Spain
Funding
European Research Council
Keywords
Superscalar microarchitecture; slice-out-of-order; dynamic instruction scheduling
DOI
10.1145/3499424
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner. Recently proposed slice-out-of-order (sOoO) cores identify backward slices of memory operations which they execute out-of-order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table. In this article, we propose Forward Slice Core (FSC), a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. Considering 3-wide pipeline configurations, we find that FSC improves performance by 27.1%, 21.1%, and 14.6% on average compared to Freeway, the state-of-the-art sOoO core, across SPEC CPU2017, GAP, and DaCapo, respectively, while at the same time incurring reduced hardware complexity. Compared to an OoO core, FSC reduces power consumption by 61.3% and chip area by 47%, providing a microarchitecture with high performance at low complexity.
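The idea of a forward slice can be illustrated in software. The sketch below is not the FSC hardware described in the abstract; it is a minimal model that assumes a forward slice is the set of instructions that transitively consume a load's result, and it tracks that set with a per-register taint bit. The names `Instr`, `steer`, and the queue labels `"main"` and `"slice"` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op: str                  # e.g. "load", "add", "store" (illustrative opcodes)
    dst: Optional[str]       # destination register, or None if there is none
    srcs: Tuple[str, ...]    # source registers


def steer(instrs: List[Instr]) -> List[str]:
    """Label each instruction "slice" if it consumes a value that
    (transitively) comes from a load, otherwise "main"."""
    tainted = set()          # registers currently holding load-derived values
    queues = []
    for ins in instrs:
        in_slice = any(src in tainted for src in ins.srcs)
        queues.append("slice" if in_slice else "main")
        if ins.dst is not None:
            if ins.op == "load" or in_slice:
                tainted.add(ins.dst)      # result is load-derived
            else:
                tainted.discard(ins.dst)  # overwritten by an independent value
    return queues


program = [
    Instr("load", "r1", ("r0",)),        # load itself is not a consumer
    Instr("add", "r2", ("r1",)),         # consumes the load result
    Instr("add", "r3", ("r4",)),         # independent of any load
    Instr("load", "r5", ("r2",)),        # dependent load, still in the slice
    Instr("add", "r1", ("r4",)),         # overwrites r1 with an independent value
    Instr("store", None, ("r1", "r6")),  # r1 is no longer load-derived
]

print(steer(program))
```

In this model, identifying a forward slice needs only one pass in program order and one bit per register. How FSC actually detects slices, when it clears such state, and how it handles L1 D-cache misses and store-address replication are not specified in the abstract and are not modeled here.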
Pages: 25