The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture

Authors
Lakshminarasimhan, Kartik [1]
Naithani, Ajeya [1]
Feliu, Josue [2]
Eeckhout, Lieven [1]
Affiliations
[1] Univ Ghent, Technol Pk 126, B-9052 Ghent, Belgium
[2] Univ Murcia, C Campus Univ, Edificio 32, Murcia 30100, Spain
Funding
European Research Council
Keywords
Superscalar microarchitecture; slice-out-of-order; dynamic instruction scheduling
DOI
10.1145/3499424
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner. Recently proposed slice-out-of-order (sOoO) cores identify backward slices of memory operations which they execute out-of-order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table. In this article, we propose Forward Slice Core (FSC), a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. Considering 3-wide pipeline configurations, we find that FSC improves performance by 27.1%, 21.1%, and 14.6% on average compared to Freeway, the state-of-the-art sOoO core, across SPEC CPU2017, GAP, and DaCapo, respectively, while at the same time incurring reduced hardware complexity. Compared to an OoO core, FSC reduces power consumption by 61.3% and chip area by 47%, providing a microarchitecture with high performance at low complexity.
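As a rough illustration of what "identifying forward slices" can mean, the sketch below marks every instruction that transitively consumes a value produced by a load. This is a generic register-taint pass written for illustration only, not the FSC hardware mechanism from the article: the `Instr` record, the `forward_slice_flags` function, and the rule that a non-slice write clears a register's taint are all assumptions. It also does not model the FIFO queues, L1 D-cache misses, or store-address replication mentioned in the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: opcode, destination, source registers."""
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...] = ()


def forward_slice_flags(stream: List[Instr]) -> List[bool]:
    """Return one flag per instruction: True if it depends on a load result.

    Assumption: a register is 'tainted' while it holds a value derived
    (directly or transitively) from a load.
    """
    tainted = set()
    flags = []
    for ins in stream:
        # An instruction is in a forward slice if any source is tainted.
        in_slice = any(src in tainted for src in ins.srcs)
        flags.append(in_slice)
        if ins.dst is not None:
            if ins.op == "load" or in_slice:
                tainted.add(ins.dst)      # result derives from a load
            else:
                tainted.discard(ins.dst)  # overwritten by an unrelated value
    return flags


stream = [
    Instr("load", "r1", ("r0",)),      # produces a load value in r1
    Instr("add", "r2", ("r1",)),       # consumes r1: in slice
    Instr("add", "r3", ("r4",)),       # independent of any load
    Instr("mul", "r5", ("r2", "r3")),  # consumes r2: in slice
    Instr("mov", "r1", ("r4",)),       # overwrites r1 with an unrelated value
    Instr("add", "r6", ("r1",)),       # r1 no longer load-derived
]
flags = forward_slice_flags(stream)
```

In this toy stream, only the second and fourth instructions are flagged. A steering stage could then route flagged instructions to a separate queue, which is the kind of decision the abstract describes at a high level.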
Pages: 25