The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture

Authors
Lakshminarasimhan, Kartik [1]
Naithani, Ajeya [1]
Feliu, Josue [2]
Eeckhout, Lieven [1]
Affiliations
[1] Univ Ghent, Technol Pk 126, B-9052 Ghent, Belgium
[2] Univ Murcia, C Campus Univ, Edificio 32, Murcia 30100, Spain
Funding
European Research Council
Keywords
Superscalar microarchitecture; slice-out-of-order; dynamic instruction scheduling
DOI
10.1145/3499424
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner. Recently proposed slice-out-of-order (sOoO) cores identify backward slices of memory operations which they execute out-of-order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table. In this article, we propose Forward Slice Core (FSC), a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. Considering 3-wide pipeline configurations, we find that FSC improves performance by 27.1%, 21.1%, and 14.6% on average compared to Freeway, the state-of-the-art sOoO core, across SPEC CPU2017, GAP, and DaCapo, respectively, while at the same time incurring reduced hardware complexity. Compared to an OoO core, FSC reduces power consumption by 61.3% and chip area by 47%, providing a microarchitecture with high performance at low complexity.
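The notion of a forward slice mentioned in the abstract can be illustrated with a small software sketch. This is not the FSC hardware mechanism, which the abstract does not detail. It assumes, for illustration only, that slices are rooted at loads and that membership is tracked by marking destination registers as load-dependent. The `Instr` class, the `mark_forward_slices` function, and the register names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this sketch."""
    name: str
    dest: Optional[str]          # destination register, or None
    srcs: Tuple[str, ...] = ()   # source registers
    is_load: bool = False


def mark_forward_slices(trace: List[Instr]) -> List[bool]:
    """Return one flag per instruction: True if it transitively
    depends on the result of an earlier load (assumed slice root)."""
    tainted: Set[str] = set()    # registers holding load-dependent values
    marks: List[bool] = []
    for ins in trace:
        # An instruction is in a forward slice if it reads a tainted register.
        in_slice = any(src in tainted for src in ins.srcs)
        marks.append(in_slice)
        if ins.dest is not None:
            if ins.is_load or in_slice:
                # Loads start a slice; slice members extend it.
                tainted.add(ins.dest)
            else:
                # Overwriting with an independent value ends the dependence.
                tainted.discard(ins.dest)
    return marks


trace = [
    Instr("ld  r1, [r0]", dest="r1", srcs=("r0",), is_load=True),
    Instr("add r2, r1, r3", dest="r2", srcs=("r1", "r3")),
    Instr("add r4, r5, r6", dest="r4", srcs=("r5", "r6")),
    Instr("mul r7, r2, r4", dest="r7", srcs=("r2", "r4")),
    Instr("mov r1, r5", dest="r1", srcs=("r5",)),
    Instr("sub r8, r1, r4", dest="r8", srcs=("r1", "r4")),
]
marks = mark_forward_slices(trace)
for ins, flag in zip(trace, marks):
    print(f"{ins.name:18s} {'slice' if flag else 'main'}")
```

In this sketch the load itself is not flagged, only its direct and indirect consumers are. A core along the lines the abstract describes would steer the flagged instructions to separate in-order queues, but how FSC does this in hardware is not specified here.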
Pages: 25