The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture

Cited by: 0

Authors:

Lakshminarasimhan, Kartik [1]

Naithani, Ajeya [1]

Feliu, Josue [2]

Eeckhout, Lieven [1]

Affiliations:

[1] Univ Ghent, Technol Pk 126, B-9052 Ghent, Belgium

[2] Univ Murcia, C Campus Univ,Edificio 32, Murcia 30100, Spain

Source:

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION | 2022 / Vol. 19 / No. 02

Funding:

European Research Council;

Keywords:

Superscalar microarchitecture; slice-out-of-order; dynamic instruction scheduling;

DOI:

10.1145/3499424

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner. Recently proposed slice-out-of-order (sOoO) cores identify backward slices of memory operations which they execute out-of-order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table. In this article, we propose Forward Slice Core (FSC), a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. Considering 3-wide pipeline configurations, we find that FSC improves performance by 27.1%, 21.1%, and 14.6% on average compared to Freeway, the state-of-the-art sOoO core, across SPEC CPU2017, GAP, and DaCapo, respectively, while at the same time incurring reduced hardware complexity. Compared to an OoO core, FSC reduces power consumption by 61.3% and chip area by 47%, providing a microarchitecture with high performance at low complexity.
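As a rough illustration of what a forward slice is, the following Python sketch marks the instructions in a toy dynamic instruction stream that depend, directly or transitively through registers, on the result of a load. This is only a software analogy under stated assumptions, not the FSC hardware mechanism: the `Instr` type, the register names, and the `forward_slice_marks` helper are hypothetical, and the sketch ignores the FIFO queues, L1 D-cache misses, and store-address replication that the abstract describes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical toy instruction: opcode, destination and source registers."""
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...] = ()


def forward_slice_marks(stream: List[Instr]) -> List[bool]:
    """Return one flag per instruction: True if it consumes a value that
    (transitively) came from an earlier load, i.e. it lies in a forward slice."""
    tainted = set()  # registers currently holding a load-dependent value
    marks = []
    for ins in stream:
        depends = any(src in tainted for src in ins.srcs)
        marks.append(depends)
        if ins.dst is not None:
            if depends or ins.op == "load":
                # A load result, or anything computed from one, is load-dependent.
                tainted.add(ins.dst)
            else:
                # The register is overwritten with a load-independent value.
                tainted.discard(ins.dst)
    return marks


stream = [
    Instr("load", "r1", ("r0",)),      # r1 = mem[r0]
    Instr("add", "r2", ("r1", "r5")),  # consumes the load result
    Instr("add", "r3", ("r4", "r5")),  # independent of any load
    Instr("load", "r6", ("r2",)),      # address depends on the first load
    Instr("mov", "r1", ("r4",)),       # overwrites r1 with an independent value
    Instr("add", "r7", ("r1", "r3")),  # no longer load-dependent
]
marks = forward_slice_marks(stream)
```

In this toy stream only the second and fourth instructions are flagged. A backward slice would instead collect the producers that feed a load's address, which is the approach the abstract attributes to earlier sOoO cores.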


Pages: 25
