The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture

Cited by: 0

Authors:

Lakshminarasimhan, Kartik ^{[1]}

Naithani, Ajeya ^{[1]}

Feliu, Josue ^{[2]}

Eeckhout, Lieven ^{[1]}

Affiliations:

[1] Univ Ghent, Technol Pk 126, B-9052 Ghent, Belgium

[2] Univ Murcia, C Campus Univ,Edificio 32, Murcia 30100, Spain

Source:

ACM Transactions on Architecture and Code Optimization | 2022, Vol. 19, No. 2

Funding:

European Research Council

Keywords:

Superscalar microarchitecture; slice-out-of-order; dynamic instruction scheduling

DOI:

10.1145/3499424

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner. Recently proposed slice-out-of-order (sOoO) cores identify backward slices of memory operations which they execute out-of-order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table. In this article, we propose Forward Slice Core (FSC), a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. Considering 3-wide pipeline configurations, we find that FSC improves performance by 27.1%, 21.1%, and 14.6% on average compared to Freeway, the state-of-the-art sOoO core, across SPEC CPU2017, GAP, and DaCapo, respectively, while at the same time incurring reduced hardware complexity. Compared to an OoO core, FSC reduces power consumption by 61.3% and chip area by 47%, providing a microarchitecture with high performance at low complexity.
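The notion of a forward slice used in the abstract can be illustrated with a small software sketch. This is not the hardware mechanism described in the article, only a toy model under stated assumptions: instructions are reduced to an opcode, one optional destination register, and source registers; a register is marked as load-derived when a load or a slice instruction writes it; and the queue names "main" and "slice" are hypothetical labels chosen for this example. The separate handling of L1 D-cache misses and the replication of store-address instructions are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """Toy instruction: opcode, optional destination register, source registers."""
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...] = ()


def steer(instrs: List[Instr]) -> List[Tuple[Instr, str]]:
    """Tag each instruction with a hypothetical queue name.

    "slice": the instruction reads a register that derives from an earlier load.
    "main":  everything else (including loads with load-independent addresses).
    """
    tainted: Set[str] = set()  # registers whose current value derives from a load
    steered = []
    for ins in instrs:
        in_slice = any(src in tainted for src in ins.srcs)
        steered.append((ins, "slice" if in_slice else "main"))
        if ins.dst is not None:
            if ins.op == "load" or in_slice:
                tainted.add(ins.dst)      # value now depends on a load
            else:
                tainted.discard(ins.dst)  # overwritten with a load-independent value
    return steered


program = [
    Instr("load", "r1", ("r0",)),      # r1 <- mem[r0]
    Instr("add", "r2", ("r1",)),       # reads r1: direct consumer of the load
    Instr("add", "r3", ("r0",)),       # independent of the load
    Instr("mul", "r4", ("r2", "r3")),  # reads r2: transitive consumer
    Instr("mov", "r1", ("r0",)),       # overwrites r1 with a load-independent value
    Instr("add", "r5", ("r1",)),       # r1 no longer derives from the load
]

for ins, queue in steer(program):
    print(f"{ins.op:5s} dst={ins.dst} srcs={ins.srcs} -> {queue}")
```

Marking forward from the load only needs one bit per register and a single pass in program order, whereas finding the backward slice of a load requires knowing, before the load is seen, which earlier instructions will feed its address.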


Pages: 25
