A Scalable Architecture for Ordered Parallelism

Cited by: 44

Authors:

Jeffrey, Mark C. [1]

Subramanian, Suvinay [1]

Yan, Cong [1]

Emer, Joel [2]

Sanchez, Daniel [1]

Affiliations:

[1] MIT, CSAIL, Cambridge, MA 02139 USA

[2] MIT, CSAIL, NVIDIA, Cambridge, MA 02139 USA

Source:

PROCEEDINGS OF THE 48TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO-48) | 2015

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC); U.S. National Science Foundation (NSF);

Keywords:

Multicore; ordered parallelism; irregular parallelism; fine-grain parallelism; synchronization; speculative execution;

DOI:

10.1145/2830772.2830777

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

We present Swarm, a novel architecture that exploits ordered irregular parallelism, which is abundant but hard to mine with current software and hardware techniques. In this architecture, programs consist of short tasks with programmer-specified timestamps. Swarm executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ahead of the earliest active task to uncover ordered parallelism. Swarm builds on prior TLS and HTM schemes, and contributes several new techniques that allow it to scale to large core counts and speculation windows, including a new execution model, speculation-aware hardware task management, selective aborts, and scalable ordered commits. We evaluate Swarm on graph analytics, simulation, and database benchmarks. At 64 cores, Swarm achieves 51-122x speedups over a single-core system, and outperforms software-only parallel algorithms by 3-18x.

Citation

Pages: 228-241

Page count: 14

