A Scalable Architecture for Ordered Parallelism

Times cited: 44

Authors:

Jeffrey, Mark C. [1]

Subramanian, Suvinay [1]

Yan, Cong [1]

Emer, Joel [2]

Sanchez, Daniel [1]

Affiliations:

[1] MIT, CSAIL, Cambridge, MA 02139 USA

[2] MIT, CSAIL, NVIDIA, Cambridge, MA 02139 USA

Source:

Proceedings of the 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-48), 2015

Funding:

Natural Sciences and Engineering Research Council of Canada; National Science Foundation (USA)

Keywords:

Multicore; ordered parallelism; irregular parallelism; fine-grain parallelism; synchronization; speculative execution;

DOI:

10.1145/2830772.2830777

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

We present Swarm, a novel architecture that exploits ordered irregular parallelism, which is abundant but hard to mine with current software and hardware techniques. In this architecture, programs consist of short tasks with programmer-specified timestamps. Swarm executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ahead of the earliest active task to uncover ordered parallelism. Swarm builds on prior thread-level speculation (TLS) and hardware transactional memory (HTM) schemes, and contributes several new techniques that allow it to scale to large core counts and speculation windows, including a new execution model, speculation-aware hardware task management, selective aborts, and scalable ordered commits. We evaluate Swarm on graph analytics, simulation, and database benchmarks. At 64 cores, Swarm achieves 51-122x speedups over a single-core system, and outperforms software-only parallel algorithms by 3-18x.
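The abstract describes the programming model only at a high level: short tasks carry programmer-specified timestamps, and Swarm runs them speculatively and out of order. The sketch below is a minimal sequential reference model under stated assumptions. It runs timestamped tasks one at a time in strict timestamp order, the order that speculative out-of-order execution must appear to follow, and it performs no speculation, aborts, or parallelism. Dijkstra-style single-source shortest paths is used here as an illustrative graph-analytics workload, with each task's timestamp equal to a tentative distance. The names `TaskQueue`, `enqueue`, `run`, and `sssp` are hypothetical, and the rule that a child task's timestamp may not precede its parent's is an assumption of this sketch.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, int]]]


class TaskQueue:
    """Hypothetical sequential reference model: runs tasks strictly in timestamp order.

    No speculation happens here; this only pins down the target ordering.
    """

    def __init__(self) -> None:
        self._heap: List[tuple] = []
        self._seq = itertools.count()  # unique tie-breaker so callables are never compared
        self._now = 0                  # timestamp of the task currently running

    def enqueue(self, ts: int, fn: Callable, *args) -> None:
        # Assumption of this sketch: a task may not create a child in its own past.
        if ts < self._now:
            raise ValueError(f"timestamp {ts} precedes current task's {self._now}")
        heapq.heappush(self._heap, (ts, next(self._seq), fn, args))

    def run(self) -> None:
        # Repeatedly pop and run the earliest-timestamp task until none remain.
        while self._heap:
            ts, _, fn, args = heapq.heappop(self._heap)
            self._now = ts
            fn(self, ts, *args)


def sssp(graph: Graph, source: str) -> Dict[str, int]:
    """Shortest paths where each visit task's timestamp is a tentative distance."""
    dist: Dict[str, int] = {}

    def visit(q: TaskQueue, ts: int, node: str) -> None:
        if node in dist:  # already settled by an earlier-timestamp task
            return
        dist[node] = ts
        for nbr, weight in graph.get(node, []):
            if nbr not in dist:
                # Child timestamp = parent timestamp + edge weight (weights assumed >= 0).
                q.enqueue(ts + weight, visit, nbr)

    q = TaskQueue()
    q.enqueue(0, visit, source)
    q.run()
    return dist


graph: Graph = {"a": [("b", 2), ("c", 5)], "b": [("c", 1), ("d", 4)],
                "c": [("d", 1)], "d": []}
print(sssp(graph, "a"))  # → {'a': 0, 'b': 2, 'c': 3, 'd': 4}
```

In this model, the task that reaches `c` at timestamp 5 runs after the one at timestamp 3 and does nothing. Per the abstract, Swarm's hardware aims to execute many such tasks speculatively ahead of the earliest active task while still producing results consistent with timestamp order; the sketch captures only that ordering, not the mechanisms (task management, selective aborts, ordered commits) that provide it.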


Pages: 228-241

Page count: 14
