A Scalable Architecture for Ordered Parallelism

Cited by: 44
Authors
Jeffrey, Mark C. [1]
Subramanian, Suvinay [1]
Yan, Cong [1]
Emer, Joel [2]
Sanchez, Daniel [1]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] MIT, CSAIL, NVIDIA, Cambridge, MA 02139 USA
Source
PROCEEDINGS OF THE 48TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO-48) | 2015
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC); U.S. National Science Foundation (NSF)
Keywords
Multicore; ordered parallelism; irregular parallelism; fine-grain parallelism; synchronization; speculative execution
DOI
10.1145/2830772.2830777
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We present Swarm, a novel architecture that exploits ordered irregular parallelism, which is abundant but hard to mine with current software and hardware techniques. In this architecture, programs consist of short tasks with programmer-specified timestamps. Swarm executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ahead of the earliest active task to uncover ordered parallelism. Swarm builds on prior TLS and HTM schemes, and contributes several new techniques that allow it to scale to large core counts and speculation windows, including a new execution model, speculation-aware hardware task management, selective aborts, and scalable ordered commits. We evaluate Swarm on graph analytics, simulation, and database benchmarks. At 64 cores, Swarm achieves 51-122x speedups over a single-core system, and out-performs software-only parallel algorithms by 3-18x.
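The task model described in the abstract (short tasks carrying programmer-specified timestamps) can be illustrated with a purely sequential sketch. It shows only the timestamp order that the tasks must appear to respect; it does not model Swarm's speculation, selective aborts, or ordered commits. The names `run_ordered`, `enqueue`, and `sssp` are hypothetical and are not Swarm's API. The choice of shortest paths as the example, with the tentative distance as the timestamp, is an assumption made for illustration.

```python
import heapq
from itertools import count


def run_ordered(initial_tasks):
    """Run tasks in timestamp order; each task may enqueue child tasks.

    Sequential reference semantics only (hypothetical helper, not Swarm's API).
    """
    seq = count()  # tie-breaker so the heap never compares functions
    heap = []

    def enqueue(timestamp, fn, *args):
        heapq.heappush(heap, (timestamp, next(seq), fn, args))

    for timestamp, fn, args in initial_tasks:
        enqueue(timestamp, fn, *args)

    while heap:
        timestamp, _, fn, args = heapq.heappop(heap)
        fn(timestamp, enqueue, *args)


def sssp(graph, source):
    """Single-source shortest paths written as timestamped tasks."""
    dist = {}

    def visit(timestamp, enqueue, node):
        if node in dist:  # an earlier-timestamp task already settled this node
            return
        dist[node] = timestamp
        for neighbor, weight in graph.get(node, []):
            # Child tasks carry a timestamp no earlier than their parent's.
            enqueue(timestamp + weight, visit, neighbor)

    run_ordered([(0, visit, (source,))])
    return dist


graph = {"a": [("b", 4), ("c", 1)], "c": [("b", 2)], "b": [("d", 5)]}
print(sssp(graph, "a"))  # {'a': 0, 'c': 1, 'b': 3, 'd': 8}
```

In this sketch each `visit` task is short and touches little data, which is the kind of fine-grain ordered work the abstract refers to; a parallel system would have to produce the same result as this in-order loop.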
Pages: 228-241
Page count: 14