A Scalable Architecture for Ordered Parallelism

Cited by: 44
Authors
Jeffrey, Mark C. [1]
Subramanian, Suvinay [1]
Yan, Cong [1]
Emer, Joel [2]
Sanchez, Daniel [1]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] MIT, CSAIL, NVIDIA, Cambridge, MA 02139 USA
Source
PROCEEDINGS OF THE 48TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO-48) | 2015
Funding
Natural Sciences and Engineering Research Council of Canada; U.S. National Science Foundation;
Keywords
Multicore; ordered parallelism; irregular parallelism; fine-grain parallelism; synchronization; speculative execution;
DOI
10.1145/2830772.2830777
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
We present Swarm, a novel architecture that exploits ordered irregular parallelism, which is abundant but hard to mine with current software and hardware techniques. In this architecture, programs consist of short tasks with programmer-specified timestamps. Swarm executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ahead of the earliest active task to uncover ordered parallelism. Swarm builds on prior TLS and HTM schemes, and contributes several new techniques that allow it to scale to large core counts and speculation windows, including a new execution model, speculation-aware hardware task management, selective aborts, and scalable ordered commits. We evaluate Swarm on graph analytics, simulation, and database benchmarks. At 64 cores, Swarm achieves 51-122x speedups over a single-core system, and outperforms software-only parallel algorithms by 3-18x.
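The abstract describes programs as short tasks carrying programmer-specified timestamps. As a rough illustration of that ordering semantics only, the sketch below runs such tasks sequentially from a priority queue, using shortest-path search as the example workload. It is not Swarm's API or hardware mechanism and involves no speculation: the names `run_ordered`, `enqueue`, and `shortest_paths` are hypothetical, and the assumption that a task only creates children with timestamps no earlier than its own is ours.

```python
import heapq
from itertools import count


def run_ordered(initial_tasks):
    """Run (timestamp, fn, *args) tasks in timestamp order, one at a time.

    Hypothetical sequential model of timestamp-ordered tasks; a parallel
    system would have to produce the same result as this ordering.
    """
    seq = count()  # tie-breaker so the heap never compares functions
    queue = []

    def enqueue(ts, fn, *args):
        heapq.heappush(queue, (ts, next(seq), fn, args))

    for ts, fn, *args in initial_tasks:
        enqueue(ts, fn, *args)

    while queue:
        ts, _, fn, args = heapq.heappop(queue)
        fn(ts, enqueue, *args)  # a task may enqueue child tasks


def shortest_paths(graph, source):
    """Shortest-path distances, with each node visit expressed as a task."""
    dist = {}

    def visit(ts, enqueue, node):
        if node in dist:  # already reached by an earlier timestamp
            return
        dist[node] = ts   # the timestamp is the path length so far
        for neighbor, weight in graph.get(node, []):
            enqueue(ts + weight, visit, neighbor)

    run_ordered([(0, visit, source)])
    return dist


graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(shortest_paths(graph, "a"))
```

Here the task visiting `c` with timestamp 3 runs before the one with timestamp 4, so the later one finds `c` already settled and does nothing. An ordered-parallel system would need to preserve exactly this outcome while running many such tasks at once.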
Pages: 228-241
Page count: 14