Compiler and hardware support for reducing the synchronization of speculative threads

Cited by: 13
Authors
Zhai, Antonia [1 ]
Steffan, J. Gregory [2 ]
Colohan, Christopher B. [3 ]
Mowry, Todd C. [4 ]
Affiliations
[1] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
[2] Univ Toronto, Toronto, ON, Canada
[3] Google, Ann Arbor, MI 48104 USA
[4] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
design; experimentation; performance; thread-level speculation; chip-multiprocessing; automatic parallelization; instruction scheduling;
DOI
10.1145/1369396.1369399
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Subject classification code
0812 ;
Abstract
Thread-level speculation (TLS) allows us to automatically parallelize general-purpose programs by supporting parallel execution of threads that might not actually be independent. In this article, we focus on one important limitation of program performance under TLS: stalls that result from synchronizing and forwarding scalar values between speculative threads, values that would otherwise cause frequent data dependences and, hence, failed speculation. Using SPECint benchmarks that have been automatically transformed by our compiler to exploit TLS, we present, evaluate in detail, and compare both compiler and hardware techniques for improving the communication of scalar values. We find that through our dataflow algorithms for three increasingly aggressive instruction scheduling techniques, the compiler can drastically reduce the critical forwarding path introduced by the synchronization and forwarding of scalar values. We also show that hardware techniques for reducing synchronization can be complementary to compiler scheduling, but that the additional performance benefits are minimal and are generally not worth the cost.
Pages: 1-33
Page count: 33
Related papers
50 in total
  • [1] Compiler support for low-cost synchronization among threads
    Newburn, CJ
    Shen, JP
    PARALLEL COMPUTING: FUNDAMENTALS, APPLICATIONS AND NEW DIRECTIONS, 1998, 12 : 485 - 494
  • [2] Compiler optimization of scalar value communication between speculative threads
    Zhai, A
    Colohan, CB
    Steffan, JG
    Mowry, TC
    ACM SIGPLAN NOTICES, 2002, 37 (10) : 171 - 183
  • [3] Speculative synchronization and thread management for fine granularity threads
    Gontmakher, Alex
    Mendelson, Avi
    Schuster, Assaf
    Shklover, Gregory
    TWELFTH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, PROCEEDINGS, 2006, : 283 - +
  • [4] Compiler Support for Concurrency Synchronization
    Lin, Tzong-Yen
    Lee, Cheng-Yu
    Chen, Chia-Jung
    Chang, Rong-Guey
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, PT I: ICA3PP 2011, 2011, 7916 : 93 - 105
  • [5] CMP support for large and dependent speculative threads
    Colohan, Christopher B.
    Ailamaki, Anastasia
    Steffan, J. Gregory
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2007, 18 (08) : 1041 - 1054
  • [6] Efficient execution of speculative threads and transactions with hardware transactional memory
    Li, Gongming
    An, Hong
    Li, Qi
    Deng, Bobin
    Dai, Wenbo
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2014, 30 : 242 - 253
  • [7] SeTM: Efficient Execution of Speculative Threads with Hardware Transactional Memory
    Li, Gongming
    An, Hong
    Li, Qi
    Deng, Bobin
    Dai, Wenbo
    PROCEEDINGS OF THE 2012 IEEE 18TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2012), 2012, : 522 - 531
  • [8] Compiler optimization of memory-resident value communication between speculative threads
    Zhai, A
    Colohan, CB
    Steffan, JG
    Mowry, TC
    CGO 2004: INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION, 2004, : 39 - 50
  • [9] Compiler support for dynamic speculative pre-execution
    Ro, WW
    Gaudiot, JL
    INTERACT-7 2003: SEVENTH WORKSHOP ON INTERACTION BETWEEN COMPILERS AND COMPUTER ARCHITECTURES, PROCEEDINGS, 2003, : 14 - 23
  • [10] A non-blocking multithreaded architecture with support for speculative threads
    Kavi, Krishna
    Li, Wentong
    Hurson, Ali
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, PROCEEDINGS, 2008, 5022 : 173 - +