Safe Compiler-driven Transaction Checkpointing and Recovery

Authors
Sreeram, Jaswanth [1]
Pande, Santosh [2]
Affiliations
[1] Intel Labs, Santa Clara, CA USA
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Keywords
software transactional memory; checkpointing; continuations
DOI
10.1145/2398857.2384620
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Several studies have shown that a large fraction of the work performed inside memory transactions in representative programs is wasted because the transaction experiences a conflict and aborts. Aborts inside long-running transactions are especially costly to performance. Moreover, because the TM programming model is simpler than fine-grained locking for synchronizing large critical sections, large transactions are common, which exacerbates the problem of wasted work. In this paper we present a practical transaction checkpoint and recovery scheme. In this scheme, a transaction that experiences a conflict can restore its state (including the local context in which it was executing) to some dynamic program point before the conflicting access and resume execution from that point. This state saving and restoration is implemented by checkpoint operations that the compiler generates into the transaction's body and optimizes to reduce the amount of state that is saved and restored. We also describe a runtime system that manages these checkpointed states and orchestrates the restoration of the right checkpointed state for a conflict on a particular transactional access. The synthesis of these save and restore operations, their optimization, and their invocation at runtime are completely transparent to the programmer. We have implemented the checkpoint generation and optimization scheme in the LLVM compiler, along with runtime support for the TL2 STM system. Our experiments indicate that, for many parallel programs, such checkpoint recovery schemes can reduce the number of aborts by up to several orders of magnitude. They also yield significant execution-time speedups relative to plain transactional programs with the same number of threads.
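The checkpoint-and-partial-rollback idea in the abstract can be illustrated with a small single-threaded simulation. This is a minimal sketch under stated assumptions, not the authors' LLVM/TL2 implementation:

- The names `Transaction`, `Checkpoint`, `checkpoint`, `recover`, and the step functions are all hypothetical.
- The transaction body is modeled as a list of steps, so an index into that list stands in for the saved program point (the continuation).
- Writes are buffered in a redo log until commit.
- A conflict is signaled by calling `recover` directly rather than being detected by TL2-style version validation.

On a conflict, the sketch rolls back to the latest checkpoint taken before the first read of the conflicting location. It restores the saved locals and truncates the read and write logs. It falls back to a full abort only when no such checkpoint exists.

```python
# Hypothetical toy model of checkpoint/partial-rollback; not the paper's LLVM/TL2 code.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Checkpoint:
    resume_at: int                # index of the step to resume at (saved program point)
    saved_locals: Dict[str, int]  # snapshot of the transaction's local context
    read_len: int                 # read-log length when the checkpoint was taken
    write_len: int                # write-log length when the checkpoint was taken


@dataclass
class Transaction:
    memory: Dict[str, int]                          # shared, committed state
    steps: List[Callable[["Transaction"], None]]    # transaction body as a step list
    local: Dict[str, int] = field(default_factory=dict)
    read_log: List[str] = field(default_factory=list)
    write_log: List[Tuple[str, int]] = field(default_factory=list)
    checkpoints: List[Checkpoint] = field(default_factory=list)
    pc: int = 0
    executed: int = 0                               # steps run, including re-execution

    def checkpoint(self) -> None:
        # self.pc already points past the checkpoint step, so resuming skips it.
        self.checkpoints.append(Checkpoint(self.pc, dict(self.local),
                                           len(self.read_log), len(self.write_log)))

    def read(self, addr: str) -> int:
        self.read_log.append(addr)
        for a, v in reversed(self.write_log):       # read-your-own-writes
            if a == addr:
                return v
        return self.memory[addr]

    def write(self, addr: str, value: int) -> None:
        self.write_log.append((addr, value))        # buffered until commit (redo log)

    def run(self) -> None:
        while self.pc < len(self.steps):
            step = self.steps[self.pc]
            self.pc += 1
            self.executed += 1
            step(self)

    def recover(self, conflict_addr: str) -> bool:
        """Roll back to the latest checkpoint taken before the first read of
        conflict_addr. Returns False if none exists (a full abort is needed)."""
        if conflict_addr not in self.read_log:
            return True                             # never read: nothing to roll back
        first = self.read_log.index(conflict_addr)
        usable = [i for i, cp in enumerate(self.checkpoints) if cp.read_len <= first]
        if not usable:
            return False
        idx = usable[-1]
        cp = self.checkpoints[idx]
        del self.checkpoints[idx + 1:]              # later checkpoints are now stale
        self.local = dict(cp.saved_locals)
        del self.read_log[cp.read_len:]
        del self.write_log[cp.write_len:]
        self.pc = cp.resume_at
        return True

    def commit(self) -> None:
        for addr, value in self.write_log:
            self.memory[addr] = value


# Example body: a = x; checkpoint; b = a + y; z = b
def load_x(tx: Transaction) -> None:
    tx.local["a"] = tx.read("x")

def save(tx: Transaction) -> None:
    tx.checkpoint()

def load_y(tx: Transaction) -> None:
    tx.local["b"] = tx.local["a"] + tx.read("y")

def store_z(tx: Transaction) -> None:
    tx.write("z", tx.local["b"])


mem = {"x": 1, "y": 10, "z": 0}
tx = Transaction(mem, [load_x, save, load_y, store_z])
tx.run()                       # runs all 4 steps; buffers z = 11
mem["y"] = 20                  # simulate another transaction committing y (conflict on "y")
assert tx.recover("y")         # resume at load_y with a = 1 restored
tx.run()                       # re-executes only load_y and store_z
tx.commit()
print(mem["z"], tx.executed)   # prints "21 6"
```

After the conflict on `y`, only the two steps following the checkpoint are re-executed, whereas a full abort would re-run all four. Saving every local at each checkpoint is the naive choice here. The abstract notes that the actual scheme optimizes its checkpoint operations to reduce the state that is saved and restored.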
Pages: 41-55
Page count: 15
Related papers (50 in total)
  • [31] Checkpointing OpenSHMEM Programs Using Compiler Analysis
    Bari, Md Abdullah Shahneous
    Basu, Debasmita
    Lu, Wenbin
    Curtis, Tony
    Chapman, Barbara
    PROCEEDINGS OF 2020 IEEE/ACM 10TH WORKSHOP ON FAULT TOLERANCE FOR HPC AT EXTREME SCALE (FTXS 2020), 2020, pp. 51-60
  • [32] Compiler-Enhanced Incremental Checkpointing for OpenMP Applications
    Bronevetsky, Greg
    Marques, Daniel
    Pingali, Keshav
    McKee, Sally
    Rugina, Radu
    2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-5, 2009, pp. 189+
  • [33] Analysis of a transaction system with checkpointing, failures, and rollback
    Kumar, L
    Misra, M
    Mitrani, I
    COMPUTER PERFORMANCE EVALUATION: MODELLING TECHNIQUES AND TOOLS, 2002, 2324, pp. 279-288
  • [34] How safe is probabilistic checkpointing?
    Elnozahy, EN
    TWENTY-EIGHTH ANNUAL INTERNATIONAL SYMPOSIUM ON FAULT-TOLERANT COMPUTING, DIGEST PAPERS, 1998, pp. 358-363
  • [35] Compiler-Enhanced Incremental Checkpointing for OpenMP Applications
    Bronevetsky, Greg
    Marques, Daniel
    Pingali, Keshav
    Rugina, Radu
    McKee, Sally A.
    PPOPP'08: PROCEEDINGS OF THE 2008 ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, 2008, pp. 275-276
  • [36] CHECKPOINTING AND RECOVERY IN A PIPELINE OF TRANSPUTERS
    SINHA, A
    DAS, PK
    CHAUDHURI, A
    MICROPROCESSING AND MICROPROGRAMMING, 1992, 35(1-5), pp. 141-147
  • [37] ACR: Amnesic Checkpointing and Recovery
    Akturk, Ismail
    Karpuzcu, Ulya R.
    2020 IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2020), 2020, pp. 30-43
  • [38] Locks and barriers in checkpointing and recovery
    Badrinath, R
    Morin, C
    2004 IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID - CCGRID 2004, 2004, pp. 459-466
  • [39] Checkpointing and Recovery Mechanism in Grid
    Mehta, Janki
    Chaudhary, Sanjay
    ADCOM: 2008 16TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATIONS, 2008, pp. 131-140
  • [40] Recovery protocol for mobile checkpointing
    Higaki, H
    Takizawa, M
    NINTH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 1998, pp. 520-525