Safe Compiler-driven Transaction Checkpointing and Recovery

Cited by: 0
Authors
Sreeram, Jaswanth [1]
Pande, Santosh [2]
Affiliations
[1] Intel Labs, Santa Clara, CA USA
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Keywords
software transactional memory; checkpointing; continuations
DOI
10.1145/2398857.2384620
Chinese Library Classification
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Several studies have shown that a large fraction of the work performed inside memory transactions in representative programs is wasted because the transaction experiences a conflict and aborts. Aborts inside long-running transactions are especially costly for performance. Because the TM programming model makes it simpler to synchronize large critical sections than fine-grained locking does, large transactions are common, which exacerbates the problem of wasted work. In this paper we present a practical transaction checkpoint and recovery scheme in which a transaction that experiences a conflict can restore its state (including the local context in which it was executing) to some dynamic program point before the conflicting access and resume execution from that point. This state saving and restoration is implemented by checkpoint operations that a compiler generates into the transaction's body and optimizes to reduce the amount of state that is saved and restored. We also describe a runtime system that manages these checkpointed states and orchestrates the restoration of the right checkpointed state for a conflict on a particular transactional access. Moreover, the synthesis of these save and restore operations, their optimization, and their invocation at runtime are completely transparent to the programmer. We have implemented the checkpoint generation and optimization scheme in the LLVM compiler and runtime support for the TL2 STM system. Our experiments indicate that, for many parallel programs, such checkpoint recovery schemes can reduce the number of aborts by up to several orders of magnitude and yield significant execution-time speedups relative to plain transactional programs for the same number of threads.
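
The idea of recovering from a checkpoint instead of restarting the whole transaction can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions, not the paper's LLVM/TL2 implementation: `VersionedMemory`, `run_transaction`, `first_invalid_read`, and the `before_validation` hook are hypothetical names introduced here, a checkpoint is taken before every step by copying all local state (the paper's compiler instead chooses and optimizes what is saved), and a concurrent writer is imitated by a callback.

```python
import copy


class VersionedMemory:
    """Toy shared memory: each address maps to a (value, version) pair."""

    def __init__(self, initial):
        self.cells = {addr: (value, 0) for addr, value in initial.items()}

    def read(self, addr):
        return self.cells[addr]

    def write(self, addr, value):
        _, version = self.cells[addr]
        self.cells[addr] = (value, version + 1)


def first_invalid_read(memory, read_log):
    """Return the index of the earliest logged read whose version changed."""
    for index, (addr, seen_version) in enumerate(read_log):
        if memory.read(addr)[1] != seen_version:
            return index
    return None


def run_transaction(memory, steps, before_validation=None):
    """Run `steps` as one transaction with a checkpoint before each step.

    Each step is (addr, update): read addr, then call update(local, value).
    Returns (local_state, number_of_steps_executed).
    """
    local, read_log, checkpoints = {}, [], []
    executed = 0
    pc = 0
    while True:
        while pc < len(steps):
            addr, update = steps[pc]
            checkpoints.append(copy.deepcopy(local))  # save local context
            value, version = memory.read(addr)
            read_log.append((addr, version))
            update(local, value)
            executed += 1
            pc += 1
        if before_validation is not None:
            before_validation(memory)  # hypothetical hook: simulated conflict
            before_validation = None
        conflict = first_invalid_read(memory, read_log)
        if conflict is None:
            return local, executed  # all reads still valid: commit
        # Partial rollback: restore the state saved just before the
        # conflicting read and resume from that step, not from step 0.
        local = checkpoints[conflict]
        del checkpoints[conflict:]
        del read_log[conflict:]
        pc = conflict


def accumulate(local, value):
    local["sum"] = local.get("sum", 0) + value


memory = VersionedMemory({"a": 1, "b": 2, "c": 3, "d": 4})
steps = [(addr, accumulate) for addr in "abcd"]
result, executed = run_transaction(
    memory, steps, before_validation=lambda m: m.write("d", 40)
)
print(result, executed)  # prints {'sum': 46} 5
```

In this run the simulated writer invalidates only the last read, so one step is re-executed (5 steps in total) where a full abort and restart would have executed 8.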
Pages: 41-55
Number of pages: 15