A New Parallel Recomputing Code Design Methodology for Fault-Tolerant Parallel Algorithm

Cited by: 0
Authors
Du, Yunfei [1]
Peng, Lin [1]
Zhao, Kejia [1]
Affiliation
[1] Natl Univ Def Technol, Sch Comp, Changsha 410073, Hunan, Peoples R China
Keywords
parallel recomputing code; template; slice; rollback-recovery; systems
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
As large-scale computer systems grow, their mean time between failures (MTBF) is becoming significantly shorter than the execution time of many current scientific applications. To run to completion, these applications must therefore tolerate hardware failures. The Fault-tolerant Parallel Algorithm (FTPA) is an application-level fault-tolerance approach for large-scale scientific applications that achieves fast self-recovery through parallel recomputing. In this paper, we first propose a new parallel recomputing code design methodology; code designed with this methodology achieves high parallel recomputing efficiency. Second, we explore the use of compiler technology to automate the methodology. Finally, we evaluate the performance of our approach with two kernels of the NAS Parallel Benchmarks on a cluster system with 512 CPUs. The experimental results show that the parallel recomputing code generated by our approach achieves higher parallel recomputing efficiency than code generated by loop parallelization.
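The core idea of parallel recomputing summarized above is that the work lost with a failed process is recomputed concurrently by the surviving processes, rather than by a single restarted process. The Python sketch below illustrates only that generic idea under simplifying assumptions: the lost work is a contiguous range of independent loop iterations, and it is split as evenly as possible among the survivors, roughly in the spirit of the loop-parallelization baseline mentioned in the abstract. It does not reproduce the authors' template- or slice-based methodology, and the helpers `split_lost_iterations` and `recompute` are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Tuple


def split_lost_iterations(lost_start: int, lost_end: int,
                          survivors: List[int]) -> Dict[int, Tuple[int, int]]:
    """Hypothetical helper: divide the half-open iteration range
    [lost_start, lost_end) owned by a failed process among the surviving
    ranks as evenly as possible (simple block distribution)."""
    if not survivors:
        raise ValueError("no surviving processes to recompute the lost work")
    if lost_end < lost_start:
        raise ValueError("invalid iteration range")
    total = lost_end - lost_start
    base, extra = divmod(total, len(survivors))
    plan: Dict[int, Tuple[int, int]] = {}
    start = lost_start
    for i, rank in enumerate(survivors):
        # The first `extra` survivors each take one additional iteration.
        size = base + (1 if i < extra else 0)
        plan[rank] = (start, start + size)
        start += size
    return plan


def recompute(plan: Dict[int, Tuple[int, int]],
              body: Callable[[int], int]) -> Dict[int, int]:
    """Hypothetical helper: re-execute the loop body for every reassigned
    iteration. Shares are run one after another here only for illustration;
    in a real parallel code each survivor would run its share concurrently."""
    results: Dict[int, int] = {}
    for rank, (lo, hi) in plan.items():
        for it in range(lo, hi):
            results[it] = body(it)
    return results


# Example (assumed setup): 4 processes each own 25 of 100 iterations,
# and rank 2, which owned iterations [50, 75), fails.
plan = split_lost_iterations(50, 75, [0, 1, 3])
print(plan)  # {0: (50, 59), 1: (59, 67), 3: (67, 75)}
lost = recompute(plan, lambda i: i * i)
print(len(lost))  # 25
```

With three survivors, each recomputes roughly a third of the lost range, so the recovery time for this step shrinks roughly in proportion to the number of survivors, provided the iterations are independent and the data they need is available on each survivor.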
Pages: 220-226
Number of pages: 7