A New Parallel Recomputing Code Design Methodology for Fault-Tolerant Parallel Algorithm

Cited by: 0
Authors
Du, Yunfei [1]
Peng, Lin [1]
Zhao, Kejia [1]
Affiliation
[1] Natl Univ Def Technol, Sch Comp, Changsha 410073, Hunan, Peoples R China
Keywords
parallel recomputing code; template; slice; ROLLBACK-RECOVERY; SYSTEMS;
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
As large-scale computer systems grow, their mean time between failures is becoming significantly shorter than the execution time of many current scientific applications. To run to completion, these applications must tolerate hardware failures. Fault-Tolerant Parallel Algorithm (FTPA) is an application-level fault-tolerance approach for large-scale scientific applications that achieves fast self-recovery through parallel recomputing. In this paper, we first propose a new parallel recomputing code design methodology; code designed with this methodology achieves high parallel recomputing efficiency. Second, we automate the methodology using compiler technology. Finally, we evaluate our approach with two kernels of the NAS Parallel Benchmarks on a cluster system with 512 CPUs. The experimental results show that the parallel recomputing code generated by our approach recomputes more efficiently than the code generated by loop parallelization.
Pages: 220-226
Page count: 7