A New Parallel Recomputing Code Design Methodology for Fault-Tolerant Parallel Algorithm

Times cited: 0
Authors
Du, Yunfei [1]
Peng, Lin [1]
Zhao, Kejia [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp, Changsha 410073, Hunan, Peoples R China
Keywords
parallel recomputing code; template; slice; ROLLBACK-RECOVERY; SYSTEMS;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
As large-scale computer systems grow, their mean time between failures (MTBF) is becoming significantly shorter than the execution time of many current scientific applications. To run to completion, these applications must tolerate hardware failures. The Fault-Tolerant Parallel Algorithm (FTPA) is an application-level fault-tolerance approach for large-scale scientific applications that achieves fast self-recovery through parallel recomputing. In this paper, we first propose a new methodology for designing parallel recomputing code; code designed with this methodology achieves high parallel recomputing efficiency. Second, we automate the methodology using compiler technology. Finally, we evaluate our approach with two kernels from the NAS Parallel Benchmarks on a cluster system with 512 CPUs. The experimental results show that the parallel recomputing code generated by our approach recomputes more efficiently than code generated by loop parallelization.
Pages: 220-226
Number of pages: 7
Related papers
50 in total
  • [1] FTPA: Supporting Fault-Tolerant Parallel Computing through Parallel Recomputing
    Yang, Xuejun
    Du, Yunfei
    Wang, Panfeng
    Fu, Hongyi
    Jia, Jia
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2009, 20 (10) : 1471 - 1486
  • [2] A new parallel recomputing code design methodology for fast failure recovery
    Du, Yunfei
    Tang, Yuhua
    Xie, Xinwei
    COMPUTERS & ELECTRICAL ENGINEERING, 2013, 39 (04) : 1095 - 1113
  • [3] A novel fault-tolerant parallel algorithm
    Wang, Panfeng
    Du, Yunfei
    Fu, Hongyi
    Zhou, Haifang
    Yang, Xuejun
    Yang, Wenjing
    ADVANCED PARALLEL PROCESSING TECHNOLOGIES, PROCEEDINGS, 2007, 4847 : 18 - 29
  • [4] Classification and design of fault-tolerant parallel
    Du, Yunfei
    Tang, Yuhua
    Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2011, 39 (04): : 49 - 52
  • [5] A fault-tolerant computing method for Xdraw parallel algorithm
    Wanfeng Dou
    Yanan Li
    The Journal of Supercomputing, 2018, 74 : 2776 - 2800
  • [6] A fault-tolerant computing method for Xdraw parallel algorithm
    Dou, Wanfeng
    Li, Yanan
    JOURNAL OF SUPERCOMPUTING, 2018, 74 (06): : 2776 - 2800
  • [7] FAULT-TOLERANT PARALLEL PROCESSOR
    HARPER, RE
    LALA, JH
    JOURNAL OF GUIDANCE CONTROL AND DYNAMICS, 1991, 14 (03) : 554 - 563
  • [8] Efficient Fault-Tolerant Design for Parallel Matched Filters
    Gao, Zhen
    Zhou, Ming
    Reviriego, Pedro
    Antonio Maestro, Juan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2018, 65 (03) : 366 - 370
  • [9] A new fault-tolerant interconnection topology for parallel systems
    Tripathy, C.R.
    Dash, R.K.
    Journal of the Institution of Engineers (India), Part CP: Computer Engineering Division, 2008, 89 (MAY): : 8 - 13
  • [10] PAV: Parallel Average Voting Algorithm for Fault-Tolerant Systems
    Karimi, Abbas
    Zarafshan, Faraneh
    Jantan, Adznan B.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2011, 2 (01) : 38 - 41