A different re-execution speed can help

Cited by: 3
Authors
Benoit, Anne [1 ,2 ]
Cavelan, Aurelien [1 ,2 ]
Le Fevre, Valentin [1 ,2 ]
Robert, Yves [1 ,2 ,3 ]
Sun, Hongyang [1 ,2 ]
Affiliations
[1] Ecole Normale Super Lyon, Lyon, France
[2] Inria, Rennes, France
[3] Univ Tennessee, Knoxville, TN 37996 USA
Keywords
resilience; silent errors; speeds; re-execution; checkpointing; verification
DOI
10.1109/ICPPW.2016.45
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We consider divisible load scientific applications executing on large-scale platforms subject to silent errors. While the goal is usually to complete the execution as fast as possible in expectation, another major concern is energy consumption. The use of dynamic voltage and frequency scaling (DVFS) can help save energy, but at the price of performance degradation. Consider the execution model where a set of K different speeds is given, and whenever a failure occurs, a different re-execution speed may be used. Can this help? We address the following bi-criteria problem: how to compute the optimal checkpointing period to minimize energy consumption while bounding the degradation in performance. We solve this bi-criteria problem by providing a closed-form solution for the checkpointing period, and demonstrate via a comprehensive set of simulations that a different re-execution speed can indeed help.
Pages: 250-257
Page count: 8
Related papers
50 in total
  • [21] Using instruction result locality and re-execution to mitigate silent data corruptions
    Tajary, Alireza
    Zarandi, Hamid R.
    MICROELECTRONICS RELIABILITY, 2016, 62 : 178 - 190
  • [22] Store Vulnerability Window (SVW): Re-execution filtering for enhanced load optimization
    Roth, A
    32ND INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS, 2005, : 458 - 468
  • [23] HTFabric: A Fast Re-ordering and Parallel Re-execution Method for a High-Throughput Blockchain
    Song, Jaeyub
    Jeong, Juyeong
    Lee, Jemin
    Na, Inju
    Kim, Min-Soo
    PROCEEDINGS OF THE 33RD ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2024, 2024, : 2118 - 2127
  • [24] Task-Level Re-Execution Framework for Improving Fault Tolerance on Symmetry Multiprocessors
    Baek, Hyeongboo
    Lee, Jaewoo
    SYMMETRY-BASEL, 2019, 11 (05):
  • [25] Model-based Performance Analysis of Local Re-execution Scheme in Offloading System
    Wang, Qiushi
    Wu, Huaming
    Wolter, Katinka
    2013 43RD ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN), 2013,
  • [26] A characterization of re-execution costs for real-time abort-oriented protocols
    Shu, LC
    FIFTH INTERNATIONAL CONFERENCE ON REAL-TIME COMPUTING SYSTEMS AND APPLICATIONS, PROCEEDINGS, 1998, : 286 - 292
  • [27] Impact of MapReduce Task Re-execution Policy on Job Completion Reliability and Job Completion Time
    Lin, Jia-Chun
    Leu, Fang-Yie
    Chen, Ying-ping
    Munawar, Waqaas
    2014 IEEE 28TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS (AINA), 2014, : 712 - 718
  • [28] ReSlice: Selective re-execution of long-retired misspeculated instructions using forward slicing
    Sarangi, SR
    Liu, W
    Torrellas, J
    Zhou, YY
    MICRO-38: Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture, 2005, : 257 - 268
  • [29] THERMAL BENZOXAZINONE-BENZOXAZOLE CONVERSION, A RE-EXECUTION OF A MASS-SPECTROMETRIC DECAY BY THERMOLYSIS
    REICHEN, W
    HELVETICA CHIMICA ACTA, 1977, 60 (01) : 186 - 190
  • [30] Impact of Selective Implementation on Soft Error Detection Through Low-level Re-execution
    Nikscresht, Mohaddaseh
    De Blaere, Brent
    Vankeirsbilck, Jens
    Pissoort, Davy
    Boydens, Jeroen
    2021 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS DASC/PICOM/CBDCOM/CYBERSCITECH 2021, 2021, : 112 - 117