REPAIR: Hard-Error Recovery via Re-Execution

Cited by: 0
Authors
Soman, Jyothish [1]
Miralaei, Negar [1]
Mycroft, Alan [1]
Jones, Timothy M. [1]
Affiliation
[1] Univ Cambridge, Comp Lab, Cambridge CB2 1TN, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
DOI
Not available
Chinese Library Classification
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Processor reliability at upcoming technology nodes presents significant challenges to designers from increased manufacturing variability, parametric variation and transistor wearout leading to permanent faults. We present a design to tolerate this impact at the microarchitectural level: a chip with n cores together with one or more shared instruction re-execution units (IRUs). Instructions using a faulty component are identified and re-executed on an IRU. This design incurs no slowdown in the absence of errors and allows continued operation of all n cores after multiple hard errors on one or all cores in the structures protected by our scheme. Experiments show that a single-core chip experiences only a 23% slowdown with 1 error, rising to 43% in the presence of 5 errors. In a 4-core scenario with 4 errors on every core and a shared IRU, REPAIR enables performance of 0.68x of a fully functioning system.
Pages: 76-79
Page count: 4