Reduced-precision Algorithm-based Fault Tolerance for FPGA-implemented Accelerators

Authors
Davis, James J. [1]
Cheung, Peter Y. K. [1]
Affiliations
[1] Imperial Coll London, London SW7 2AZ, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
DOI
10.1007/978-3-319-30481-6_31
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
As the threat of fault susceptibility caused by mechanisms including variation and degradation increases, engineers must give growing consideration to error detection and correction. While the use of common fault tolerance strategies frequently causes the incursion of significant overheads in area, performance and/or power consumption, options exist that buck these trends. In particular, algorithm-based fault tolerance embodies a proven family of low-overhead error mitigation techniques able to be built upon to create self-verifying circuitry. In this paper, we present our research into the application of algorithm-based fault tolerance (ABFT) in FPGA-implemented accelerators at reduced levels of precision. This allows for the introduction of a previously unexplored tradeoff: sacrificing the observability of faults associated with low-magnitude errors for gains in area, performance and efficiency by reducing the bit-widths of logic used for error detection. We describe the implementation of a novel checksum truncation technique, analysing its effects upon overheads and allowed error. Our findings include that bit-width reduction of ABFT circuitry within a fault-tolerant accelerator used for multiplying pairs of 32 × 32 matrices resulted in the reduction of incurred area overhead by 16.7% and recovery of 8.27% of timing model f_max. These came at the cost of introducing average and maximum absolute output errors of 0.430% and 0.927%, respectively, of the maximum absolute output value under transient fault injection.
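
The tradeoff described in the abstract can be illustrated with a small behavioural model. The sketch below is not the authors' circuit. It assumes integer data, a classic column-checksum ABFT check for matrix multiplication, and a comparison that ignores the lowest `drop_bits` bits of the checksum mismatch, so errors smaller than 2**drop_bits go unobserved. The function name `abft_matmul` and the parameters `drop_bits` and `fault` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def abft_matmul(a, b, drop_bits=0, fault=None):
    """Multiply integer matrices and verify the result with column checksums.

    drop_bits: number of low-order mismatch bits the check ignores
               (an assumed stand-in for reduced-precision checking).
    fault:     optional (row, col, delta) added to the product to mimic
               a transient error in one output element.
    Returns (c, flagged), where flagged holds the column indices whose
    checksum mismatch is at least 2**drop_bits.
    """
    a = np.asarray(a, dtype=np.int64)
    b = np.asarray(b, dtype=np.int64)

    c = a @ b
    if fault is not None:
        row, col, delta = fault
        c[row, col] += delta  # inject the error into the computed product

    # Checksum identity: (column sums of A) @ B equals the column sums of A @ B.
    expected = a.sum(axis=0) @ b
    observed = c.sum(axis=0)

    # Shifting away the low bits hides mismatches smaller than 2**drop_bits.
    mismatch = np.abs(expected - observed)
    flagged = np.flatnonzero(mismatch >> drop_bits)
    return c, flagged


rng = np.random.default_rng(0)
a = rng.integers(-128, 128, size=(32, 32))
b = rng.integers(-128, 128, size=(32, 32))

_, clean = abft_matmul(a, b, drop_bits=8)
_, small = abft_matmul(a, b, drop_bits=8, fault=(3, 5, 100))
_, large = abft_matmul(a, b, drop_bits=8, fault=(3, 5, 1000))
_, exact = abft_matmul(a, b, drop_bits=0, fault=(3, 5, 100))
```

With `drop_bits=8`, the injected error of 100 falls below the 256 threshold and is not flagged, while the error of 1000 is flagged in column 5; with `drop_bits=0` the check is exact and the small error is caught as well. In hardware the saving would come from narrower checksum datapaths rather than from a shift at the comparator, so this model captures only the observability side of the tradeoff.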
Pages: 361-368
Number of pages: 8