Improving Fault Tolerance for FPGA SoCs through Post-Radiation Design Analysis

Times cited: 0
Authors
Wilson, Andrew Elbert [1]
Baker, Nathan [1]
Campbell, Ethan [1]
Wirthlin, Michael [1]
Affiliation
[1] Brigham Young Univ, Provo, UT 84602 USA
Keywords
FPGA; TMR; RISC-V; soft processor; radiation testing; fault injection; fault analysis; reliability; benchmarks; processor
DOI
10.1145/3674841
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
FPGAs have been shown to operate reliably within harsh radiation environments by employing single-event upset (SEU) mitigation techniques, such as configuration scrubbing, triple-modular redundancy, error correction coding, and radiation-aware implementation techniques. The effectiveness of these techniques, however, is limited when using complex system-level designs that employ complex I/O interfaces with single-point failures. In previous work, a complex SoC system running Linux applied several of these techniques only to obtain an improvement of 14x in mean time to failure (MTTF). A detailed post-radiation fault analysis found that the limitations in reliability were due to the DDR interface, the global clock network, and interconnect. This article applied a number of design-specific SEU mitigation techniques to address the limitations in reliability of this design. These changes include triplicating the global clock, optimizing the placement of the reduction output voters and input flip-flops, and employing a mapping technique called "striping." The application of these techniques improved MTTF of the mitigated design by a factor of 1.54x and thus provides a 22.8x MTTF improvement over the unmitigated design. A post-radiation fault analysis using BFAT was also performed to find the remaining design vulnerabilities.
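As a rough illustration of the triple-modular redundancy (TMR) voting mentioned in the abstract, the Python sketch below models a bitwise 2-of-3 majority voter. It is not taken from the article: the function name `majority_vote` and the example values are assumptions chosen for illustration, and an actual design would realize the voter in FPGA logic rather than in software.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Return the bitwise 2-of-3 majority of three redundant copies of a word."""
    # A bit is set in the result when at least two of the three inputs agree on 1.
    return (a & b) | (b & c) | (a & c)


# Hypothetical example: one of three replicas suffers a single-bit upset.
golden = 0b1011_0010           # value every replica should hold
upset = golden ^ 0b0000_1000   # same value with bit 3 flipped in one replica
voted = majority_vote(golden, upset, golden)
```

In this model a single corrupted replica is outvoted by the other two. Voting alone does not cover faults in resources shared by all three replicas, which is the kind of single-point failure the abstract identifies.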
Pages: 21
Related papers
50 in total
  • [31] Post-Radiation Radiochemical Analysis of Spent Nuclear Fuel from VVER-440 Reactor
    V. N. Momotov
    E. A. Erin
    A. Yu. Volkov
    V. N. Kupriyanov
    Radiochemistry, 2021, 63 : 197 - 208
  • [32] Improved Fault-tolerance through Dynamic Modular Redundancy (DMR) on the RISA FPGA Platform
    Trefzer, Martin A.
    Tyrrell, Andy M.
    2014 NASA/ESA CONFERENCE ON ADAPTIVE HARDWARE AND SYSTEMS (AHS), 2014, : 39 - 46
  • [33] CT-derived vessel segmentation for analysis of post-radiation therapy changes in vasculature and perfusion
    Wuschner, Antonia E.
    Flakus, Mattison J.
    Wallat, Eric M.
    Reinhardt, Joseph M.
    Shanmuganayagam, Dhanansayan
    Christensen, Gary E.
    Gerard, Sarah E.
    Bayouth, John E.
    FRONTIERS IN PHYSIOLOGY, 2022, 13
  • [34] Post-Radiation Radiochemical Analysis of Spent Nuclear Fuel from VVER-440 Reactor
    Momotov, V. N.
    Erin, E. A.
    Volkov, A. Yu
    Kupriyanov, V. N.
    RADIOCHEMISTRY, 2021, 63 (02) : 197 - 208
  • [35] Reliability Evaluation and Fault Tolerance Design for FPGA Implemented Reed Solomon (RS) Erasure Decoders
    Gao, Zhen
    Shi, Jinchang
    Liu, Qiang
    Ullah, Anees
    Reviriego, Pedro
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2023, 31 (01) : 142 - 146
  • [36] Exploring Quantitative MRI Biomarkers of Head and Neck Post-Radiation Lymphedema and Fibrosis: Post Hoc Analysis of a Prospective Trial
    Mao, Shitong
    MD Anderson Head and Neck Cancer Symptom Working Group
    Wang, Jihong
    McMillan, Holly
    Mohamed, Abdallah Sherif Radwan
    Buoy, Sheila
    Ahmed, Sara
    Mulder, Samuel L.
    Naser, Mohamed A.
    He, Renjie
    Wahid, Kareem A.
    Chen, Melissa Mei
    Ding, Yao
    Moreno, Amy C.
    Lai, Stephen Y.
    Fuller, Clifton David
    Hutcheson, Katherine Arnold
    HEAD AND NECK-JOURNAL FOR THE SCIENCES AND SPECIALTIES OF THE HEAD AND NECK, 2025,
  • [37] Improving the FPGA design process through determining and applying logical-to-physical design mappings
    Graham, P
    Hutchings, B
    Nelson, B
    2000 IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2000, : 305 - 306
  • [38] IMPROVING AUTONOMOUS SOFT-ERROR TOLERANCE OF FPGA THROUGH LUT CONFIGURATION BIT MANIPULATION
    Das, Anup
    Venkataraman, Shyamsundar
    Kumar, Akash
    2013 23RD INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL 2013) PROCEEDINGS, 2013,
  • [39] Design time reliability analysis of distributed fault tolerance algorithms
    Latronico, E
    Koopman, P
    2005 INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, PROCEEDINGS, 2005, : 486 - 495
  • [40] Improving Fault Tolerance in Model Predictive Control through Enlargement of the Recursively Feasible Set
    Costa, Daniella E. S.
    Galvao, Roberto K. H.
    de Almeida, Fabio A.
    Afonso, Rubens J. M.
    2015 EUROPEAN CONTROL CONFERENCE (ECC), 2015, : 3059 - 3065