Predicting and mitigating single-event upsets in DRAM using HOTH

Times cited: 2
Authors
Longofono, Stephen [1]
Kline, Donald, Jr. [1]
Melhem, Rami [2]
Jones, Alex K. [1]
Affiliations
[1] Univ Pittsburgh, Dept Comp Sci, Pittsburgh, PA 15260 USA
[2] Univ Pittsburgh, Dept Elect & Comp Engn, Pittsburgh, PA 15260 USA
Funding
National Science Foundation (NSF)
Keywords
Radiation test; Memory reliability; Fault map; DRAM; Memory; Technology; ECC
DOI
10.1016/j.microrel.2020.114024
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
There is a growing demand for using commodity memory and storage solutions to make commercial aerospace ventures economically feasible. Existing radiation-hardened computer systems cannot meet this need alone. These hardened systems provide sufficient protection against the harsh environment of the upper atmosphere and low-Earth orbit, but they come at a dramatically increased cost and rely on commercially out-of-date architectures and fabrication technologies. If new aerospace systems can take advantage of the latest commodity memories, they can leverage advanced fabrication processes and economies of scale to control costs. Such systems, however, require new strategies to maintain appropriate tolerance of, and resilience to, faults caused by the harsh environment. In this work, we observe that single-event effects (SEEs) in recent-generation DRAM memories are not entirely random and, in fact, are often highly predictable under neutron radiation bombardment. We demonstrate the existence of a small number of weak cells responsible for the vast majority of single-bit SEEs. Based on this observation, we present a memory fault mapping and tolerance approach called HOTH to mitigate these predictable fault modes in conjunction with more random, unpredictable SEEs in DDR3 memory. In HOTH, both single- and multi-bit effects can be mitigated individually at runtime using a combination of existing error-correcting code techniques in Chipkill ECC and a fault map framework. The HOTH fault map is stored in the same DRAM that is subject to SEEs and uses a fault-tolerance approach to mitigate SEEs that might appear in that part of the storage. Using data from different memory DIMMs, form factors, and radiation incidence angles, we show that HOTH can reduce the uncorrectable fault rate by at least ten orders of magnitude and increase the mean time to failure to thousands of years, allowing extended service times in harsh environments.
Pages: 12
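The abstract's central idea, that a few predictable weak cells cause most single-bit SEEs and can be handled by a fault map so that ECC is reserved for random faults, can be illustrated with a minimal Python sketch. It is hypothetical and is not the authors' implementation: the class `FaultMap`, its methods, the `weak_threshold` parameter, and the pointer-plus-replacement-bit layout are illustrative assumptions. A cell is promoted to "weak" once scrubbing has observed it upset repeatedly; the map then remembers the intended value of that bit and restores it on read. The sketch does not model Chipkill ECC or the protection of a map stored in the same DRAM, both of which HOTH addresses.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical sketch of a weak-cell fault map; names and policy are
# illustrative assumptions, not HOTH's actual design.

@dataclass
class FaultMap:
    # Assumed policy: a cell counts as "weak" after this many observed upsets.
    weak_threshold: int = 2
    # Observed single-bit upsets per (word address, bit position).
    upset_counts: Counter = field(default_factory=Counter)
    # Weak cells: word address -> {bit position: intended bit value}.
    weak: dict[int, dict[int, int]] = field(default_factory=dict)

    def record_upset(self, addr: int, bit: int, corrected_word: int) -> None:
        """Log an upset found (and corrected by ECC) during a scrub pass."""
        self.upset_counts[(addr, bit)] += 1
        if self.upset_counts[(addr, bit)] >= self.weak_threshold:
            # Promote the cell to weak and remember the correct value of the bit.
            self.weak.setdefault(addr, {})[bit] = (corrected_word >> bit) & 1

    def is_weak(self, addr: int, bit: int) -> bool:
        return bit in self.weak.get(addr, {})

    def on_write(self, addr: int, word: int) -> None:
        """Refresh the saved value of every weak bit in the written word."""
        for bit in self.weak.get(addr, {}):
            self.weak[addr][bit] = (word >> bit) & 1

    def on_read(self, addr: int, word: int) -> int:
        """Force weak bits to their saved values before the word reaches ECC."""
        for bit, value in self.weak.get(addr, {}).items():
            word = (word & ~(1 << bit)) | (value << bit)
        return word
```

In this sketch, upsets in cells that are not in the map pass through unchanged and would be left to the ECC, which mirrors the division of labour described in the abstract; the promotion threshold and the update-on-write policy are assumptions chosen for simplicity.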