Predicting and mitigating single-event upsets in DRAM using HOTH

Cited by: 2
Authors
Longofono, Stephen [1]
Kline, Donald, Jr. [1]
Melhem, Rami [2]
Jones, Alex K. [1]
Affiliations
[1] Univ Pittsburgh, Dept Comp Sci, Pittsburgh, PA 15260 USA
[2] Univ Pittsburgh, Dept Elect & Comp Engn, Pittsburgh, PA 15260 USA
Funding
National Science Foundation (USA)
Keywords
Radiation test; Memory reliability; Fault map; DRAM; MEMORY; TECHNOLOGY; ECC;
DOI
10.1016/j.microrel.2020.114024
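Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes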
0808 ; 0809 ;
Abstract
There is a growing demand for using commodity memory and storage solutions to make commercial aerospace ventures economically feasible. Existing radiation-hardened computer systems cannot meet this need alone. These hardened systems provide sufficient protection against the harsh environment of the upper atmosphere and low-Earth orbit, but they cost dramatically more and rely on commercially outdated architectures and fabrication technologies. If new aerospace systems can take advantage of the latest commodity memories, they can leverage advanced fabrication processes and economies of scale to control costs. Such systems would, however, require new strategies to maintain appropriate tolerance and/or resilience to faults from the harsh environment. In this work, we observe that single-event effects (SEEs) in recent-generation DRAM are not entirely random and are in fact often highly predictable under neutron radiation bombardment. We demonstrate the existence of a small number of weak cells responsible for the vast majority of single-bit SEEs. Based on this observation, we present a memory fault mapping and tolerance approach called HOTH to mitigate these predictable fault modes in conjunction with more random, unpredictable SEEs in DDR3 memory. In HOTH, both single- and multi-bit effects can be mitigated individually at runtime using a combination of existing error-correcting code techniques in Chipkill ECC and a fault map framework. The HOTH fault map is stored in the same DRAM that is subject to SEEs and leverages a fault-tolerance approach to mitigate SEEs that might appear in that part of the storage. Using data from different memory DIMMs, form factors, and radiation incidence angles, we show that HOTH can improve the uncorrectable fault rate by at least ten orders of magnitude and increase mean time to failure to thousands of years, allowing extended service times in harsh environments.
Pages: 12