ReIPE: Recycling Idle PEs in CNN Accelerator for Vulnerable Filters Soft-Error Detection

Cited by: 0
Authors
Wei, Xiaohui [1]
Wang, Chenyang [1]
Yue, Hengshan [1]
Tan, Jingweijia [1]
Guan, Zeyu [2]
Jiang, Nan [1]
Zheng, Xinyang [1]
Zhao, Jianpeng [1]
Qiu, Meikang [3]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun, Jilin, Peoples R China
[2] Jilin Univ, Changchun, Peoples R China
[3] Dakota State Univ, Beacom Coll Comp & Cyber Sci, Madison, SD USA
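Funding
National Natural Science Foundation of China
Keywords
Soft error; systolic array; CNN; error resilience analysis
DOI
10.1145/3674909
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract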
To meet the massive computational requirements of current deep Convolutional Neural Networks (CNNs), CNN-specific accelerators are widely deployed in large-scale systems. Soft errors, caused by high-energy neutron and alpha-particle strikes, may lead to catastrophic failures when CNNs are deployed on accelerators with high integration density. As CNNs become ubiquitous in mission-critical domains, ensuring the reliable execution of CNN accelerators in the presence of soft errors is increasingly essential. In this article, we propose ReIPE, which Recycles Idle Processing Elements (PEs) in the CNN accelerator for soft-error detection on vulnerable filters. Considering the error sensitivity of filters, ReIPE first carries out a filter-level gradient analysis, in place of fault injection, for fast filter-wise error resilience estimation. Then, to achieve maximal reliability benefits, ReIPE combines the hardware-level idleness of the systolic array with the software-level filter-wise error resilience profile of the CNN: it preferentially loads duplicates of the most vulnerable filters onto the systolic array, recycling idle-column PEs for opportunistic redundant execution (error detection). Exploiting the data reuse properties of accelerators, ReIPE incorporates the error detection process into the original computation flow of the accelerator to perform real-time error detection. Once an error is detected, ReIPE triggers a correction round to rectify the erroneous output. Experimental results on LeNet-5, Cifar-10-CNN, AlexNet, ResNet-20, VGG-16, and ResNet-50 show that ReIPE can cover 96.40% of errors while reducing the performance degradation and energy consumption of baseline dual modular redundancy by 75.06% and 67.79%, respectively, on average. Moreover, to satisfy the reliability requirements of various application scenarios, ReIPE is also applicable to pruned, quantized, and Transformer-based models, and is portable to other accelerator architectures.
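The idea of recycling idle columns for redundant execution can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-tile mapping policy, the vulnerability scores, and the convention that a higher score means a more vulnerable filter are all assumptions made for illustration. The sketch maps filters to array columns tile by tile, fills the idle columns of an underfilled tile with duplicates of that tile's most vulnerable filters, and flags any filter whose duplicated output disagrees with its primary output.

```python
# Hypothetical sketch; names and policy are illustrative assumptions,
# not taken from the ReIPE article.

def plan_tiles(scores, array_cols):
    """Assign filters to systolic-array columns, one tile per pass.

    scores[f] is an assumed vulnerability score for filter f
    (higher = more vulnerable). Returns a list of
    (primary_filters, duplicated_filters) pairs, one per tile.
    """
    tiles = []
    for start in range(0, len(scores), array_cols):
        primary = list(range(start, min(start + array_cols, len(scores))))
        idle = array_cols - len(primary)  # columns left unused in this tile
        # Rank this tile's filters from most to least vulnerable.
        ranked = sorted(primary, key=lambda f: scores[f], reverse=True)
        duplicates = ranked[:idle]  # recycle idle columns for redundancy
        tiles.append((primary, duplicates))
    return tiles


def detect_mismatches(primary_out, duplicate_out):
    """Return filters whose duplicated output differs from the primary one.

    Both arguments map filter index -> output value. A non-empty result
    would be the trigger for a correction round.
    """
    return sorted(f for f, v in duplicate_out.items() if primary_out[f] != v)


if __name__ == "__main__":
    tiles = plan_tiles([0.1, 0.9, 0.3, 0.7, 0.5, 0.2, 0.8], array_cols=4)
    print(tiles)
    print(detect_mismatches({4: 10, 5: 20, 6: 30}, {6: 31}))
```

With seven filters on a four-column array, the second tile has one idle column, which this policy assigns to the filter with the highest assumed score in that tile.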
Pages: 26